04 December 2013


ddRADseq and restriction enzymes


ddRADseq enzyme selection

To select the best enzyme combination for double digest RADseq (ddRAD), we need to estimate the number of fragments generated by digestion with each combination of restriction enzymes. As an Aphaenogaster genome isn’t available, we are using the Pogonomyrmex barbatus genome as the closest reference. Note that Table 1 of the ddRADseq paper (Peterson et al. 2012) also reports simulated fragment recovery for Solenopsis invicta, Apis mellifera and Drosophila melanogaster.

Method: in silico digest using code suggested on seqanswers:

cat pbar_scaffolds_v03.fasta | tr -d "\n" | grep -o -E "CCGG.{264,336}CCGG" | wc -l

where pbar_scaffolds_v03.fasta is the target genome FASTA file, and the first and second CCGG are replaced by the forward and reverse restriction enzyme recognition sequences, respectively. The size range {264,336} (300 ± 36 bp between the two sites) is based on the ‘wide’ size selection simulation of Peterson et al. (2012).
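A slightly more careful variant of the one-liner can be sketched as follows. This is only an illustration, not the command used to produce the numbers in this post: count_fragments is a hypothetical helper name, and the choices to drop FASTA header lines, to keep scaffolds separate, and to accept the two sites in either order are my assumptions. Like the original, it counts non-overlapping regex matches and does not check for internal cut sites. It also assumes a grep that accepts repeat counts above 255 (GNU grep does).

```shell
# count_fragments FASTA FWD_SITE REV_SITE [MIN] [MAX]
# Hypothetical helper (an assumption for illustration, not the method
# used in the post or in Peterson et al. 2012).
# The awk step prints one uppercase line per scaffold and drops header
# lines, so matches cannot span scaffolds or include scaffold names.
# The grep pattern accepts the two sites in either order.
count_fragments() {
    fasta=$1; fwd=$2; rev=$3; min=${4:-264}; max=${5:-336}
    awk '/^>/ { if (seq != "") print seq; seq = ""; next }
         { seq = seq toupper($0) }
         END { if (seq != "") print seq }' "$fasta" |
        grep -o -E "${fwd}.{${min},${max}}${rev}|${rev}.{${min},${max}}${fwd}" |
        wc -l | tr -d ' '
}
```

For example, count_fragments pbar_scaffolds_v03.fasta GCATGC AATT would count SphI - MluCI candidates with the default 264 to 336 bp range. Because both site orders are accepted, counts from this sketch should not be expected to match the table below.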

Restriction enzyme    Recognition sequence    Adapter
NlaIII                5’CATG                  P1-flex
SphI                  5’GCATGC                P1-flex
MluCI                 5’AATT                  P2-flex
EcoRI                 5’GAATTC                P2-flex
MspI                  5’CCGG                  specific
SbfI                  5’CCTGCAGG              specific

Number of fragments in P. barbatus and S. invicta for different double digest combinations with a size window of 300 ± 36 bp. Combinations compatible with flex adapters are italicized.

Forward enzyme    Reverse enzyme    P. barbatus    S. invicta
SbfI              EcoRI             23             20
SphI              EcoRI             784            913
EcoRI             MspI              7,866          9,114
NlaIII            EcoRI             12,032         14,354
SphI              MluCI             18,193         23,738
NlaIII            MluCI             210,506        285,540
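A table like this can be produced by looping the one-liner over enzyme pairs. The sketch below is an assumption about how one might automate it, not the script actually used: site_for and tabulate_pairs are hypothetical names, pairs are passed as Forward:Reverse, and MIN and MAX are optional variables that default to the 264 to 336 bp window. Unlike the original command, it drops FASTA header lines before joining the sequence.

```shell
# Recognition sequences copied from the enzyme table in this post.
site_for() {
    case $1 in
        NlaIII) echo CATG ;;
        SphI)   echo GCATGC ;;
        MluCI)  echo AATT ;;
        EcoRI)  echo GAATTC ;;
        MspI)   echo CCGG ;;
        SbfI)   echo CCTGCAGG ;;
        *)      echo "unknown enzyme: $1" >&2; return 1 ;;
    esac
}

# tabulate_pairs FASTA FWD:REV [FWD:REV ...]
# Hypothetical helper: prints "forward<TAB>reverse<TAB>count" per pair,
# counting forward-site ... reverse-site matches as in the one-liner.
tabulate_pairs() {
    fasta=$1; shift
    for pair in "$@"; do
        fwd=${pair%%:*}; rev=${pair##*:}
        fsite=$(site_for "$fwd") || return 1
        rsite=$(site_for "$rev") || return 1
        n=$(grep -v '^>' "$fasta" | tr -d '\n' |
            grep -o -E "${fsite}.{${MIN:-264},${MAX:-336}}${rsite}" |
            wc -l | tr -d ' ')
        printf '%s\t%s\t%s\n' "$fwd" "$rev" "$n"
    done
}
```

For example, tabulate_pairs pbar_scaffolds_v03.fasta SbfI:EcoRI SphI:EcoRI EcoRI:MspI would print one row per pair.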

Odd note - compared to the fragment estimates in Table 1 of Peterson et al. (2012), my estimate for SphI - MluCI is 10 times greater, and my estimate for NlaIII - MluCI is 20 times greater. It is unclear where the difference arises, since my estimates are comparable for the other combinations, and I can’t find anything in their Methods to explain their numbers.

EDIT: SCH pointed out that AATT is often repeated. I checked the methods of the paper, and they specify that repeats were masked for the simulations with the mouse genome. They likely did the same with the other genomes, so the discrepancy in numbers is likely due to repeats.
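One way to test the repeat explanation would be to repeat the count on a repeat-masked version of the genome. The sketch below assumes such a file exists, with repeats either soft-masked (lowercase) or hard-masked (N). I have not run this on the P. barbatus scaffolds, and it is not necessarily how Peterson et al. (2012) handled masking. count_unmasked_fragments is a hypothetical name. It counts only forward-site to reverse-site matches, like the original one-liner, and skips any fragment that touches a masked base.

```shell
# count_unmasked_fragments MASKED_FASTA FWD_SITE REV_SITE [MIN] [MAX]
# Hypothetical helper; ASSUMES repeats are lowercase (soft-masked) or N
# (hard-masked) in the input file.
# The match is case-sensitive and the bases between the sites must be
# uppercase A, C, G or T, so fragments overlapping masked sequence are
# not counted. The sites must be given in uppercase.
count_unmasked_fragments() {
    fasta=$1; fwd=$2; rev=$3; min=${4:-264}; max=${5:-336}
    awk '/^>/ { if (seq != "") print seq; seq = ""; next }
         { seq = seq $0 }
         END { if (seq != "") print seq }' "$fasta" |
        LC_ALL=C grep -o -E "${fwd}[ACGT]{${min},${max}}${rev}" |
        wc -l | tr -d ' '
}
```

If the masked count for NlaIII - MluCI and SphI - MluCI drops toward the published values while the other combinations change little, that would support the repeat explanation.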

EDIT: See … post for results of empirical digestion.


This work is licensed under a Creative Commons Attribution 4.0 International License.