ddRADseq enzyme selection

To select the best enzyme combination for double digest RADseq (ddRAD), we need to estimate the number of fragments generated by digestion with each pair of restriction enzymes. As an Aphaenogaster genome isn’t available, we are using Pogonomyrmex barbatus as the closest reference. Note that Table 1 of the ddRADseq paper (Peterson et al. 2012) also reports simulated fragment recovery for Solenopsis invicta, Apis mellifera and Drosophila melanogaster.

Method: in silico digest using code suggested on seqanswers:

cat pbar_scaffolds_v03.fasta | tr -d "\n" | grep -o -E "CCGG.{264,336}CCGG" | wc -l

where pbar_scaffolds_v03.fasta is the target genome FASTA file, and the first and second CCGG are replaced by the forward and reverse restriction enzyme recognition sequences, respectively. The fragment size range {264,336}, i.e. 300 ± 36 bp, is based on the ‘wide’ size selection simulation of Peterson et al. (2012).
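To run the same count over several enzyme pairs, the one-liner can be wrapped in a small function. This is a minimal sketch, not the code used for the table below. The function name `count_fragments` is hypothetical, and unlike the one-liner it drops FASTA header lines before joining the sequence. Like the one-liner, it counts non-overlapping matches on the forward strand only, and because all scaffolds are joined into one line, a match can span a scaffold boundary.

```shell
#!/usr/bin/env bash
# count_fragments FASTA FORWARD_SITE REVERSE_SITE [MIN] [MAX]
# Hypothetical helper: counts forward-strand matches of
# FORWARD_SITE, then MIN..MAX bases, then REVERSE_SITE.
# MIN and MAX default to the 'wide' window used above (264..336).
count_fragments() {
    local fasta=$1 fwd=$2 rev=$3 min=${4:-264} max=${5:-336}
    grep -v '^>' "$fasta" |   # drop header lines (assumption: the one-liner keeps them)
        tr -d '\n' |          # join the sequence into a single line
        grep -o -E "${fwd}.{${min},${max}}${rev}" |   # print one match per line
        wc -l                 # count the matches
}
```

For example, `count_fragments pbar_scaffolds_v03.fasta GCATGC AATT` would count SphI - MluCI fragments. Counts may differ slightly from the one-liner because headers are excluded.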

Restriction Enzyme	Recognition sequence	Adapter
NlaIII	5’CATG	P1-flex
SphI	5’GCATGC	P1-flex
MluCI	5’AATT	P2-flex
EcoRI	5’GAATTC	P2-flex
MspI	5’CCGG	specific
SbfI	5’CCTGCAGG	specific

Number of fragments in P. barbatus and S. invicta for different double digest combinations with a size window of 300 ± 36 bp. Combinations compatible with flex adapters are italicized.

Forward enzyme	Reverse enzyme	P. barbatus	S. invicta
SbfI	EcoRI	23	20
SphI	EcoRI	784	913
EcoRI	MspI	7,866	9,114
NlaIII	EcoRI	12,032	14,354
SphI	MluCI	18,193	23,738
NlaIII	MluCI	210,506	285,540

Odd note: compared to the fragment estimates in Table 1 of Peterson et al. (2012), my estimate for SphI - MluCI is 10 times greater, and my estimate for NlaIII - MluCI is 20 times greater. ~~Unclear where the difference arises; it is comparable for the other combinations, and I can’t find Methods to explain their numbers.~~

EDIT: SCH pointed out that AATT is often repeated, and both discrepant combinations use MluCI (AATT). I checked the Methods of the paper, and they specify that repeats were masked for the simulations with the mouse genome. They likely did the same with the other genomes, so the discrepancy in fragment numbers is likely due to repeats.
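One rough way to check this explanation, if a soft-masked copy of the genome is available (repeats in lowercase), is to count only fragments made entirely of uppercase bases. This is a sketch under assumptions: the function name `count_unmasked_fragments` and the existence of a soft-masked FASTA are hypothetical, and this is not necessarily how Peterson et al. (2012) handled masking.

```shell
#!/usr/bin/env bash
# count_unmasked_fragments FASTA FORWARD_SITE REVERSE_SITE [MIN] [MAX]
# Hypothetical helper: same count as the in silico digest above, but the
# bases between the two sites must all be uppercase A, C, G or T, so any
# fragment that overlaps a lowercase (soft-masked) repeat is skipped.
count_unmasked_fragments() {
    local fasta=$1 fwd=$2 rev=$3 min=${4:-264} max=${5:-336}
    grep -v '^>' "$fasta" |   # drop FASTA header lines
        tr -d '\n' |          # join the sequence into a single line
        LC_ALL=C grep -o -E "${fwd}[ACGT]{${min},${max}}${rev}" |
        wc -l                 # count the matches
}
```

Recognition sites must be given in uppercase, so sites that fall inside masked repeats are also ignored. A large drop in the MluCI combinations relative to the unmasked count would support the repeat explanation.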

EDIT: See … post for results of empirical digestion.