Generate simulated data for the RNAseq analysis pipeline. Couldn’t get DWGSIM to install correctly, so used wgsim from SAMtools instead.
Options set the sequencing error rate, mutation rate, read lengths, and fragment size distribution.
Usage: wgsim [options] <in.ref.fa> <out.read1.fq> <out.read2.fq>
Test: simulate 1,000 2x100 bp paired-end RNAseq reads from 500 bp size-selected fragments (SD 50 bp), with no mutations. Created the test input file by copying the first ~6 contigs from the Pogonomyrmex barbatus predicted transcriptome.
./wgsim -d 500 -s 50 -N 1000 -1 100 -2 100 -r 0 -R 0 Pbar_transcriptome_test.fa out.read1.fq out.read2.fq
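Worked perfectly! Generated 978 reads. The FASTQ sequence identifier specifies the contig that each read came from. The number of reads per contig ranged from 85 to 241. Note that contigs shorter than 650 bp were disregarded: wgsim skips sequences shorter than the fragment size plus three standard deviations (500 + 3 × 50 = 650 bp).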
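A rough sketch of how the reads-per-contig tally could be reproduced from the FASTQ identifiers. It assumes wgsim names each read as <contig>_<start>_<end>_<e:m:i>_<e:m:i>_<counter>/<1 or 2> and writes unwrapped 4-line records. The function names contig_of and reads_per_contig are made up for illustration and are not part of wgsim.

```python
from collections import Counter


def contig_of(header):
    """Return the contig name encoded in a wgsim-style read header.

    Assumed layout (not verified against every wgsim version):
    @<contig>_<start>_<end>_<e:m:i>_<e:m:i>_<counter>/<1|2>
    """
    name = header.strip().lstrip("@").split()[0]
    # Drop the mate suffix if present.
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    # Contig names may contain underscores themselves, so split from the
    # right and discard the five trailing fields wgsim is assumed to add.
    return name.rsplit("_", 5)[0]


def reads_per_contig(fastq_lines):
    """Count reads per contig from an iterable of FASTQ lines."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header is the first line of each 4-line record
            counts[contig_of(line)] += 1
    return counts
```

To use it on the output above, open out.read1.fq, pass the file handle to reads_per_contig, and print the resulting counts; the minimum and maximum can then be compared with the 85 to 241 range noted above.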
Jai practice talk for EcoLunch
This work is licensed under a Creative Commons Attribution 4.0 International License.