Another pre-eminent scholar who signs reviews
Some ideas for a ‘motivating example’ for the “Alleles changing” lecture I will be covering in Amanda Yonin’s Human Genetics course on Oct 25th.
sim-transcriptome.sh into a script that can be run with fasta file of known transcripts as input, and output including
Surprising result: the velvet-oases assembly using all reads yields fewer and shorter contigs than the assembly using normalized reads. Why would this be?
|Metric|All reads|Normalized reads|
|---|---|---|
|Total trimmed contigs|201|322|
|Min contig size|106|101|
|Median contig size|298|416|
|Mean contig size|317|399|
|Max contig size|686|629|
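A minimal sketch of how summary numbers like these could be recomputed directly from an assembly FASTA file. The function names and the `min_len=100` trimming cutoff are assumptions for illustration, not the settings used for the assemblies above.

```python
import statistics


def contig_lengths(fasta_lines):
    """Yield the sequence length of each record in an iterable of FASTA lines."""
    length = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarize(lengths, min_len=100):
    """Summarize contig sizes, keeping contigs of at least min_len bases.

    min_len is a hypothetical trimming cutoff chosen for this sketch.
    """
    kept = [n for n in lengths if n >= min_len]
    if not kept:
        return {"total": 0}
    return {
        "total": len(kept),
        "min": min(kept),
        "median": statistics.median(kept),
        "mean": round(statistics.mean(kept)),
        "max": max(kept),
    }


# Toy records only; a real run would pass an open file handle instead.
demo = [">c1", "ACGT" * 30, ">c2", "A" * 50, ">c3", "ACGTA" * 40, "ACGTA" * 20]
print(summarize(contig_lengths(demo)))
```

Running the same function on both assemblies would give directly comparable columns, as long as the same cutoff is applied to each.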
Partial explanation for the [problem before](/aptranscriptome/2013/10/09/more-simulation-notes.html), where only ~29 of 100 transcripts actually had gene expression counts: few reads map to the assembled transcripts.
This could be because many more reads align to multiple locations in the assembled transcripts than in the known transcripts.
For tophat mapped reads to assembled transcripts:
> Mapped: 288471 (19.4% of input)
> of these: 183776 (63.7%) have multiple alignments (0 have >20)
> Right reads:
> Input: 1488635
> Mapped: 276348 (18.6% of input)
> of these: 176708 (63.9%) have multiple alignments (0 have >20)
> 19.0% overall read alignment rate.
>
> Aligned pairs: 186284
> of these: 110648 (59.4%) have multiple alignments
> and: 2 ( 0.0%) are discordant alignments
> 12.5% concordant pair alignment rate.
compared to tophat mapped reads to known transcripts:
> Left reads:
> Input: 1488635
> Mapped: 978717 (65.7% of input)
> of these: 78299 ( 8.0%) have multiple alignments (0 have >20)
> Right reads:
> Input: 1488635
> Mapped: 973857 (65.4% of input)
> of these: 76004 ( 7.8%) have multiple alignments (0 have >20)
> 65.6% overall read alignment rate.
>
> Aligned pairs: 964826
> of these: 62730 ( 6.5%) have multiple alignments
> and: 3 ( 0.0%) are discordant alignments
> 64.8% concordant pair alignment rate.
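To make the two summaries easier to compare, here is a small sketch of the arithmetic behind the percentages. The formulas are my reading of how the reported rates relate to the counts (they reproduce the known-transcript figures), and the function names are made up for this example.

```python
def overall_alignment_rate(left_mapped, right_mapped, left_input, right_input):
    """Percent of all input reads (left plus right) that mapped."""
    return 100.0 * (left_mapped + right_mapped) / (left_input + right_input)


def concordant_pair_rate(aligned_pairs, discordant_pairs, input_pairs):
    """Percent of input pairs that aligned concordantly."""
    return 100.0 * (aligned_pairs - discordant_pairs) / input_pairs


def multimap_percent(multi_aligned, mapped):
    """Percent of mapped reads that have more than one alignment."""
    return 100.0 * multi_aligned / mapped


# Known transcripts, counts copied from the summary above.
known_overall = overall_alignment_rate(978717, 973857, 1488635, 1488635)
known_concordant = concordant_pair_rate(964826, 3, 1488635)
print(round(known_overall, 1))     # 65.6
print(round(known_concordant, 1))  # 64.8

# Multi-mapping among mapped left reads: known vs assembled transcripts.
print(round(multimap_percent(78299, 978717), 1))   # 8.0
print(round(multimap_percent(183776, 288471), 1))  # 63.7
```

The contrast in the last two lines is the point: roughly 8% of mapped reads are multi-mapped against the known transcripts, versus roughly 64% against the assembled ones.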
But, the correlation of expression counts from cufflinks mapped to known transcripts is beautiful! r=0.83 for 97 of 100 transcripts!
Known vs assembled transcript expression
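A correlation like the r=0.83 reported for 97 of 100 transcripts could be computed along these lines. This is a sketch only: it assumes a plain Pearson correlation over the transcripts present in both sets of expression estimates, and the dictionaries, transcript IDs, and function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def correlate_shared(expr_a, expr_b):
    """Correlate two {transcript_id: expression} dicts over their shared IDs.

    Returns (r, number_of_shared_transcripts).
    """
    shared = sorted(set(expr_a) & set(expr_b))
    r = pearson_r([expr_a[t] for t in shared], [expr_b[t] for t in shared])
    return r, len(shared)


# Toy values only, not real expression estimates.
expected = {"t1": 1.0, "t2": 2.0, "t3": 3.0, "t4": 10.0}
estimated = {"t1": 2.0, "t2": 4.0, "t3": 6.0}
r, n_shared = correlate_shared(expected, estimated)
print(n_shared, r)
```

Reporting the number of shared transcripts alongside r matters here, since transcripts with no expression estimate drop out of the comparison.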
Similar problem with BWA…what to do with real data where I can’t infer incorrect isoforms?
This work is licensed under a Creative Commons Attribution 4.0 International License.