Gene expression counts
With BWA, I have actual read counts, but I need normalized counts. Reads per Kilobase per Million mapped reads (RPKM) is typically reported, but it was recently shown to be problematic for comparisons among samples, so Transcripts per Million (TPM) is endorsed instead (Wagner et al. 2012). Bizarrely, I couldn't find any scripts to compute TPM from BWA output, and the original paper is not clear on how to do so. Fortunately, I found a great blog post with a nice worked example that I was able to recreate.
Made my example available as a gist
Useful presentation on TPM here
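For reference, a minimal sketch of the TPM calculation in Python. This is not the gist itself, and the function name `tpm` and the toy numbers are made up for illustration. It assumes one read count and one transcript length (in bases) per transcript: divide each count by its transcript length, then rescale so the values sum to one million.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to Transcripts per Million (TPM).

    counts     -- reads mapped to each transcript
    lengths_bp -- length of each transcript in bases (same order as counts)
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("transcript lengths must be positive")

    # Step 1: length-normalize, giving reads per base for each transcript.
    rates = [count / length for count, length in zip(counts, lengths_bp)]

    # Step 2: rescale so that all values sum to one million.
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [rate / total * 1e6 for rate in rates]


# Toy data (hypothetical): counts proportional to length give equal TPM.
toy_counts = [10, 20, 30]
toy_lengths = [1000, 2000, 3000]
toy_tpm = tpm(toy_counts, toy_lengths)
```

Because every sample is rescaled to the same total, TPM values sum to one million in each sample, which is what makes them easier to compare among samples than RPKM.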
Success!!! Working with TPM instead of raw gene counts results in a strong correlation (r = 0.90) between known expression and TPM from mapping reads to the Oases transcriptome. This resolves the issue from the first round of gene expression simulation.
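A sketch of how a correlation like this can be checked, assuming two equal-length lists of per-transcript values (known expression and estimated TPM). The function name `pearson_r` is hypothetical, and whether the values should be log-transformed first is a separate choice not covered here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```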
Now that I know it works and have a functioning pipeline, proceed to full analysis!
When I map reads to the known transcripts, the results look like this
where the first column is the gene, the second and third columns are the start and stop locations where reads aligned, and the fourth column is the number of reads mapping to that transcript.
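A sketch of reading a table laid out this way, assuming whitespace-separated columns in the order described above. The function name `reads_per_gene` and the example rows are hypothetical, not real output from this analysis.

```python
from collections import defaultdict


def reads_per_gene(lines):
    """Sum the read-count column (4th) per gene (1st column).

    Each line is assumed to hold: gene, start, stop, read count,
    separated by whitespace. Blank lines are skipped.
    """
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene, _start, _stop, count = line.split()
        totals[gene] += int(count)
    return dict(totals)


# Made-up rows, only to show the expected layout.
example_rows = [
    "gene1 1 500 12",
    "gene1 600 900 3",
    "gene2 1 300 7",
]
example_totals = reads_per_gene(example_rows)
```

A gene with no mapped reads simply does not appear in the returned dictionary.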
However, when I map reads to Oases assembled transcripts, I get NO mapped reads
Odd. Maybe this is because of multiple isoforms in the Oases assembly, though I thought Oases tried to remove them by 'popping bubbles'.
This work is licensed under a Creative Commons Attribution 4.0 International License.