Running velvet

While reading about kmer choice for genome assembly found a new open science hero Titus Brown. Complete, or as close as possible, reproducibility and even puts his un-funded grant proposals online!

velvet-oases

Successfully ran velvet for a single sample (A22-00)!

How to select k-mer sizes for velvet assemblies to merge with oases?

Explained as a balance between coverage and specificity. I have reads to burn (~160 million reads per sample) so no concern about losing coverage. kmers have to be shorter than read length, which after trimming is > 96 bp for 99.99% of reads.

Increased MAXKMERLENGTH for velvet to 92.

Note - recommended to use CD-HIT to filter redundant sequences after oases.

Velvet memory calculator indicatest that transcriptome assembly with all 12 fastq files (about 384 million reads) will require 84Gb RAM so need to move to a bigger system…or could I do the assemly independently for each sample and then merge?

Ray may also work with lower memory, but not extensively tested for transcriptome

Another option mentioned in the oases manual is to join paired-end reads that overlap using FLASH because with an insert length of 200 bp, paired reads may overlap.

Definitely need to perform digital normalization. Website contains tutorial for transcriptome data. And an example of why it’s useful.

Ask for account on mason to run job with large memory.

Abstract for research project on mason:

Species must acclimate, move or adapt to avoid extinction from climate change. Adaptation of species to local conditions has been demonstrated in dozens of studies. In many of these cases, there is also evidence for adaptive plasticity, where the phenotype and fitness of an organism responds to changes in the environment. This phenotypic plasticity may allow species to acclimate and locally persist in face of environmental change. While responses are typically measured at the trait or whole-organism level, plasticity is governed by changes in gene expression in response to environmental stimuli. With new genomic methods, we can identify environmentally responsive genes and pathways in non-model systems. This molecular information will give greater insight on the potential for species to acclimate to novel environmental conditions. For this work, we are working with the dominant eastern woodland ant species in North American, Aphaenogaster picea and rudis. We are generating a transcriptome for these species, and will use this to examine the reaction norm across a range of temperatures for all genes. These data will reveal the genes that underly plasticity to temperature.

Computing

Used perl script to rename files in directory. Slick

#!/usr/bin/perl -w
# rename.pl
#
#  Discussion:
#
#    rename.pl 'tr/A-Z/a-z/' *
#
#    will rename every file so that its name is lowercase.
#
#    rename.pl 's/\.orig$//' *.orig
#
#    will, for every file that ends in ".orig", rename the
#    file by removing the trailing ".orig".
#
#    rename.pl '$_ .= ".old"' *.cc
#
#    will append ".old" to the name of every file that ends in ".cc".
#
# 
#  Reference:
#
#    Tom Christiansen and Nathan Torkington,
#    The Perl Coookbook,
#    Chapter 9.9, Renaming Files
#    O'Reilly, 1999
#
$op = shift or die "Usage: rename expr [files]\n";
chomp(@ARGV = <STDIN>) unless @ARGV;
for ( @ARGV ) 
        {
                  $was = $_;
                    eval $op;
                      die $@ if $@;
                        rename ( $was, $_ ) unless $was eq $_;
                }

First ran

perl rename.pl 's/\_val_[12].fq_fastqc$//' *_val_[12].fq_fastqc

to remove ending. Then ran

perl rename.pl '$_ .= "-fastqc"' *R[12]

to add “-fastqc” to end of all files ending in either R1 or R2. Slick.

Downloaded spyder for python coding…hopefully will help with the learning curve.