30 May 2013

field work, seqtk, Aphaenogaster, Pogonomyrmex, and transcriptomics


week notes


Seems that the world of (open-source) webapps is never-ending. Just discovered dillinger, a browser-based app that allows side-by-side editing of a markdown file and its rendered output. While I’m using it now, I don’t see it as a long-term solution because it CONSIDERABLY slows typing (what’s the point of a flat text file if not fast?) and once you’ve learned the basic markdown syntax, you don’t need to see the output simultaneously.

Also found some information on slidify which may be even better than slides (for me) because it’s based on R code. Though…I’m not sure I want to be creating plots while making a presentation. Seems better to have an .Rmd tech report, then load the images into presentation using slide, slideous, etc.

Comments from NG on P-value commentary.

Finished Functional Ecology review.

Phone call with PT to discuss ‘Estimating heritability using genomic data’ manuscript


Working on transcriptome pipeline with Pogo data

based on fastq files, single reads in Illumina 1.3 format so need to convert

install biopython on antlab

first had to install numpy and scipy using pip

back to some basics for running a job on a server

use nohup to run job without hangups, end line with & to run in background

nohup nice -n 19 bash 01-fastq-sanger.sh &

shell script that drives python script
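The python script itself is not shown in these notes. As a rough sketch of what such a conversion involves, assuming plain 4-line FASTQ records and Illumina 1.3+ qualities (ASCII offset 64), something like the following would shift the quality line to the Sanger offset of 33. The function names `illumina13_to_sanger` and `convert_fastq` are made up for illustration; Biopython's `SeqIO.convert(in_file, "fastq-illumina", out_file, "fastq-sanger")` should cover the same job.

```python
import io


def illumina13_to_sanger(quality: str) -> str:
    """Shift a Phred+64 quality string to Phred+33 (hypothetical helper)."""
    scores = [ord(ch) - 64 for ch in quality]
    if scores and min(scores) < 0:
        # Characters below '@' cannot be Illumina 1.3+ qualities.
        raise ValueError("quality string is not valid Phred+64")
    return "".join(chr(q + 33) for q in scores)


def convert_fastq(in_handle, out_handle) -> int:
    """Copy 4-line FASTQ records, re-encoding only the quality line.

    Assumes unwrapped records (one sequence line, one quality line).
    Returns the number of records written.
    """
    n_records = 0
    while True:
        header = in_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = in_handle.readline()
        plus = in_handle.readline()
        qual = in_handle.readline().rstrip("\r\n")
        out_handle.write(header)
        out_handle.write(seq)
        out_handle.write(plus)
        out_handle.write(illumina13_to_sanger(qual) + "\n")
        n_records += 1
    return n_records


if __name__ == "__main__":
    demo_in = io.StringIO("@read1\nACGT\n+\nhhhh\n")
    demo_out = io.StringIO()
    convert_fastq(demo_in, demo_out)
    print(demo_out.getvalue(), end="")
```

In the demo, `h` (Q40 at offset 64) becomes `I` (Q40 at offset 33). For real files, pass open file handles instead of `io.StringIO` objects.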


Installed seqtk on antlab as on 2013-05-14.

Damn file format issues. fastx-toolkit assumes Illumina 1.3+ FASTQ (ASCII offset of 64), but thankfully it is explained on seqanswers that there is an undocumented argument -Q that sets the ASCII offset of the input qualities. This can be changed to -Q 33 to read standard Sanger FASTQ.
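To check which offset a file uses before choosing -Q, a quick heuristic is to look at the lowest quality character present. This is only a sketch under the usual conventions (Sanger starts at ASCII 33, Illumina 1.3+ at 64, old Solexa can dip to 59); `guess_phred_offset` is a hypothetical helper and a guess of 64 is not proof, since a Sanger file containing only very high qualities would look the same.

```python
def guess_phred_offset(quality_strings):
    """Guess the ASCII offset from FASTQ quality lines.

    Returns 33, 64, or None when the evidence is ambiguous.
    """
    lowest = min(
        (ord(ch) for q in quality_strings for ch in q),
        default=None,
    )
    if lowest is None:
        return None  # no quality characters seen
    if lowest < 59:
        # Below ';' is impossible for offset-64 encodings, so Sanger.
        return 33
    if lowest >= 64:
        # Nothing below '@': consistent with Illumina 1.3+ (heuristic only).
        return 64
    # 59-63 could be old Solexa+64 or high-quality Sanger: don't guess.
    return None


if __name__ == "__main__":
    print(guess_phred_offset(["IIII#", "IIHG5"]))
    print(guess_phred_offset(["hhhh", "ffBB"]))
```

Running this over the first few thousand quality lines is usually enough to decide.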

Some information about trimming Illumina adaptor sequences

Overwhelming list of bioinformatics papers to read on same website

P-value commentary

Wow - email blowing up with responses from ecolog to participate in stats survey, and re-post of Early Career Ecologists blog post by Jeremy Fox at Dynamic Ecology!


Ant collecting at Molly Bog with AN, MH, Keri Pinder

Found 12 colonies in about 4 hours of searching (including bush-whacking through thicket). Only 3 with queens. Easier to find queens from colonies in logs than in leaf debris so target these.


Meeting with Josh Brown

Meeting with VGN: Jim Vincent, Heather Driscoll, Mahesh

choice of de novo assembler

  • Trinity: very slow
  • Velvet-Oases: high memory. good coverage, but many mistakes/chimeras due to lenient mapping
  • EarB(??) de novo: new program, compares well, FAST


  • would have been nice to have an RNA spike-in in the library to see if we can recover known RNA transcripts at the end of the whole process. Unclear to me that these would have added much information beyond validating the library prep and the biases it introduces, which should be known from other publications (Jiang et al. 2011)
  • go ahead with in silico RNA spike in before bioinformatics pipeline

pool all files to generate transcriptomes

  1. map reads to contigs for gene expression counts
  2. annotation of contigs using BLAST2GO

gene expression

  • DESeq2 may allow for multiple samples
  • regression/randomization approach
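The meeting did not pin down what the regression/randomization approach would look like. One possible reading, purely as a sketch: regress expression of a single contig on a covariate and get a P-value by shuffling the expression values. The functions `slope` and `permutation_pvalue` and the toy numbers are my own assumptions, not anything decided at the meeting.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_pvalue(x, y, n_perm=999, seed=1):
    """Two-sided permutation P-value for the regression slope.

    Shuffles y relative to x and counts how often the permuted |slope|
    is at least as large as the observed one. The +1 terms count the
    observed arrangement itself, so the P-value is never zero.
    """
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(slope(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: hypothetical covariate values and expression for one contig.
    covariate = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    expression = [2 * v for v in covariate]
    print(permutation_pvalue(covariate, expression))
```

Applied across thousands of contigs this would need a multiple-testing correction, which the sketch leaves out.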

population genetic analyses

  • need codon positions for
  • GeneMark-E
  • BlastX does not need to know frame positions, looks for short open reading frames and does not worry about errors
  • GeneMark tries to overcome by finding open reading frames and correcting for errors
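For my own reference, the basic idea of scanning a contig for open reading frames in all six frames can be sketched as below. This is a naive ATG-to-stop scan, not how BLASTX or GeneMark work internally, and it does nothing about sequencing errors or frameshifts, which is exactly the problem those tools try to handle. The names `orfs_in_frame` and `longest_orf` are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame):
    """Yield (start, end) of ATG..stop ORFs in one reading frame.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    ORFs that run off the end of the sequence without a stop are ignored.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3
            start = None


def longest_orf(seq):
    """Return (strand, frame, start, end) of the longest ORF, or None.

    Coordinates on the '-' strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame):
                if best is None or end - start > best[3] - best[2]:
                    best = (strand, frame, start, end)
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGAAATTTTAGGG"))
```

The codon position of each base in the ORF then follows from `(position - start) % 3`, which is what the population genetic analyses would need.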


  • 100-500bp tags upstream from polyA tail
  • map to transcriptome
  • ASSUMES well-annotated transcriptome, but this may not be true at 3’ end
  • transcriptome may be poor quality…next up would be genome sequencing


Jiang, L., F. Schlesinger, C. A. Davis, Y. Zhang, R. Li, M. Salit, T. R. Gingeras, and B. Oliver. 2011. Synthetic spike-in standards for RNA-seq experiments. Genome Research 21:1543–1551.


Redmine for project management. Possible to integrate with github


NuGEN Encore Rapid Kits arrived.

Transferring transcriptome data to VGN.

While disappointed not to find complete data for a recent paper published in Nature (actually not required for field data), I was excited to see their new reporting requirements which state that ‘Sample size’, ‘Randomization’ and ‘Replication’ need to be explicitly reported, as well as clear guidelines for statistics!

This work is licensed under a Creative Commons Attribution 4.0 International License.