06 November 2013

cap3, uclust, and Genomic-tip


Wednesday notes


Following up on yesterday’s conversation:

uclust takes the same basic approach as CD-HIT but is faster: it doesn’t compare each query against all transcripts, and instead stops searching once the probability of finding a match declines below a threshold.
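To make the early-stopping idea concrete, here is a toy sketch in Python. It is not the uclust implementation: the shared k-mer ranking, the ungapped `identity` function, and the `max_rejects` cutoff are all simplifying assumptions chosen for illustration.

```python
def kmers(seq, k=4):
    """Return the set of k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def identity(a, b):
    """Crude ungapped identity: matching positions over the longer length."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs, min_id=0.90, max_rejects=8, k=4):
    """Toy greedy clustering with early termination.

    Each sequence is compared against existing centroids, most promising
    first (most shared k-mers), and the search is abandoned after
    max_rejects failed comparisons. Hypothetical illustration only.
    """
    centroids = []  # list of (sequence, k-mer set)
    clusters = []   # clusters[i] holds the members of centroid i
    for seq in sorted(seqs, key=len, reverse=True):
        words = kmers(seq, k)
        # Rank centroids by shared k-mers so likely matches are tried first.
        order = sorted(range(len(centroids)),
                       key=lambda i: len(words & centroids[i][1]),
                       reverse=True)
        hit = None
        for rejects, i in enumerate(order):
            if rejects >= max_rejects:
                break  # stop early: a match is now considered unlikely
            if identity(seq, centroids[i][0]) >= min_id:
                hit = i
                break
        if hit is None:
            centroids.append((seq, words))
            clusters.append([seq])
        else:
            clusters[hit].append(seq)
    return clusters
```

The trade-off is speed against sensitivity: a lower `max_rejects` means fewer comparisons per sequence, but a true match ranked far down the list can be missed and end up as its own cluster.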

Ran CAP3 on the Trinity-assembled transcripts from all reads.

cap3 Trinity.fasta -f 50 -a 50 -k 0 -p 90 -o 100 > Trinity_cap3.out

Took about 2 hours to run.

Started with 126,172 transcripts.

15,250 transcripts were assembled into 6,567 contigs, leaving 110,922 singlets, for a total of 117,489 sequences. Some reduction, but far from the half expected based on read mapping.
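A quick check of the arithmetic behind these counts, using only the numbers reported above (the variable names are mine):

```python
start = 126_172    # Trinity transcripts going into CAP3
merged = 15_250    # transcripts that CAP3 placed into contigs
contigs = 6_567    # contigs built from those transcripts

singlets = start - merged        # transcripts left unassembled
total = contigs + singlets       # sequences remaining after CAP3
reduction = 1 - total / start    # fractional reduction in sequence count

print(f"singlets: {singlets:,}")
print(f"total after CAP3: {total:,}")
print(f"reduction: {reduction:.1%}")
```

That works out to a reduction of about 6.9%, well short of the roughly 50% suggested by read mapping.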

Ran uclust on this fasta file to further reduce.

# sort
uclust --sort Trinity_cap3.fasta --output Trinity_cap3_sorted.fasta
# cluster by 90% similarity threshold
uclust --input Trinity_cap3_sorted.fasta --uc Trinity_cap3_uclust.fasta --id 0.90
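To see how much the clustering reduced the set, the file written by the uc option can be tallied by record type. The sketch below assumes the usual tab-separated .uc layout, with the record type in the first column (S for a new cluster seed, H for a hit to an existing seed) and comment lines starting with #. The sample records are made up for illustration, not real output.

```python
from collections import Counter


def summarize_uc(lines):
    """Count record types in uclust .uc output.

    Assumes tab-separated records with the type in column 1 and
    '#' comment lines; blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        counts[line.split("\t")[0]] += 1
    return counts


# Hypothetical records: two seeds and one hit to the first seed.
sample = [
    "# comment line",
    "S\t0\t1200\t*\t*\t*\t*\t*\tContig1\t*",
    "H\t0\t1150\t95.2\t+\t0\t0\t1150M\tcomp10_c0_seq1\tContig1",
    "S\t1\t800\t*\t*\t*\t*\t*\tcomp11_c0_seq1\t*",
]
counts = summarize_uc(sample)
print(counts["S"], "clusters,", counts["H"], "sequences absorbed into them")
```

On the real run, passing an open handle on Trinity_cap3_uclust.fasta to `summarize_uc` would give the number of clusters (S records) to compare against the 117,489 input sequences.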


Last trial yielded zero DNA. Bummer.

Round 3 of Genomic-tip extraction: use overnight incubation with Proteinase K, as in the first round.

  • 2 sets of 10 ants, 2 per tube
  • 1 set homogenized in 200 µl Buffer ATL (used in first round)
  • 1 set homogenized in 200 µl Buffer G2 (included in Genomic-tip kit)
  • no RNase A, since a high concentration of RNase relative to DNA may degrade the DNA

Finish extraction tomorrow morning.


Very interesting project for teaching stats with R. Worked through the first 5 minutes or so. Nice integration of text, video and figures.


Some great stuff from Andrew Gelman on spurious results


Nosek, B.A., Spies, J.R. & Motyl, M. (2012). Scientific Utopia II. Restructuring Incentives and Practices to Promote Truth Over Publishability. Perspectives on Psychological Science, 7, 615–631.

  • fascinating self-reflection by authors after an exciting initial result
  • great commentary by Andrew Gelman

This work is licensed under a Creative Commons Attribution 4.0 International License.