As a complement to the topGO approach, use DAVID to focus on genes with high-quality annotations in the model organisms Apis and Drosophila.
Get these BLAST hits using the annotation method I tried before switching to FastAnnotator, starting from the NCBI organism query:
txid7215[ORGN] OR txid7459[ORGN]
Used the list of GIs from the previous step with blastdb_aliastool to build an aliased BLAST database of just these insects:
blastdb_aliastool -gilist apis_drosophila.gi.txt -db nr -out nr_apis_drosophila -title "Apis and Drosophila nr database"
which gave output:
Converted 585719 GIs from apis_drosophila.gi.txt to binary format in nr_apis_drosophila.p.gil
Created protein nr_apis_drosophila BLAST (alias) database with 274099 sequences (out of 585719 in nr_apis_drosophila.p.gil, 47% found)
Tested search against new (aliased) database:
blastx -query query.fa -db nr_apis_drosophila -out querytest
Worked!
Now for the blastx of the whole caboodle. Given the problems I had before, I trialed and then used GNU parallel, as described on Biostars, to use all cores efficiently.
Test script for 50 sequences:
cat ../test.fa | parallel --block 100k --recstart '>' --pipe blastx -evalue 0.01 -outfmt 6 -db /home/data/databases/ncbi_db/nr_apis_drosophila -query - > result
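Since the goal is a gene list for DAVID, the tabular (-outfmt 6) output in result still needs to be reduced to one hit per query. What follows is only a rough sketch of one way to do that, not the method actually used here. It assumes that the "best" hit is the one with the highest bit score (column 12 of the standard outfmt 6 layout) and that subject IDs look like gi|NUMBER|ref|ACCESSION|. The toy rows and the file names best_hits.tsv and hit_gis.txt are made up for illustration.

```shell
# Work in a scratch directory with a toy stand-in for the real `result` file.
# outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
#                   qstart qend sstart send evalue bitscore
tmpdir=$(mktemp -d)
{
  printf 'contig1\tgi|222|ref|XP_000002.1|\t70.0\t100\t30\t0\t1\t300\t1\t100\t1e-30\t150\n'
  printf 'contig1\tgi|111|ref|XP_000001.1|\t85.0\t100\t15\t0\t1\t300\t1\t100\t1e-50\t200\n'
  printf 'contig2\tgi|333|ref|NP_000003.1|\t60.0\t80\t32\t0\t1\t240\t5\t84\t1e-15\t80.5\n'
  printf 'contig2\tgi|444|ref|NP_000004.1|\t65.0\t80\t28\t0\t1\t240\t5\t84\t1e-20\t95.2\n'
} > "$tmpdir/result"

tab=$(printf '\t')

# Sort by query ID, then by bit score (highest first), and keep only the
# first row seen for each query.
sort -t "$tab" -k1,1 -k12,12gr "$tmpdir/result" \
  | awk -F'\t' '!seen[$1]++' > "$tmpdir/best_hits.tsv"

# Pull the GI number out of the subject ID (assumed gi|NUMBER|... format)
# and write a unique, sorted list.
cut -f2 "$tmpdir/best_hits.tsv" | awk -F'|' '{print $2}' | sort -u > "$tmpdir/hit_gis.txt"

cat "$tmpdir/hit_gis.txt"
```

Keeping the full best-hit rows in a separate file makes it possible to apply a stricter e-value or identity cutoff later without rerunning blastx.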
Moved the analysis to the Mason cluster, which already had the NCBI nr database downloaded. Started the script running…
Meet with Grace to go over results.
Ideas for continuation of Grace’s project:
This work is licensed under a Creative Commons Attribution 4.0 International License.