05 June 2014
DAVID, Bioconductor, and semantic similarity
From my work over the past few days, I have discovered two major issues with using DAVID through the RDAVIDWebService R/Bioconductor package.
Regarding (1), I emailed the developer and he quickly responded that he would look into it. It’s straightforward to calculate the correct Enrichment score manually, but it is a bit odd that this problem occurs at all. Also, I can’t replicate the exact results when running the same files through the DAVID web interface, though many of the same clusters do appear.
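As a rough illustration of the manual calculation, here is a minimal sketch that assumes DAVID's documented definition of the cluster Enrichment score: the geometric mean of the member terms' p-values (EASE scores), expressed on a -log10 scale. The p-values below are made-up placeholders, not results from this analysis.

```r
# Hypothetical p-values (EASE scores) for the terms in one annotation cluster.
p_values <- c(0.001, 0.01, 0.1)

# Geometric mean of the p-values, computed on the log scale for stability.
geo_mean <- exp(mean(log(p_values)))

# Enrichment score: -log10 of the geometric mean.
# Equivalent to mean(-log10(p_values)).
enrichment_score <- -log10(geo_mean)
print(enrichment_score)
```

Comparing this value against the score reported by the package for the same cluster is one way to check whether the two agree.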
Issue (2), with the database, is a larger problem. Anecdotal discussion on biostars [suggests](https://www.biostars.org/p/9394/#9401) this is a real concern.
At this point, my inclination is to give up on DAVID.
To get the main benefit of the DAVID analysis, the clustering of related terms, I continued clustering related GO terms with the GOSemSim R/Bioconductor package, as I did yesterday.
Specifically, I calculated the semantic similarity among all GO terms for an enriched category (e.g. High genes), converted the similarities to distances among terms, and then plotted the result.
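The similarity-to-distance-to-plot steps can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the similarity matrix is a toy example with made-up values (in the real analysis it would come from GOSemSim, e.g. `mgoSim` with `combine = NULL`), distance is taken as 1 minus similarity, and average linkage with two clusters is an arbitrary choice rather than the settings actually used here.

```r
# Toy semantic-similarity matrix for five GO terms.
# The similarity values are invented for illustration only.
go_ids <- c("GO:0006412", "GO:0006413", "GO:0006414",
            "GO:0006950", "GO:0009408")
sim <- matrix(c(
  1.00, 0.80, 0.75, 0.10, 0.12,
  0.80, 1.00, 0.70, 0.15, 0.10,
  0.75, 0.70, 1.00, 0.11, 0.14,
  0.10, 0.15, 0.11, 1.00, 0.85,
  0.12, 0.10, 0.14, 0.85, 1.00
), nrow = 5, byrow = TRUE, dimnames = list(go_ids, go_ids))

# Assumption: convert similarity (0 to 1) to distance as 1 - similarity.
go_dist <- as.dist(1 - sim)

# Assumption: average-linkage hierarchical clustering.
hc <- hclust(go_dist, method = "average")

# Dendrogram of the GO terms.
plot(hc, main = "Toy clustering of GO terms", xlab = "", sub = "")

# Cut the tree into two groups (k chosen by eye for this toy example)
# and list the GO terms in each cluster.
clusters <- cutree(hc, k = 2)
print(split(names(clusters), clusters))
```

Each resulting group of GO identifiers can then be inspected and given a functional name by hand.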
I was then able to assign a functional name to each cluster of related GO terms. For example, here’s the clustering of A. picea High genes:
and here’s the clustering for A. picea Low genes:
This work is licensed under a Creative Commons Attribution 4.0 International License.