05 September 2014

VACC, Docker, STRUCTURE, and haplotype clustering


# 2 - 5 September notes

## 2014-09-02

• AN practice talk
• ApTranscriptome - working on re-running analysis using VACC while waiting for workstation to be fixed

## 2014-09-03

• EEEB lunch

### MtGIRAFFE

• Met with Jeanne Harris to discuss progress
• Have phased haplotypes from BEAGLE
• Working on formatting data for STRUCTURE
• WAIT: can I just use raw haplotypes from BEAGLE? How many are there?

Got data in STRUCTURE format!

To get STRUCTURE to run, had to set POPFLAG, USEPOPINFO and LOCISPOP to 0.
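For reference, a minimal sketch of the kind of reformatting involved, assuming the phased haplotypes are already held as strings of 0/1 alleles per accession. The accession labels, the `phased` dictionary and the `to_structure_rows` helper are all made up for illustration and are not the script actually used. It writes the default two-rows-per-individual layout with only a label column, which matches having the optional population-information columns switched off, and codes missing alleles as -9.

```python
# Hypothetical sketch: phased haplotypes -> STRUCTURE-style rows.
# Assumes each accession has two haplotype strings of equal length,
# with alleles coded "0"/"1" and ".", "?" or "N" marking missing calls.

MISSING_CHARS = {".", "?", "N"}


def to_structure_rows(haplotypes, missing="-9"):
    """Return one line per haplotype: label followed by one allele per locus."""
    rows = []
    for label, (hap1, hap2) in haplotypes.items():
        if len(hap1) != len(hap2):
            raise ValueError(f"{label}: haplotypes differ in length")
        for hap in (hap1, hap2):
            # Replace missing-call characters with the missing-data code.
            alleles = [missing if a in MISSING_CHARS else a for a in hap]
            rows.append(" ".join([label] + alleles))
    return rows


# Toy data, not real MtGIRAFFE haplotypes.
phased = {
    "acc001": ("0101", "0101"),
    "acc002": ("0111", "0101"),
    "acc003": ("1.00", "1100"),
}

structure_rows = to_structure_rows(phased)
print("\n".join(structure_rows))
```

The number of individuals and loci still has to be set to match in the STRUCTURE parameter file.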

Trained Laurel on ant room.

### Computing

Workstation still in the shop. The major bottleneck in proceeding with analysis is the (re-)installation of software. This is motivation to look into using Docker to build a reproducible computing image, so that not only the analysis but also the computational set-up is reproducible.

Some examples here and here

Not so easy - Docker is only supported on 64-bit architecture. There are steps for getting Docker to work on 32-bit architecture.

Random awesomeness: iPipet - benchtop tool to track the transfer of samples and reagents using a tablet

## 2014-09-04

• Aphaenophone
• genomics update
• plans for chambers in 2015
• grant renewal
• AN data: CTmax measure?
• Aphaenofest next Jan or Feb in

### Computing

Beginning to move analyses to VACC. The cluster runs Red Hat Enterprise Linux 5 and all software is completely out of date. Had to install new versions of emacs, java and R.

As test case, doing MtGIRAFFE bioinformatics. Installed bcftools, beagle, etc.

It is a 64-bit architecture, so I hope to get Docker running, but Docker is only supported on RHEL 6 and later, so it may not work.

Also trying to work on Amazon Cloud.

Considering using BioBrew for computational setup.

### MtGIRAFFE

Going on VACC - got necessary programs installed.

After haplotype inference with Beagle, found 104 unique haplotypes among the 524 haplotypes (two per accession). Given the inbred nature of the accessions, I was surprised that so many accessions carry two different haplotypes - maybe imputation isn’t appropriate? I also checked the number of unique haplotypes in only the first or only the second chromosome from each accession, with results of 94 and 96 unique haplotypes. So, either way there are nearly 100 haplotypes in the 262 accessions. Best method to further reduce/cluster these?
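The counts above can be reproduced with something like the following sketch. It assumes the phased output has been parsed into a dictionary mapping each accession to its two haplotype strings. The `phased` toy data and the `haplotype_summary` function are illustrative names, not the code that produced the numbers above.

```python
# Hypothetical sketch: count unique haplotypes overall, per chromosome copy,
# and the number of accessions carrying two different haplotypes.


def haplotype_summary(haplotypes):
    """Summarise a dict of accession -> (first haplotype, second haplotype)."""
    first = [h1 for h1, _ in haplotypes.values()]
    second = [h2 for _, h2 in haplotypes.values()]
    return {
        "total": len(first) + len(second),
        "unique_all": len(set(first) | set(second)),
        "unique_first": len(set(first)),
        "unique_second": len(set(second)),
        # Accessions whose two phased copies are not identical.
        "two_different": sum(h1 != h2 for h1, h2 in haplotypes.values()),
    }


# Toy data, not real MtGIRAFFE haplotypes.
phased = {
    "acc001": ("0101", "0101"),
    "acc002": ("0111", "0101"),
    "acc003": ("1100", "1100"),
    "acc004": ("0101", "0111"),
}

summary = haplotype_summary(phased)
for key, value in summary.items():
    print(key, value)
```

For fully inbred lines, `two_different` would be expected to be close to zero, so a large value is one quick way to flag the pattern described above.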

As a first attempt, using InStruct.

## 2014-09-05

### Computing

Trying to install pandoc for rmarkdown on VACC. Have to install locally as I don’t have root access so can’t use yum.

• Tried installing pandoc-standalone from here but can’t use yum
• Followed these steps to convert rpm to cpio archive

rpm2cpio pandoc-1.12.4.2-1.x86_64.rpm | cpio -idv

This seemed to work, as a new directory tree was created (/usr/bin/pandoc), but trying to run pandoc gave this error:

./pandoc: error while loading shared libraries: libffi.so.5: cannot open shared object file: No such file or directory

At this point…I quit.

### MtGIRAFFE

InStruct ran successfully! In a test of 1 to 10 clusters, 10 was the optimal number. That said…I’m starting to think this is the wrong approach. Structure-like programs are for clustering individuals using unlinked molecular markers. These SNPs are all from the same gene, so they are highly linked.
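One way to check how strongly the SNPs are linked is to compute pairwise r² from the phased haplotypes. The sketch below uses the standard definition, r² = D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA·pB, on made-up haplotype strings. The `haps` list and the `r_squared` helper are illustrative and were not part of the analysis.

```python
# Hypothetical sketch: pairwise linkage disequilibrium (r^2) between SNPs,
# computed from phased haplotypes coded as strings of "0"/"1".
from itertools import combinations


def r_squared(haps, i, j):
    """Return r^2 between sites i and j, or None if either site is monomorphic."""
    n = len(haps)
    p_a = sum(h[i] == "1" for h in haps) / n   # frequency of allele 1 at site i
    p_b = sum(h[j] == "1" for h in haps) / n   # frequency of allele 1 at site j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haps) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    return d * d / denom


# Toy haplotypes: sites 0 and 1 always co-occur, site 2 is independent of site 0.
haps = ["000", "000", "110", "110", "111", "001"]

ld = {(i, j): r_squared(haps, i, j) for i, j in combinations(range(3), 2)}
for pair, value in ld.items():
    print(pair, value)
```

Values near 1 across most SNP pairs would support the concern that the markers violate the unlinked-loci assumption.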

While BEAGLE can be used for clustering, the clusters are an intermediate step for association mapping and are not easily viewed. Google-fu found the Haplosuite R code that (maybe) gets around these problems (Teo & Small 2010).