
Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
Published
Author Unknown

One of the most powerful tools you can learn to use in genomics research is a relational database system, such as MySQL. These systems are fairly easy to set up and use, and they let you organize and manipulate data and statistical results with simple commands. As a graduate student (during the height of GWAS), this single skill quickly turned me into an “expert”.
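
To give a rough idea of what "simple commands" means here, the sketch below loads a few association results into a table and pulls out the top hits with one query. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the table name, column names, and values are all made up for illustration.

```python
import sqlite3

# Hypothetical GWAS results: (SNP ID, chromosome, base-pair position, p-value).
results = [
    ("rs0001", 1, 10500, 3.2e-9),
    ("rs0002", 1, 20750, 0.41),
    ("rs0003", 2, 5300, 4.0e-6),
]

# Throwaway in-memory database (SQLite standing in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assoc (snp TEXT PRIMARY KEY, chrom INTEGER, bp INTEGER, p REAL)"
)
conn.executemany("INSERT INTO assoc VALUES (?, ?, ?, ?)", results)

# A single query filters by p-value and sorts the most significant hits first.
top_hits = conn.execute(
    "SELECT snp, p FROM assoc WHERE p < ? ORDER BY p", (1e-5,)
).fetchall()
print(top_hits)
```

The same CREATE TABLE, INSERT, and SELECT statements should carry over to MySQL with little or no change, although the connection code would differ.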

Published
Author Stephen Turner

I'm a huge supporter of the Free and Open Source Software movement. I've written more about R than anything else on this blog, all the code I post here is free and open-source, and a while back I invited you to steal this blog under a cc-by-sa license. Every now and then, however, something comes along that just might be worth paying for.

Published
Author Stephen Turner

I recently analyzed some next-generation sequencing data, and I first wanted to compare the allele frequencies in my samples to those in the 1000 Genomes Project. It turns out this is much easier than I thought, as long as you're a little comfortable with the Linux command line. First, you'll need a Linux system and two utilities: tabix and vcftools. I'm virtualizing an Ubuntu Linux system in VirtualBox on my Windows 7 machine.
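
For readers who want to see the quantity being compared, here is a minimal Python sketch of how an alternate allele frequency can be computed from VCF-style genotype strings. It is not the tabix/vcftools workflow from the post, only the underlying arithmetic; the function name and the sample genotypes are hypothetical.

```python
def alt_allele_frequency(genotypes):
    """Return the fraction of called alleles that are non-reference.

    genotypes: list of VCF GT strings such as "0/1" or "1|1".
    Missing alleles (".") are skipped. Any allele other than "0" is
    counted as alternate, so multi-allelic sites are lumped together.
    """
    alt = 0
    total = 0
    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") genotypes the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            total += 1
            if allele != "0":
                alt += 1
    return alt / total if total else None


# Made-up genotypes for four samples at one site.
sample_gts = ["0/0", "0/1", "1|1", "./."]
print(alt_allele_frequency(sample_gts))  # 3 alternate alleles out of 6 called
```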

Published
Author Stephen Turner

Hansong Wang, our biostats professor here at the Hawaii Cancer Center, generously gave me some R code that goes through a SNP annotation file (i.e. a mapfile) and selects SNPs that are at least a certain specified distance apart. You might want to do this if you're picking a subset of SNPs for PCA, for instance.
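
The general idea can be sketched in a few lines of Python. This is not Hansong Wang's R code, just one simple greedy approach under stated assumptions: SNPs arrive as (chromosome, SNP ID, base-pair position) tuples, and a SNP is kept only if it lies at least `min_dist` base pairs past the last SNP kept on the same chromosome. The function name and the example data are hypothetical.

```python
def thin_snps(snps, min_dist):
    """Greedily pick SNPs at least min_dist base pairs apart per chromosome.

    snps: iterable of (chrom, snp_id, bp) tuples.
    Returns the IDs of the kept SNPs, ordered by chromosome and position.
    """
    kept = []
    last_kept_bp = {}  # chromosome -> position of the last SNP kept there
    for chrom, snp_id, bp in sorted(snps, key=lambda s: (s[0], s[2])):
        if chrom not in last_kept_bp or bp - last_kept_bp[chrom] >= min_dist:
            kept.append(snp_id)
            last_kept_bp[chrom] = bp
    return kept


# Made-up map entries: (chromosome, SNP ID, base-pair position).
mapfile = [
    (1, "rs1", 100),
    (1, "rs2", 150),
    (1, "rs3", 1200),
    (2, "rs4", 120),
    (2, "rs5", 900),
]
print(thin_snps(mapfile, 1000))  # ['rs1', 'rs3', 'rs4']
```

A greedy pass like this always keeps the first SNP on each chromosome, so it does not necessarily return the largest possible subset, but it is usually good enough for picking SNPs for PCA.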

Published
Author Stephen Turner

Last week Will showed you a bash script version of a sed command covered here a while back that converts PLINK output from the default variable-width, space-delimited format to a more database-loading-friendly tab- or comma-delimited file. A commenter asked how to do this on Windows, so I'll share the way I do it, using a Perl script that you can run on Windows after installing ActivePerl.
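
The conversion itself is small enough to sketch in Python, which also runs on Windows. This is not the Perl script from the post, only an illustration of the same idea: drop the leading padding on each line, split on runs of whitespace, and write the fields out comma-separated. The function name and the sample text are hypothetical.

```python
import csv
import io


def plink_to_csv(text):
    """Convert whitespace-padded, space-delimited text to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        fields = line.split()  # split on any run of whitespace
        if fields:             # skip blank lines
            writer.writerow(fields)
    return out.getvalue()


# Made-up sample in the padded style that PLINK output uses.
sample = " CHR  SNP      BP      P\n   1  rs1     100   0.01\n"
print(plink_to_csv(sample), end="")
```

For a real file you would read the text from disk and write the result back out, for example with `open(path).read()`; the paths are left to the reader.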

Published
Author Unknown

A while back, Stephen wrote a very nice post about converting PLINK output to a CSV file. If you are like me, you have used this a thousand times -- enough to get tired of typing lots of sed commands. I just crafted a little Bash script that accomplishes the same effect with a single easy-to-type command. Insert the following text into your .bashrc file.