Biological SciencesBloggerArchived

Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
Home Page
language
Published
Author Stephen Turner

Coursera's free Computing for Data Analysis course starts today. It's a four week long course, requiring about 3-5 hours/week. A bit about the course: There are also hundreds of other free courses scheduled for this year. While the Computing for Data Analysis course is more about using R, the Data Analysis course is more about the methods and experimental designs you'll use, with a smaller emphasis on the R language.

Published
Author Stephen Turner

There's a new kid on the block for RNA-seq alignment. Dobin, Alexander, et al. "STAR: ultrafast universal RNA-seq aligner." Bioinformatics (2012). Aligning RNA-seq data is challenging because reads can overlap splice junctions. Many other RNA-seq alignment algorithms (e.g. Tophat) are built on top of DNA sequence aligners.

Published
Author Stephen Turner

I'm a huge supporter of the Free and Open Source Software movement. I've written more about R than anything else on this blog, all the code I post here is free and open-source, and a while back I invited you to steal this blog under a cc-by-sa license. Every now and then, however, something comes along that just might be worth paying for.

Published
Author Stephen Turner

Lately I've been using the limma package often for analyzing microarray data. When I read in Affy CEL files using ReadAffy(), the resulting ExpressionSet won't contain any featureData annotation. Consequentially, when I run topTable to get a list of differentially expressed genes, there's no annotation information other than the Affymetrix probeset IDs or transcript cluster IDs.

Published
Author Unknown

Peter Visscher and colleagues have recently published a flurry of papers employing a new software package called GCTA to estimate the heritability of traits using GWAS data (GCTA stands for Genome-wide Complex Trait Analysis -- clever acronymity!). The tool, supported (and presumably coded) by Jian Yang is remarkably easy to use, based in part on the familiar PLINK commandline interface.

Published
Author Stephen Turner

A while back Will showed you how to ditch Excel for awk, a handy Unix command line tool for extracting certain rows and columns from a text file. While I was browsing the documentation on the previously mentioned PLINK/SEQ library, I came across gcol, another utility for extracting columns from a tab-delimited text file. It can't do anything that awk can't, but it's easier and more intuitive to use for simple text munging tasks.

Published
Author Stephen Turner

This is a few months old but I just got around to reading this series of blog posts on next-generation sequencing (NGS) by Gabe Rudy, Golden Helix's VP of product development. This series gives a seriously useful overview of NGS technology, then delves into the analysis of NGS data at each step, right down to a description of the most commonly used file formats and tools for the job. Check it out now if you haven't already.

Published
Author Stephen Turner

I recently analyzed some next-generation sequencing data, and I first wanted to compare the frequencies in my samples to those in the 1000 Genomes Project. It turns out this is much easier that I thought, as long as you're a little comfortable with the Linux command line. First, you'll need a Linux system, and two utilities: tabix and vcftools. I'm virtualizing an Ubuntu Linux system in Virtualbox on my Windows 7 machine.

Published
Author Stephen Turner

Lately I haven't written as many full length posts as usual, but here's a quick roundup of a few links I've shared on Twitter (@genetics_blog) over the last week: First, 23andMe is having a big DNA Day Sale ($108) for the kit + 1 year of their personal genome subscription service https://www.23andme.com/. Previously mentioned R IDE RStudio released a new beta (version 0.93) that includes several new features and bugfixes.