Biological SciencesBloggerArchived

Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
Home Page
language
Published
Author Stephen Turner

Jeffrey Breen put together a useful slideshow on accessing databases from R. I use RODBC every single day to access my own local MySQL server from R. I've had trouble with RMySQL, so I've always used RODBC instead after setting up my localhost MySQL server as a Windows data source. Once you get accustomed to accessing your data directly with SQL queries rather than dumping files you'll wonder why you waited so long.

Published
Author Stephen Turner

PLINK/SEQ is an open source C/C++ library for analyzing large-scale genome sequencing data. The library can be accessed via the pseq command line tool, or through an R interface. The project is developed independently of PLINK but it's syntax will be familiar to PLINK users. PLINK/SEQ boasts an impressive feature set for a project still in the beta testing phase.

Published
Author Stephen Turner

This is a few months old but I just got around to reading this series of blog posts on next-generation sequencing (NGS) by Gabe Rudy, Golden Helix's VP of product development. This series gives a seriously useful overview of NGS technology, then delves into the analysis of NGS data at each step, right down to a description of the most commonly used file formats and tools for the job. Check it out now if you haven't already.

Published
Author Stephen Turner

I recently analyzed some next-generation sequencing data, and I first wanted to compare the frequencies in my samples to those in the 1000 Genomes Project. It turns out this is much easier that I thought, as long as you're a little comfortable with the Linux command line. First, you'll need a Linux system, and two utilities: tabix and vcftools. I'm virtualizing an Ubuntu Linux system in Virtualbox on my Windows 7 machine.

Published
Author Stephen Turner

I love the idea of using R+LaTeX+Sweave for reproducible research. This is even easier now that R has a jazzy new IDE that supports Sweave syntax highlighting and automatic PDF generation. I know I'm going to take some flak for saying this, but let's be honest here... If you're working in the biomedical sciences, chances are, your collaborators have never heard of Sweave. Physicians only use LaTeX during surgery.

Published
Author Stephen Turner

I'm working on a project using next-gen sequencing to fine-map a genetic association in a gene region. Now that I've sequenced the region in a small sample, I'm picking SNPs to genotype in a larger sample. When designing the genotyping assay the lab will need flanking sequence. This is easy to get for SNPs in dbSNP, but what about novel SNPs?

Published
Author Stephen Turner

Lately I haven't written as many full length posts as usual, but here's a quick roundup of a few links I've shared on Twitter (@genetics_blog) over the last week: First, 23andMe is having a big DNA Day Sale ($108) for the kit + 1 year of their personal genome subscription service https://www.23andme.com/. Previously mentioned R IDE RStudio released a new beta (version 0.93) that includes several new features and bugfixes.