
Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
Bioinformatics, GWAS, R, Software, Biology, English
Author: Stephen Turner

In a previous post I linked to gcol as a quick and intuitive alternative to awk. I just stumbled across yet another set of handy text file manipulation utilities from the creators of the BEAGLE software for GWAS data imputation and analysis.

Bioinformatics, Recommended Reading, Biology, English
Author: Stephen Turner

After evaluating an unnamed bioinformatics core facility, a group of bioinformaticians in Europe wrote up a short list of basic guidelines for organizing a bioinformatics core facility in large research institutes.

Software, Biology, English
Author: Stephen Turner

A while back Will showed you how to ditch Excel for awk, a handy Unix command line tool for extracting certain rows and columns from a text file. While I was browsing the documentation on the previously mentioned PLINK/SEQ library, I came across gcol, another utility for extracting columns from a tab-delimited text file. It can't do anything that awk can't, but it's easier and more intuitive to use for simple text munging tasks.
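For readers without either tool at hand, here is a minimal Python sketch of the same kind of task: pulling a few columns out of a tab-delimited file. This is not how gcol or awk work internally; the function name `select_columns`, the choice to pick columns by header name, and the sample data are all assumptions made for illustration.

```python
import csv
import io


def select_columns(text, wanted):
    """Return the requested columns from tab-delimited text.

    `wanted` is a list of header names (hypothetical interface, not gcol's).
    The first row of the result is the selected header.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    # Map each requested column name to its position in the header row.
    positions = [header.index(name) for name in wanted]
    rows = [list(wanted)]
    for row in reader:
        rows.append([row[i] for i in positions])
    return rows


# Made-up example data, loosely shaped like an association results file.
example = "CHR\tSNP\tBP\tP\n1\trs123\t1000\t0.04\n2\trs456\t2000\t0.50\n"

for row in select_columns(example, ["SNP", "P"]):
    print("\t".join(row))
```

A column name that is missing from the header raises `ValueError`, which is usually what you want when a file is not in the shape you expected.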

R, SQL, Biology, English
Author: Stephen Turner

Jeffrey Breen put together a useful slideshow on accessing databases from R. I use RODBC every single day to access my own local MySQL server from R. I've had trouble with RMySQL, so I've always used RODBC instead after setting up my localhost MySQL server as a Windows data source. Once you get accustomed to accessing your data directly with SQL queries rather than dumping files, you'll wonder why you waited so long.

1000 Genomes, Bioinformatics, GWAS, R, Sequencing, Biology, English
Author: Stephen Turner

PLINK/SEQ is an open source C/C++ library for analyzing large-scale genome sequencing data. The library can be accessed via the pseq command line tool, or through an R interface. The project is developed independently of PLINK, but its syntax will be familiar to PLINK users. PLINK/SEQ boasts an impressive feature set for a project still in the beta testing phase.

Bioinformatics, Noteworthy Blogs, Sequencing, Software, Biology, English
Author: Stephen Turner

This is a few months old, but I just got around to reading this series of blog posts on next-generation sequencing (NGS) by Gabe Rudy, Golden Helix's VP of product development. The series gives a seriously useful overview of NGS technology, then delves into the analysis of NGS data at each step, right down to a description of the most commonly used file formats and tools for the job. Check it out now if you haven't already.

1000 Genomes, Linux, PLINK, Sequencing, Software, Biology, English
Author: Stephen Turner

I recently analyzed some next-generation sequencing data, and I first wanted to compare the allele frequencies in my samples to those in the 1000 Genomes Project. It turns out this is much easier than I thought, as long as you're a little comfortable with the Linux command line. First, you'll need a Linux system and two utilities: tabix and vcftools. I'm virtualizing an Ubuntu Linux system in VirtualBox on my Windows 7 machine.
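To make concrete what a per-site frequency is, here is a rough Python sketch that computes the non-reference allele frequency from the genotype fields of a single VCF data line. It is a stand-in for illustration only, not the tabix/vcftools workflow itself; the function name `alt_allele_frequency` and the example line are made up, and the sketch lumps all non-reference alleles together at multi-allelic sites.

```python
def alt_allele_frequency(vcf_line):
    """Return the non-reference allele frequency for one VCF data line.

    Assumes the standard VCF layout: eight fixed columns, then FORMAT,
    then one column per sample. Missing alleles (".") are skipped.
    Returns None if no alleles were called.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    # FORMAT (column 9) names the colon-separated subfields of each sample.
    gt_index = fields[8].split(":").index("GT")
    alt_count = 0
    called = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Treat phased ("|") and unphased ("/") genotypes the same way.
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":
                continue
            called += 1
            if allele != "0":
                alt_count += 1
    return alt_count / called if called else None


# Hypothetical site with four samples, one of them uncalled.
line = "1\t1000\trs1\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1|1\t./."
print(alt_allele_frequency(line))
```

Run over your own samples and over the matching 1000 Genomes sites, a function like this would give two frequencies per variant that you could then compare side by side.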

R, Writing, Biology, English
Author: Stephen Turner

I love the idea of using R+LaTeX+Sweave for reproducible research. This is even easier now that R has a jazzy new IDE that supports Sweave syntax highlighting and automatic PDF generation. I know I'm going to take some flak for saying this, but let's be honest here... If you're working in the biomedical sciences, chances are, your collaborators have never heard of Sweave. Physicians only use LaTeX during surgery.

GWAS, R, Sequencing, Biology, English
Author: Stephen Turner

I'm working on a project using next-gen sequencing to fine-map a genetic association in a gene region. Now that I've sequenced the region in a small sample, I'm picking SNPs to genotype in a larger sample. When designing the genotyping assay the lab will need flanking sequence. This is easy to get for SNPs in dbSNP, but what about novel SNPs?
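The underlying idea is simple once you have the reference sequence for the region: slice out the bases on either side of the SNP position. Below is a minimal Python sketch under that assumption. The function name `flanking_sequence`, the bracketed `[ref/alt]` output style, and the toy reference string are illustrative choices, not the method used in the full post.

```python
def flanking_sequence(reference, pos, alt, flank=3):
    """Return flanking sequence around a SNP as LEFT[ref/alt]RIGHT.

    `reference` is the region's reference sequence as a string,
    `pos` is the 1-based SNP position within that string,
    `alt` is the alternate allele, and `flank` is the number of
    bases to take on each side (fewer if the SNP is near an end).
    """
    i = pos - 1  # convert to a 0-based string index
    if not 0 <= i < len(reference):
        raise ValueError("position falls outside the reference sequence")
    left = reference[max(0, i - flank):i]
    right = reference[i + 1:i + 1 + flank]
    return f"{left}[{reference[i]}/{alt}]{right}"


# Toy reference; a real one would come from a FASTA file for the region.
toy_reference = "ACGTACGTACGT"
print(flanking_sequence(toy_reference, 6, "T"))
```

For a real assay you would use a much larger `flank` and read the reference from the genome build your variant calls were made against, so that positions line up.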