Biological SciencesBlogger

Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
Home Page
language
BioinformaticsGgplot2RBiological Sciences
Published
Author Stephen Turner

The 20th annual ISMB meeting was held over the last week in Long Beach, CA. It was an incredible meeting with lots of interesting and relevant talks, and lots of folks were tweeting the conference, usually with at least a few people in each concurrent session. I wrote the code below that uses the twitteR package to pull all the tweets about the meeting under the #ISMB hashtag. You can download that raw data here.

RVisualizationBiological Sciences
Published
Author Stephen Turner

I saw this plot in the supplement of a recent paper comparing microarray results to RNA-seq results. Nothing earth-shattering in the paper - you've probably seen a similar comparison many times before - but I liked how they solved the overplotting problem using heat-colored contour lines to indicate density. I asked how to reproduce this figure using R on Stack Exchange, and my question was quickly answered by Christophe Lalanne.

BioinformaticsDatabasesDbGaPGWASWeb AppsBiological Sciences
Published

Thanks to the excellent work of Lucia Hindorff and colleagues at NHGRI, the GWAS catalog provides a great reference for the cumulative results of GWAS for various phenotypes.  Anyone familiar with GWAS also likely knows about dbGaP – the NCBI repository for genotype-phenotype relationships – and the wealth of data it contains.

Biological Sciences
Published
Author Stephen Turner

I just read an interesting paper on pathogen discovery using next-generation sequencing data, recommended to me by Nick Loman. A previously described algorithm (PathSeq, Kostic et al) for discovering microbes by deep-sequencing human tissue uses computational subtraction, whereby the initial collection of reads is depleted of human DNA by consecutive alignment to the human reference using MAQ and BLAST.

1000 GenomesBioinformaticsDatabasesENCODEBiological Sciences
Published

The ENCODE project continues to generate massive numbers of data points on how genes are regulated.  This data will be of incredible use for understanding the role of genetic variation, both for altering low-level cellular phenotypes (like gene expression or splicing), but also for complex disease phenotypes.  While it is all deposited into the UCSC browser, ENCODE data is not always the easiest to access or manipulate.

BioinformaticsPLINKRSoftwareBiological Sciences
Published
Author Stephen Turner

I'm a huge supporter of the Free and Open Source Software movement. I've written more about R than anything else on this blog, all the code I post here is free and open-source, and a while back I invited you to steal this blog under a cc-by-sa license. Every now and then, however, something comes along that just might be worth paying for.

BioinformaticsBiological Sciences
Published
Author Stephen Turner

A few weeks ago I showed you how to convert gene IDs with BioMart. Yesterday I hosted a workshop on the Ensembl Genome Browser, given by Dr. Bert Overduin from EBI-EMBL. He gave several examples of very useful tasks that you can do very quickly and easily using BioMart. One, in particular, is something that I'm doing for a client in the core right now.

AnnouncementsBioinformaticsRBiological Sciences
Published
Author Stephen Turner

If you're doing any kind of big data analysis - genomics, transcriptomics, proteomics, bioinformatics - then unless you've been on vacation the last few weeks you've no doubt heard about the NSF/NIH BIGDATA  Initiative (here's the NSF solicitation and here's the New York Times article about the funding opportunity). The solicitation "aims to advance core scientific and technological means of managing, analyzing, visualizing, and

BioinformaticsBiological Sciences
Published
Author Stephen Turner

I was reading through a paper on comparative ChIP-Seq when I found this awk gem that lets you get some very basic stats very quickly on next generation sequencing reads. To use, simply cat the fastq file (or gunzip -c) and pipe that to this awk command: cat myfile.fq | awk '((NR-2)%4==0){read=$1;total++;count[read]++}END{for(read in count){if(!max||count[read]>max) {max=count[read];maxRead=read};if(count[read]==1){unique++}};print