
Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
1000 Genomes, Bioinformatics, Databases, ENCODE

The ENCODE project continues to generate massive amounts of data on how genes are regulated. This data will be extremely useful for understanding the role of genetic variation, both in altering low-level cellular phenotypes (like gene expression or splicing) and in complex disease phenotypes. Although it is all deposited in the UCSC browser, ENCODE data is not always easy to access or manipulate.

Bioinformatics, PLINK, R, Software

I'm a huge supporter of the Free and Open Source Software movement. I've written more about R than anything else on this blog, all the code I post here is free and open source, and a while back I invited you to steal this blog under a CC BY-SA license. Every now and then, however, something comes along that just might be worth paying for.

Bioinformatics

A few weeks ago I showed you how to convert gene IDs with BioMart. Yesterday I hosted a workshop on the Ensembl Genome Browser, given by Dr. Bert Overduin from EMBL-EBI. He gave several examples of very useful tasks that you can do quickly and easily using BioMart. One in particular is something that I'm doing for a client in the core right now.

Announcements, Bioinformatics, R

If you're doing any kind of big data analysis (genomics, transcriptomics, proteomics, bioinformatics), then unless you've been on vacation for the last few weeks you've no doubt heard about the NSF/NIH BIGDATA Initiative (here's the NSF solicitation and here's the New York Times article about the funding opportunity). The solicitation "aims to advance core scientific and technological means of managing, analyzing, visualizing, and

Bioinformatics

I was reading through a paper on comparative ChIP-Seq when I found this awk gem that lets you get some very basic stats very quickly on next-generation sequencing reads. To use it, simply cat the fastq file (or gunzip -c) and pipe that to this awk command: cat myfile.fq | awk '((NR-2)%4==0){read=$1;total++;count[read]++}END{for(read in count){if(!max||count[read]>max)
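The visible part of the command tallies every sequence line of the FASTQ file (the second line of each four-line record) and tracks the most frequent one. Here is a rough Python sketch of that same tallying step. Because the command above is cut off, the summary fields returned here (total, distinct, most frequent read) are my own assumption about what is worth reporting, and the function names are hypothetical.

```python
import gzip
from collections import Counter


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fastq_read_stats(lines):
    """Tally sequence lines from an iterable of FASTQ lines.

    Assumes the standard 4-lines-per-record layout, so the sequence
    is line 2, 6, 10, ... (the same test as (NR-2)%4==0 in awk).
    """
    counts = Counter()
    for line_number, line in enumerate(lines, start=1):
        if (line_number - 2) % 4 == 0:
            fields = line.split()
            if fields:  # skip blank sequence lines
                counts[fields[0]] += 1

    total = sum(counts.values())
    if total == 0:
        return {"total": 0, "distinct": 0, "top_read": None, "top_count": 0}

    top_read, top_count = counts.most_common(1)[0]
    return {
        "total": total,            # number of reads
        "distinct": len(counts),   # number of different sequences
        "top_read": top_read,      # most frequent sequence
        "top_count": top_count,    # how often it occurs
    }


if __name__ == "__main__":
    # Tiny in-memory example with three reads, two of them identical.
    example = [
        "@r1", "ACGT", "+", "IIII",
        "@r2", "ACGT", "+", "IIII",
        "@r3", "TTTT", "+", "IIII",
    ]
    print(fastq_read_stats(example))
```

For a real file you would pass the file handle directly, for example `fastq_read_stats(open_fastq("myfile.fq"))`. This keeps every distinct sequence in memory, just like the awk array does, so very large files may need a lot of RAM.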

Bioinformatics, R

I'm frequently asked how to convert from one gene identifier to another. This can be tricky, especially when relying on gene symbols, as Will pointed out in a post a few years ago. Several tools can do this, including DAVID and the previously mentioned new BioMart ID Converter, but I still prefer the Ensembl BioMart because of its added flexibility and annotation.
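If you'd rather script the conversion than click through the web interface, BioMart also accepts queries written as a small XML document. The sketch below only builds that XML and a request URL; it does not contact any server. The dataset name (`hsapiens_gene_ensembl`), the filter and attribute names, and the `martservice` URL are assumptions based on the public Ensembl BioMart and may differ between releases, so check them against your mart before relying on this. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint of the public Ensembl BioMart service.
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(
    gene_ids,
    dataset="hsapiens_gene_ensembl",                 # assumed dataset name
    filter_name="ensembl_gene_id",                   # assumed input ID type
    attributes=("ensembl_gene_id", "hgnc_symbol"),   # assumed output columns
):
    """Return a BioMart XML query that maps gene_ids to the given attributes."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Multiple IDs go into one comma-separated filter value.
    ET.SubElement(ds, "Filter", {"name": filter_name, "value": ",".join(gene_ids)})
    for attribute in attributes:
        ET.SubElement(ds, "Attribute", {"name": attribute})

    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


def build_biomart_url(query_xml, base_url=MARTSERVICE_URL):
    """Return a GET URL carrying the XML in the 'query' parameter."""
    return base_url + "?" + urlencode({"query": query_xml})


if __name__ == "__main__":
    xml_query = build_biomart_query(["ENSG00000139618", "ENSG00000141510"])
    print(xml_query)
    print(build_biomart_url(xml_query))
```

Fetching the resulting URL (with a browser, `curl`, or `urllib.request`) should return a tab-separated table, one row per matching gene. Building the XML with `ElementTree` rather than string formatting means IDs and names are escaped properly.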

Announcements

GGD has a new look. I was inspired by Gina Trapani (Smarterware, Lifehacker) to remove any extra lines, links, and other "ink" that doesn't serve a purpose, and I hope the site appears cleaner and easier to read. I also wanted the extra horizontal space for larger images and to avoid the dreaded side-scrolling in posts with lots of code like this one.

Bioinformatics, Linux, R

*Edit March 12* Be sure to look at the comments, especially the commentary on Hacker News: you can supercharge the find|xargs idea by using find|parallel instead.

Do you ever discover a trick to do something better, faster, or easier, and wish you could reclaim all the wasted time and effort before your discovery?

Bioinformatics, Pathways, R, Visualization

I get a lot of requests in the core about running a "pathway analysis." Someone ran a handful of gene expression arrays, or better yet, ran an RNA-seq experiment (with replicates!). These and many other kinds of high-throughput assays (GWAS, ChIP-seq, etc.) produce a list of genes, each with an associated p-value, fold change, or other statistic. Here's some R code to download public data from a study on susceptibility to colorectal cancer.
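To make "pathway analysis" a bit more concrete: one of the simplest approaches is an over-representation test, which asks whether your list of significant genes overlaps a pathway more than you'd expect by chance. The sketch below uses the hypergeometric distribution for that. It is a generic illustration and not necessarily the method used in this post; the gene names are made up and the function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, selected_size, overlap):
    """One-sided p-value P(X >= overlap) under the hypergeometric distribution.

    universe_size: number of genes tested in the experiment
    pathway_size:  number of those genes annotated to the pathway
    selected_size: number of genes called significant
    overlap:       significant genes that are also in the pathway
    """
    max_overlap = min(pathway_size, selected_size)
    total_ways = comb(universe_size, selected_size)
    favourable = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, selected_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favourable / total_ways


def pathway_enrichment(significant_genes, pathway_genes, universe):
    """Restrict both gene sets to the tested universe, then run the test."""
    universe = set(universe)
    significant = set(significant_genes) & universe
    pathway = set(pathway_genes) & universe
    overlap = len(significant & pathway)
    p_value = hypergeom_enrichment_p(
        len(universe), len(pathway), len(significant), overlap
    )
    return overlap, p_value


if __name__ == "__main__":
    # Toy example: 10 tested genes, a 4-gene pathway, 3 significant genes.
    universe = [f"g{i}" for i in range(1, 11)]
    pathway = ["g1", "g2", "g3", "g4"]
    significant = ["g1", "g2", "g9"]
    print(pathway_enrichment(significant, pathway, universe))
```

In practice you would run this for every pathway in a database and correct the resulting p-values for multiple testing. Note that this kind of test ignores the size and direction of each gene's change, which is one reason more elaborate methods exist.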