RNA-seq has been performed routinely for at least five years, yet there is still no consensus on the best methodology for analyzing the data.
One of the clearest advantages RNA-seq has over array-based technology for studying gene expression is that it does not require a reference genome or a pre-existing oligo array. De novo transcriptome assembly allows you to study non-model organisms, cancer cells, or environmental metatranscriptomes.
Torsten Seemann compiled a list of minimum standards for bioinformatics command line tools: printing help when no commands are specified, including version info, avoiding hardcoded paths, and so on. These should be obvious to any seasoned software engineer, but many of these standards are not followed in bioinformatics.
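To make a few of those standards concrete, here is a minimal sketch in Python of a tool that prints help when run with no arguments, reports its version, and takes its input path from the command line instead of hardcoding it. The tool name `seqtool`, the version string, and the record-counting task are all hypothetical, chosen only for illustration; this is not taken from Seemann's list.

```python
import argparse
import sys

__version__ = "0.1.0"  # hypothetical version string, for illustration only


def build_parser():
    """Build the command line parser (tool name 'seqtool' is made up)."""
    parser = argparse.ArgumentParser(
        prog="seqtool",
        description="Count the records in a FASTA file (illustrative example).",
    )
    # Standard: the tool can report its own version.
    parser.add_argument(
        "--version", action="version", version=f"%(prog)s {__version__}"
    )
    # Standard: the input path comes from the user, never from a hardcoded path.
    parser.add_argument("fasta", help="path to the input FASTA file")
    return parser


def count_records(path):
    """Return the number of header lines (starting with '>') in the file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def main(argv=None):
    """Run the tool and return an exit code (0 on success, non-zero on error)."""
    argv = sys.argv[1:] if argv is None else argv
    parser = build_parser()
    # Standard: print help, not a stack trace, when no arguments are given.
    if not argv:
        parser.print_help(sys.stderr)
        return 2
    args = parser.parse_args(argv)
    try:
        n_records = count_records(args.fasta)
    except OSError as exc:
        # Report errors on stderr and signal failure through the exit code.
        print(f"{parser.prog}: error: {exc}", file=sys.stderr)
        return 1
    print(n_records)
    return 0


# In a real script, finish with: raise SystemExit(main())
```

Returning an exit code from `main()` instead of calling `sys.exit()` inside it keeps the function easy to call from tests, and sending help and error text to stderr keeps stdout clean for piping into other tools.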
ENSEMBL is a frequently used resource for various genomics and transcriptomics tasks. The ENSEMBL website and MART tools provide easy access to their rich database, but ENSEMBL also provides flat-file downloads of their entire database and a public MySQL portal. You can access this with MySQL Workbench using the following: Once inside, you can get a sense for what the ENSEMBL schema (or data model) is like.
As the 2013 ISMB/ECCB meeting is winding down, I archived and analyzed the 2000+ tweets from the meeting using a set of bash and R scripts I previously blogged about. The archive of all the tweets tagged #ISMBECCB from July 19-24, 2013 is and will forever remain here on GitHub. You'll find some R code to parse through this text and run the analyses below in the same repository, explained in more detail in my previous blog post.
At last week's 2013 useR! conference in Albacete, Spain, Martin Morgan and Marc Carlson led a course on using R/Bioconductor for analyzing next-gen sequencing data, covering alignment, RNA-seq, ChIP-seq, and sequence annotation using R. The course materials are online here, including R code for running the examples, the PDF vignette tutorial, and the course material itself as a package.
I collaborate with several investigators on gene expression projects using both microarray and RNA-seq. After I show a collaborator which genes are dysregulated in a particular condition or tissue, the most common question I get is "what are the transcription factors regulating these genes?" This isn't the easiest question to answer.
Since nearly the beginning of genome-wide association studies, the PLINK software package (developed by Shaun Purcell’s group at the Broad Institute and MGH) has been the standard for manipulating the large-scale data produced by these studies.
Automatically Archiving Twitter Results: Ever since Twitter gamed its own API and killed off great services like IFTTT triggers, I've been looking for a way to automatically archive tweets containing certain search terms of interest to me. Twitter's built-in search is limited, and I wanted to archive interesting tweets for future reference and to start playing around with some basic text / trend analysis.
A handful of good metagenomics papers have come out over the last few months. Below I've linked to and copied my evaluation of each of these articles from F1000. ... 1. Willner, Dana, and Philip Hugenholtz. "From deep sequencing to viral tagging: Recent advances in viral metagenomics." BioEssays (2013). My evaluation: This review lays out some of the challenges and recent advances in viral metagenomic sequencing.