Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
Published
Author Unknown

One of the most powerful tools you can learn to use in genomics research is a relational database system, such as MySQL.  These systems are fairly easy to set up and use, and they let you organize and manipulate data and statistical results with simple commands.  As a graduate student (during the height of GWAS), this single skill quickly turned me into an “expert”.
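To give a feel for what "simple commands" means here, below is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a MySQL server, and the table name, column names, and values are all made up for illustration. The SQL itself would look much the same in MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL server (an assumption
# made so the example runs without any setup).
conn = sqlite3.connect(":memory:")

# Hypothetical table of association results; names and values are invented.
conn.execute(
    "CREATE TABLE gwas_results ("
    "snp TEXT PRIMARY KEY, chrom INTEGER, pos INTEGER, pvalue REAL)"
)
rows = [
    ("snp_a", 1, 1000, 0.20),
    ("snp_b", 1, 2500, 3e-9),
    ("snp_c", 2, 400, 0.04),
    ("snp_d", 2, 900, 1e-8),
]
conn.executemany("INSERT INTO gwas_results VALUES (?, ?, ?, ?)", rows)

# One short query: SNPs below a significance threshold, strongest first.
significant = conn.execute(
    "SELECT snp, chrom, pvalue FROM gwas_results "
    "WHERE pvalue < ? ORDER BY pvalue",
    (5e-8,),
).fetchall()

for snp, chrom, pvalue in significant:
    print(snp, chrom, pvalue)

conn.close()
```

The filtering and sorting live in a single SELECT statement, so there is no need to write a loop over a results file.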

Published
Author Unknown

ENSEMBL is a frequently used resource for various genomics and transcriptomics tasks.  The ENSEMBL website and MART tools provide easy access to their rich database, but ENSEMBL also provides flat-file downloads of their entire database and a public MySQL portal.  You can access this portal with MySQL Workbench. Once inside, you can get a sense of what the ENSEMBL schema (or data model) looks like.

Published
Author Stephen Turner

Jeffrey Breen put together a useful slideshow on accessing databases from R. I use RODBC every single day to access my own local MySQL server from R. I've had trouble with RMySQL, so I've always used RODBC instead, after setting up my localhost MySQL server as a Windows data source. Once you get accustomed to accessing your data directly with SQL queries rather than dumping files, you'll wonder why you waited so long.

Published
Author Stephen Turner

I've covered a few topics in the past, including the plyr package, which is kind of like "GROUP BY" for R, and the merge function for merging datasets. I only recently found the sqldf package for R, and it's already one of the most useful packages I've ever installed. The main function in the package is sqldf(), which takes an SQL query as a quoted string. It lets you treat data frames as if they were tables in a relational database.
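The excerpt does not show any sqldf code, so here is a rough sketch of the underlying idea instead: one SQL statement that does the work of both a plyr-style "GROUP BY" summary and a merge. It is written in Python with the built-in sqlite3 module, not with sqldf itself, and the two tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two small hypothetical tables, playing the role of two data frames.
conn.execute("CREATE TABLE expression (gene TEXT, sample TEXT, value REAL)")
conn.execute("CREATE TABLE annotation (gene TEXT, chrom TEXT)")
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [
        ("geneA", "s1", 2.0),
        ("geneA", "s2", 4.0),
        ("geneB", "s1", 10.0),
        ("geneB", "s2", 20.0),
    ],
)
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?)",
    [("geneA", "chr1"), ("geneB", "chr2")],
)

# JOIN plays the role of merge(); GROUP BY plays the role of a per-group summary.
summary = conn.execute(
    """
    SELECT e.gene, a.chrom, AVG(e.value) AS mean_value
    FROM expression AS e
    JOIN annotation AS a ON a.gene = e.gene
    GROUP BY e.gene, a.chrom
    ORDER BY e.gene
    """
).fetchall()

for gene, chrom, mean_value in summary:
    print(gene, chrom, mean_value)

conn.close()
```

In sqldf the query string would be passed to sqldf() and the table names would refer to data frames in the R session.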

Published
Author Stephen Turner

Check out this paper in PNAS and the corresponding synopsis in the New York Times. The authors take a unique approach to finding genes likely to be associated with human traits using orthologous phenotypes in model organisms, or phenologs. The idea is simple. The authors have a database of ~2000 disease-associated genes in humans.

Published
Author Unknown

This is the first in a series of posts on how to use MySQL for genetic data analysis. MySQL is a very popular, freely available database management system that is easily installed on desktop computers (or Linux servers). The "SQL" in MySQL stands for Structured Query Language, which is, in my humble estimation, the most standardized way to store and access information in the known universe.