
Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
Published
Author Stephen Turner

I direct the Bioinformatics Core at the University of Virginia, and I'm hiring. Visit this link on the UVA Jobs website for more information. Here's the description: I'm Hiring - Bioinformatics Analyst in the UVA Bioinformatics Core

Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution (CC BY) License.

Published
Author Stephen Turner

Lately I've been using the limma package a lot for analyzing microarray data. When I read in Affy CEL files using ReadAffy(), the resulting ExpressionSet won't contain any featureData annotation. Consequently, when I run topTable to get a list of differentially expressed genes, there's no annotation information other than the Affymetrix probeset IDs or transcript cluster IDs.

Published
Author Stephen Turner

I use this all the time, and the setup is dead simple. Follow the code below to load the RMySQL package, connect to a database (here the UCSC genome browser's public MySQL instance), set up a function to make querying easier, and query the database to return results as a data frame.

Published
Author Stephen Turner

I found the slides below on the education page from Bioinformatics & Research Computing at the Whitehead Institute. The first set (PDF) gives an overview of the methods and software available for quality assessment of microarray and RNA-seq experiments using the FastX toolkit and FastQC. The second set (PDF) gives an example RNA-seq workflow using TopHat, SAMtools, Python/HTSeq, and R/DESeq.

Published
Author Stephen Turner

Gene Expression Omnibus (GEO) is NCBI's repository for publicly available gene expression data, with thousands of datasets comprising over 600,000 samples of array or sequencing data. You can download data from GEO using FTP, or download and load the data directly into R using the GEOquery Bioconductor package written (and well documented) by Sean Davis, and then analyze the data using the limma package.

Published
Author Unknown

In general, the standard practice for correcting for population stratification in genetic studies is to use principal components analysis (PCA) to place samples along different axes of ancestry. Price et al. published on this in 2006, and since then PCA plots have been a common component of many published GWAS.
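To make the idea concrete, here is a minimal sketch of PCA on a genotype matrix using NumPy. Everything in it is an assumption for illustration: the function name `ancestry_pcs`, the simulated allele frequencies, and the two hypothetical populations. It standardizes each SNP by its sample standard deviation, which is a simplification and not necessarily the normalization used by Price et al.

```python
import numpy as np


def ancestry_pcs(genotypes, n_components=2):
    """Return sample scores on the top principal components.

    genotypes: samples x SNPs array of allele counts (0, 1, or 2).
    This helper is hypothetical and only illustrates the general approach.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]  # drop SNPs with no variation
    # Standardize each SNP (column) to mean 0 and unit variance
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    # SVD of the standardized matrix; u * s gives the sample scores
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return scores, explained[:n_components]


# Simulated data: two made-up populations with drifted allele frequencies
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, size=200), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(30, 200))
pop_b = rng.binomial(2, freq_b, size=(30, 200))

scores, explained = ancestry_pcs(np.vstack([pop_a, pop_b]))
```

Plotting the first column of `scores` against the second, colored by population, gives the kind of PCA plot seen in GWAS papers. In an association analysis the top components would then typically be included as covariates.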

Published
Author Stephen Turner

I just accepted an offer for a faculty position at the University of Virginia in the Center for Public Health Genomics / Department of Public Health Sciences. Starting in October I will be developing and directing a new centralized bioinformatics core in the UVA School of Medicine. Over the next few weeks I'm taking a much-needed vacation next door in Kauai and then packing up for the move to Charlottesville.