Biological Sciences, Blogger

Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
Bioinformatics, R, Software, Biological Sciences
Published
Author Stephen Turner

Lately I've been using the limma package often for analyzing microarray data. When I read in Affy CEL files using ReadAffy(), the resulting ExpressionSet won't contain any featureData annotation. Consequently, when I run topTable to get a list of differentially expressed genes, there's no annotation information other than the Affymetrix probeset IDs or transcript cluster IDs.
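The underlying fix is a lookup join: attach a gene symbol to each result row by its probeset ID. Here is a minimal, language-neutral sketch of that idea in Python. The probe IDs, gene symbols, fold changes and the `annotate` helper are all made up for illustration; this is not limma's or Bioconductor's API.

```python
# Hypothetical results table: one dict per probeset (values are invented).
top_table = [
    {"probe_id": "probe_0001", "logFC": 2.1},
    {"probe_id": "probe_0002", "logFC": -1.7},
    {"probe_id": "probe_0003", "logFC": 1.2},
]

# Hypothetical annotation lookup: probeset ID -> gene symbol.
annotation = {"probe_0001": "GENE_A", "probe_0003": "GENE_C"}


def annotate(rows, symbols, missing="NA"):
    """Return copies of rows with a 'symbol' field added from the lookup."""
    return [
        {**row, "symbol": symbols.get(row["probe_id"], missing)}
        for row in rows
    ]


annotated = annotate(top_table, annotation)
for row in annotated:
    print(row["probe_id"], row["symbol"], row["logFC"])
```

Probesets without an entry in the lookup keep a placeholder symbol rather than being dropped, so the annotated table has the same rows as the original.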

Productivity, R, Tutorials, Biological Sciences
Published
Author Stephen Turner

Farhad Manjoo at Slate has a good article on why you need to learn how to program. Chances are, if you're reading this post here you're already fairly adept at some form of programming. But if you're not, you should give it some serious thought.

R, Biological Sciences
Published
Author Stephen Turner

I use this all the time, and the setup is dead simple. Follow the code below to load the RMySQL package, connect to a database (here the UCSC genome browser's public MySQL instance), set up a function to make querying easier, and query the database to return results as a data frame.

Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution (CC BY) License.
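The connect, wrap, query pattern from the RMySQL post can be sketched offline in Python. This is not the post's R code. It assumes an in-memory SQLite database as a stand-in for the remote MySQL server, and the `genes` table, its rows and the `query` helper are invented for the example.

```python
import sqlite3

# Stand-in for a remote MySQL server: an in-memory SQLite database
# holding a tiny, made-up table (not real UCSC data).
con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row  # lets each row be converted to a dict
con.execute("CREATE TABLE genes (name TEXT, chrom TEXT, tx_start INTEGER)")
con.executemany(
    "INSERT INTO genes VALUES (?, ?, ?)",
    [("geneA", "chr1", 100), ("geneB", "chr1", 500), ("geneC", "chr2", 250)],
)


def query(sql, params=()):
    """Run a SQL statement on the open connection; return rows as dicts."""
    cursor = con.execute(sql, params)
    return [dict(row) for row in cursor.fetchall()]


rows = query(
    "SELECT name, tx_start FROM genes WHERE chrom = ? ORDER BY tx_start",
    ("chr1",),
)
for row in rows:
    print(row)
```

Wrapping the connection in a one-argument helper keeps each later query to a single line, which is the convenience the post describes.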

Announcements, Bioinformatics, Writing, Biological Sciences
Published
Author Stephen Turner

The Galaxy Project started using CiteULike to organize papers that are about, use, or reference Galaxy. The Galaxy CiteULike group is open to any CUL user, and once you join, you can add papers to the group, assign tags, and rate papers.

Bioinformatics, R, RNA-Seq, Sequencing, Biological Sciences
Published
Author Stephen Turner

I found the slides below on the education page from Bioinformatics & Research Computing at the Whitehead Institute. The first set (PDF) gives an overview of the methods and software available for quality assessment of microarray and RNA-seq experiments using the FastX toolkit and FastQC. The second set (PDF) gives an example RNA-seq workflow using TopHat, SAMtools, Python/HTSeq, and R/DESeq.

Biological Sciences
Published
Author Stephen Turner

I just got an email from Illumina about an interesting-looking webinar on clinical applications of next-gen sequencing, this Wednesday at 9am PST (noon EST).

Date: Wednesday, December 7, 2011
Time: 9:00 AM (PST)
Speaker: Rick Dewey, MD, Stanford Center for Inherited Cardiovascular Disease

Next-generation sequencing (NGS) presents both challenges and opportunities for clinical care.

Bioinformatics, Biological Sciences
Published
Author Stephen Turner

BioMart recently got a facelift. I'm not sure if this was always available in the old BioMart, but there's now a link to a gene ID converter that worked pretty well for me for converting S. cerevisiae gene IDs to standard gene names. The tool looks like it will convert nearly any ID you could imagine, and it also appears to map Affy probe IDs to gene, transcript, or protein IDs and names.

Bioinformatics, R, Biological Sciences
Published
Author Stephen Turner

Gene Expression Omnibus (GEO) is NCBI's repository for publicly available gene expression data, holding thousands of datasets that together cover over 600,000 samples with array or sequencing data. You can download data from GEO using FTP, or download and load the data directly into R using the GEOquery Bioconductor package written (and well documented) by Sean Davis, and then analyze the data using the limma package.

Recommended Reading, RNA-Seq, Sequencing, Tutorials, Biological Sciences
Published
Author Stephen Turner

James Taylor came to UVA last week and gave an excellent talk on how Galaxy enables transparent and reproducible research in genomics. I'm gearing up to take on several projects that involve next-generation sequencing, and I'm considering installing my own Galaxy framework on a local cluster or on the cloud. If you've used Galaxy in the past you're probably aware that it allows you to share data, workflows, and histories with other users.

Bioinformatics, Clustering, GWAS, R, Visualization, Biological Sciences
Published

In general, the standard practice for correcting for population stratification in genetic studies is to use principal components analysis (PCA) to place samples along axes of genetic ancestry. Price et al. published on this in 2006, and since then PCA plots have been a common component of many published GWAS.
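As a rough illustration of the idea, the sketch below finds the first principal component of a small genotype matrix in plain Python. The genotype values, the function names and the use of power iteration are all assumptions made for this example. It centers each SNP but skips the allele-frequency scaling that published methods typically apply, and a real study would use dedicated software.

```python
import math

# Made-up genotypes (0/1/2 copies of an allele): 6 samples x 5 SNPs.
# The first three rows and last three rows mimic two populations.
genotypes = [
    [0, 0, 1, 2, 2],
    [0, 1, 0, 2, 1],
    [0, 0, 0, 2, 2],
    [2, 2, 1, 0, 0],
    [2, 1, 2, 0, 1],
    [1, 2, 2, 0, 0],
]


def center_columns(matrix):
    """Subtract each SNP's mean genotype from its column."""
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in matrix]


def first_pc_scores(matrix, iterations=200):
    """Unit-length vector proportional to each sample's PC1 score.

    Uses power iteration on the sample-by-sample Gram matrix of the
    centered genotypes (a simplified stand-in for a full PCA).
    """
    x = center_columns(matrix)
    n = len(x)
    gram = [
        [sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
        for i in range(n)
    ]
    v = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(val * val for val in w))
        v = [val / norm for val in w]
    return v


scores = first_pc_scores(genotypes)
for index, score in enumerate(scores):
    print(f"sample {index + 1}: PC1 = {score:+.3f}")
```

In this toy data the two groups of samples land on opposite sides of zero on the first component, which is the kind of separation a PCA plot makes visible. The overall sign of a principal component is arbitrary.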