Rogue Scholar

Bioinformatics, Linux, R, Biological Sciences

find | xargs ... Like a Boss

Published March 9, 2012

*Edit March 12:* Be sure to look at the comments, especially the commentary on Hacker News. You can supercharge the find|xargs idea by using find|parallel instead.

Do you ever discover a trick to do something better, faster, or easier, and wish you could reclaim all the wasted time and effort before your discovery?
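As a rough sketch of the find | xargs pattern the title refers to: the scratch directory, file names, and the choice of `wc` and `gzip` below are made up for illustration and are not taken from the post. The idea is that `find` selects the files and `xargs` hands them to another command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory with a few example files (illustration only).
workdir="$(mktemp -d)"
mkdir -p "$workdir/run1" "$workdir/run 2"
printf 'a\nb\n' > "$workdir/run1/sample1.txt"
printf 'c\n' > "$workdir/run 2/sample2.txt"
printf 'ignored\n' > "$workdir/run1/notes.log"

# find emits matching paths separated by NUL bytes (-print0);
# xargs -0 splits on NUL, so paths containing spaces stay intact.
# Here the matching files are passed together to `wc -l`.
line_counts="$(find "$workdir" -type f -name '*.txt' -print0 | xargs -0 wc -l)"
printf '%s\n' "$line_counts"

# Run one command per file instead of one per batch (-n 1), here gzip.
find "$workdir" -type f -name '*.txt' -print0 | xargs -0 -n 1 gzip

# Count the compressed files that resulted.
gz_count="$(find "$workdir" -type f -name '*.gz' | wc -l | tr -d ' ')"
echo "compressed files: $gz_count"
```

Using `-print0` with `-0` is a defensive choice: plain `find | xargs` splits on whitespace and breaks on a directory such as `run 2`.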

Bioinformatics, Pathways, R, Visualization, Biological Sciences

Pathway Analysis for High-Throughput Genomics Studies

https://doi.org/10.59350/ys4fp-ke489

Published March 6, 2012

Author Stephen Turner

I get a lot of requests in the core about running a "pathway analysis." Someone ran a handful of gene expression arrays, or better yet, ran an RNA-seq experiment (with replicates!). These and many other kinds of high-throughput assays (GWAS, ChIP-seq, etc.) produce a list of genes, each with an associated p-value, fold change, or other statistic. Here's some R code to download public data from a study on susceptibility to colorectal cancer.

Bioinformatics, R, Biological Sciences

I'm Hiring!

https://doi.org/10.59350/srst0-9kk55

Published February 24, 2012

Author Stephen Turner

I direct the Bioinformatics Core at the University of Virginia, and I'm hiring. Visit this link on the UVA Jobs website for more information. Here's the description: I'm Hiring - Bioinformatics Analyst in the UVA Bioinformatics Core

Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution (CC BY) License.

PubMed, Biological Sciences

Your Publications (with PMCID) as a PubMed Query

https://doi.org/10.59350/v47eh-mw411

Published February 17, 2012

Author Stephen Turner

I'm updating my CV and biosketch for a few grant applications, and for some time now, NIH has required you to include the PubMed Central ID for each article you publish that arose from NIH support. I only have a dozen or so papers indexed in PubMed, but I still wanted a way to do this automatically. If you have scores of publications, looking up all the PMCIDs could easily become a hassle. First, create an account at My NCBI.

Bioinformatics, Biological Sciences

Webinar: Genomic Networks - Resolving Biomarkers from a Cloud of Data

https://doi.org/10.59350/p016b-42183

Published February 8, 2012

Author Stephen Turner

Kevin White from the University of Chicago will be giving a special guest lecture at NCI next week on systems biology approaches to mine genomics data for biomarkers and therapeutic targets. The lecture will be available online as a videocast.

Ggplot2, R, Visualization, Biological Sciences

Hadley Wickham: ggplot2 Webinar (Today!)

https://doi.org/10.59350/earkp-qqe36

Published February 8, 2012

Author Stephen Turner

Title: A Backstage Tour of ggplot2 with Hadley Wickham
Date: Wednesday, February 8, 2012
Time: 11:00AM - 12:00PM Pacific
Presenter: Hadley Wickham, Professor of Statistics, Rice University
Register here.

Biological Sciences

Joint Techs Netcast: Enhancing Infrastructure Support for Data Intensive Science

https://doi.org/10.59350/ratdh-ej621

Published January 20, 2012

Author Stephen Turner

The winter Joint Techs meeting is next week in Baton Rouge. I'm not going, but I plan on participating via a netcast to see what's going on. Jim Bottum, Clemson's CIO, is moderating an entire day devoted to the topic Enhancing Infrastructure Support for Data Intensive Science. Of particular interest to me are the talks from 9:30 to 11:00 a.m. on Tuesday, January 24, from researchers and those supporting climatology, genomics, and the XSEDE projects.

Bioinformatics, R, Software, Biological Sciences

Annotating limma Results with Gene Names for Affy Microarrays

https://doi.org/10.59350/vw55p-gr892

Published January 17, 2012

Author Stephen Turner

Lately I've been using the limma package often for analyzing microarray data. When I read in Affy CEL files using ReadAffy(), the resulting ExpressionSet won't contain any featureData annotation. Consequently, when I run topTable to get a list of differentially expressed genes, there's no annotation information other than the Affymetrix probeset IDs or transcript cluster IDs.

Productivity, R, Tutorials, Biological Sciences

New Year's Resolution: Learn How to Code

https://doi.org/10.59350/mtxn3-1c431

Published January 5, 2012

Author Stephen Turner

Farhad Manjoo at Slate has a good article on why you need to learn how to program. Chances are, if you're reading this post, you're already fairly adept at some form of programming. But if you're not, you should give it some serious thought.

R, Biological Sciences

Query a MySQL Database from R using RMySQL

https://doi.org/10.59350/ksgk2-vqc08

Published December 15, 2011

Author Stephen Turner

I use this all the time, and the setup is dead simple. Follow the code below to load the RMySQL package, connect to a database (here the UCSC genome browser's public MySQL instance), set up a function to make querying easier, and query the database to return results as a data frame.

Announcements, Bioinformatics, Writing, Biological Sciences

Galaxy Project Group on CiteULike and Mendeley

https://doi.org/10.59350/ahgx4-y5v08

Published December 15, 2011

Author Stephen Turner

The Galaxy Project started using CiteULike to organize papers that are about, use, or reference Galaxy. The Galaxy CiteULike group is open to any CUL user, and once you join, you can add papers to the group, assign tags, and rate papers.