Farhad Manjoo at Slate has a good article on why you need to learn how to program. Chances are that if you're reading this post, you're already fairly adept at some form of programming. If you're not, you should give it some serious thought.
James Taylor came to UVA last week and gave an excellent talk on how Galaxy enables transparent and reproducible research in genomics. I'm gearing up to take on several projects that involve next-generation sequencing, and I'm considering installing my own Galaxy framework on a local cluster or on the cloud. If you've used Galaxy in the past, you're probably aware that it allows you to share data, workflows, and histories with other users.
I haven't posted much here recently, but here is a roundup of a few of the links I've shared on Twitter (@genetics_blog) over the last two weeks. Here is a nice tutorial on accessing high-throughput public data (from NCBI) using R and Bioconductor. Cloudnumbers.com, a startup that allows you to run high-performance computing (HPC) applications in the cloud, now supports the previously mentioned R IDE, RStudio.
If you missed the tutorial on the 1000 Genomes Project data last week at ASHG, you can now watch the tutorials on YouTube and download the slides at http://genome.gov/27542240.
There will be a (free) tutorial on the 1000 Genomes Project at this year's ASHG meeting on Wednesday, November 3, 7:00 – 9:30pm. You can register online at the link below.
Hadley Wickham, creator of ggplot2, an immensely popular framework for Tufte-friendly data visualization using R, is teaching two short courses at Vanderbilt this week. Once we opened registration to Vanderbilt students and staff we instantly filled all the available seats, so unfortunately I wasn't able to announce the course here. But the good news is that Hadley's made all the data, code, and slides from the course available online here.
I found this tutorial by Emily Mankin on how to do principal components analysis (PCA) using R. It has a nice example with R code and several good references. The example starts by doing the PCA manually, then uses R's built-in prcomp() function to do the same PCA. Principal Components Analysis: A How-To Manual for R
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution (CC BY) License.
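The "manual" route to PCA (center the data, compute the covariance matrix, take its eigenvectors, project the data onto them) can be sketched in a few lines. This is not the tutorial's R code. It is a minimal stand-alone Python sketch, restricted to two variables so the eigendecomposition has a closed form; the function name pca_2d and the toy data are assumptions made purely for illustration. It uses the n - 1 denominator for the covariance, which is also what R's cov() and prcomp() use.

```python
import math


def pca_2d(points):
    """PCA by hand for two variables (hypothetical helper, illustration only).

    Returns (eigenvalues, components, scores), ordered by decreasing variance.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two observations")

    # Step 1: center each variable on its mean.
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Step 2: sample covariance matrix [[sxx, sxy], [sxy, syy]] (n - 1 denominator).
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Step 3: eigenvalues of a symmetric 2x2 matrix, in closed form.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = (half_trace + radius, half_trace - radius)

    # Step 4: unit eigenvector for the larger eigenvalue; the second
    # component is simply the perpendicular direction.
    if sxy != 0:
        vx, vy = sxy, eigenvalues[0] - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)
    pc2 = (-pc1[1], pc1[0])

    # Step 5: project the centered data onto the components to get the scores.
    scores = [
        (x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1]) for x, y in centered
    ]
    return eigenvalues, (pc1, pc2), scores


if __name__ == "__main__":
    toy = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]  # made-up data
    eigenvalues, components, scores = pca_2d(toy)
    print("variance along each component:", eigenvalues)
    print("first component:", components[0])
```

In R terms, the eigenvalues returned here play the role of prcomp(x)$sdev^2 and the components correspond to the columns of prcomp(x)$rotation, up to sign.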
If you attended Frank Harrell's Regression Modeling Strategies course a few weeks ago, you got a chance to see the rms package for R in action. Frank's rms package does regression modeling, testing, estimation, validation, graphics, prediction, and typesetting by storing enhanced model design attributes in the fit.
This looks like a must-read for anyone starting out in computational biology without extensive experience at the command line. The 135-page document linked at the bottom of the Google Group page looks like an excellent primer with lots of examples that could probably be completed in a day or two, and provides a great start for working in a Linux/Unix environment and programming with Perl.
The previously mentioned Regression Modeling Strategies short course taught by Frank Harrell is nearly over. Here are the handouts (PDF) from the course. Keep an eye out here; I'll be writing a few more posts in the near future on topics Frank covered in this course.