
Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
Published
Author Stephen Turner

I talked a little bit about tidy data in my recent post about dplyr, but you should really go check out Hadley’s paper on the subject. R expects inputs to data analysis procedures to be in a tidy format, but the model output objects that you get back aren’t always tidy. The reshape2, tidyr, and dplyr packages are meant to take data frames, munge them around, and return a data frame.
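To make the "untidy model output" point concrete, here is a minimal base-R sketch. It assumes the built-in mtcars dataset as a stand-in for real data, and the variable names (fit, coefs, tidy_coefs) are only illustrative. The coefficient table from summary() comes back as a matrix with the term names hidden in the row names, so it has to be reshaped by hand before data frame tools can work on it.

```r
# Fit a linear model on a built-in dataset (stand-in for real data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# The coefficient table is a numeric matrix; term names live in rownames.
coefs <- summary(fit)$coefficients

# Rebuild it as a plain data frame with the terms in an ordinary column.
tidy_coefs <- data.frame(
  term = rownames(coefs),
  coefs,
  row.names = NULL,
  check.names = FALSE  # keep column names such as "Std. Error" as-is
)

print(tidy_coefs)
```

Once the result is a regular data frame, it can be filtered, joined, or plotted like any other table.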

Published
Author Stephen Turner

Two of the most common questions at the beginning of an RNA-seq experiment are "how many reads do I need?" and "how many replicates do I need?". This paper describes a web application for designing RNA-seq applications that calculates an appropriate sample size and read depth to satisfy user-defined criteria such as cost, maximum number of reads or replicates attainable, etc.

Published
Author Stephen Turner

I haven't posted much here recently, but here is a roundup of a few of the links I've shared on Twitter (@genetics_blog) over the last two weeks. Here is a nice tutorial on accessing high-throughput public data (from NCBI) using R and Bioconductor. Cloudnumbers.com, a startup that allows you to run high-performance computing (HPC) applications in the cloud, now supports the previously mentioned R IDE, RStudio.

Published
Author Stephen Turner

I thought it would be trivial to extract the p-value on the F-test of a linear regression model (testing the null hypothesis R²=0). If I fit the linear model fit <- lm(y ~ x1 + x2), I can't seem to find it in names(fit) or names(summary(fit)). But summary(fit)$fstatistic does give you the F statistic and both degrees of freedom, so I wrote this function to quickly pull out the p-value from this F-test on an lm object, and added it to my R profile.
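The full function lives in the original post; what follows is only a minimal sketch of the idea, not necessarily the author's exact code. The helper name lm_f_pvalue is hypothetical, and mtcars is used as stand-in data. It takes the F statistic and its two degrees of freedom from summary(fit)$fstatistic and passes them to pf() to get the upper-tail probability.

```r
# Hypothetical helper (name is illustrative): overall F-test p-value of an lm fit.
lm_f_pvalue <- function(fit) {
  if (!inherits(fit, "lm")) {
    stop("'fit' must be an object of class 'lm'")
  }
  # Named numeric vector with elements: value, numdf, dendf.
  f <- summary(fit)$fstatistic
  if (is.null(f)) {
    # summary.lm() omits the F statistic for intercept-only models.
    stop("no F statistic available for this model")
  }
  # Upper tail of the F distribution gives the p-value.
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
}

# Usage on a built-in dataset (stand-in for y, x1, x2).
fit <- lm(mpg ~ wt + hp, data = mtcars)
p <- lm_f_pvalue(fit)
print(p)
```

Using inherits() rather than comparing class(fit) to a string keeps the check working for objects that carry more than one class, such as glm fits, which also inherit from "lm".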