When I have a question I usually ask the internet before bugging my neighbor.
Had a mental block today trying to figure out how to get the indices of columns in a data frame given their names. It's a simple task, but a hard one to phrase as a Google search. Thanks to jashapiro, Matt, and Vince for giving me a heads up on the which() function, which returns the indices of the TRUE values in a logical vector. If you're looking at the iris data: data(iris); head(iris)
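Here's a minimal sketch of the which() approach on the iris data. The column names in `wanted` are just ones I picked for illustration, and match() is a base R alternative I'm adding for comparison, not something from the original tip.

```r
data(iris)

# Column names to look up (chosen for illustration)
wanted <- c("Sepal.Width", "Species")

# names(iris) %in% wanted is a logical vector;
# which() returns the positions of its TRUE values
idx_which <- which(names(iris) %in% wanted)

# match() returns positions in the order of `wanted`,
# with NA for any name that isn't a column
idx_match <- match(wanted, names(iris))

idx_which
idx_match
```

The two differ in ordering: which() always returns indices in column order, while match() follows the order of the names you asked for.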
Something like this probably already exists in an R package somewhere out there, but I needed a function to summarize how much missing data I have in each variable of a data frame in R. Pass a data frame to this function and for each variable it'll give you the number of missing values, the total N, and the proportion missing.
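A function along those lines might look something like the sketch below. The name `missing_summary`, the output column names, and the `toy` data frame are all made up for this example, so treat it as one possible way to do it rather than the function from the post.

```r
# Hypothetical helper: per-variable missing-data summary for a data frame
missing_summary <- function(df) {
  stopifnot(is.data.frame(df))
  # Count NA values in each column
  n_missing <- vapply(df, function(x) sum(is.na(x)), integer(1))
  n_total <- rep(nrow(df), length(df))
  data.frame(
    variable = names(df),
    nmiss    = unname(n_missing),
    n        = n_total,
    propmiss = unname(n_missing) / n_total,
    stringsAsFactors = FALSE
  )
}

# Toy data frame, invented for illustration
toy <- data.frame(a = c(1, NA, 3, NA),
                  b = c("x", "y", NA, "z"),
                  c = 1:4)
result <- missing_summary(toy)
result
```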
A while back I asked you what reference management software you used, and how well you liked it. I received 180 responses, and here's what you said. Out of the choices on the poll, most of you used Mendeley (30%), followed by EndNote (23%) and Zotero (15%). Out of those of you who picked "other," it was mostly Papers or Qiqqa. There were even a few brave souls managing references caveman-style, manually.
Recently I tried compiling Eigensoft on my Ubuntu 10.10 Linux system running in VirtualBox and had no success. From comments on this blog post, it looks like the newer Ubuntu distros don't have libg2c0 and related libraries (which were part of gcc3), and gcc4 uses gfortran instead.
This builds on a previous post from Stephen. I was recently running a series of ANOVA analyses, and I used the aov() function because it had a few options that I preferred. Much like lm(), the function returns an object that you typically pass to summary() to view and interpret the output. It took me a bit of playing to figure out how to extract the information I needed.
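As a rough sketch of the kind of extraction involved, here's a one-way ANOVA on the built-in iris data (my choice of example, not the analysis from the post). It assumes a model with no Error() term, so summary() returns a list whose first element is the ANOVA table.

```r
data(iris)

# One-way ANOVA: does sepal length differ by species?
fit <- aov(Sepal.Length ~ Species, data = iris)

# summary() of an aov fit is a list; with no Error() term
# its first element is the ANOVA table (a data frame)
tab <- summary(fit)[[1]]

# Pull values by column name and row position
# (row 1 is the Species term, row 2 the residuals)
f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

f_value
p_value
```

I index rows by position here because the row names in the table are padded with trailing spaces, which makes matching on the term name fiddly.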
After finishing the final revisions on my dissertation I was reminded of this spot-on graphical guide to what a Ph.D. is really all about. Now that I'm finished, I'm leaving Vanderbilt to start a postdoc in genetic epidemiology with Dr. Loic Le Marchand at the University of Hawaii Cancer Center. Posts may be sparse over the next few weeks, but I plan on blogging as usual once I'm set up at my postdoc.
I thought it would be trivial to extract the p-value on the F-test of a linear regression model (testing the null hypothesis R²=0). If I fit the linear model: fit<-lm(y~x1+x2), I can't seem to find it in names(fit) or summary(fit). But summary(fit)$fstatistic does give you the F statistic, and both degrees of freedom, so I wrote this function to quickly pull out the p-value from this F-test on an lm object, and added it to my R profile.
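A minimal sketch of such a function is below. The name `lm_pvalue` and the simulated data are my own placeholders. It takes the F statistic and its two degrees of freedom from summary(fit)$fstatistic and gets the upper-tail probability from pf().

```r
# Hypothetical helper: overall F-test p-value for an lm fit
lm_pvalue <- function(fit) {
  stopifnot(inherits(fit, "lm"))
  # Named vector with elements "value", "numdf", "dendf";
  # NULL for an intercept-only model, which this sketch does not handle
  f <- summary(fit)$fstatistic
  stopifnot(!is.null(f))
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
}

# Simulated data, for illustration only
set.seed(1)
x1 <- rnorm(50)
x2 <- rnorm(50)
y  <- 1 + 2 * x1 + rnorm(50)

fit <- lm(y ~ x1 + x2)
p <- lm_pvalue(fit)
p
```

A quick sanity check: with a single predictor, the overall F-test p-value should equal the p-value on that predictor's t-test in coef(summary(fit)).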
Coming from the lineage of Jason Moore, I am obliged to occasionally remind everyone that biological systems are inherently complex, and to some degree, we should therefore expect statistical models involving those systems to be complex as well. With the development of GWAS, many approaches to examine epistasis are weighed down by the computational burden of exhaustively conducting billions of statistical tests.
When I started grad school I began using Reference Manager (RefMan), similar to EndNote, to manage my references and bibliographies. It's a real pain, and I often feel like I'm powering my computer with the endless pumping and clicking of the mouse that it takes to import a reference into my library. Recently I've started using Zotero because of how easy it is to import references, store PDFs, and sync between computers.
About a year ago I wrote a post about Dropbox - a free, awesome, cross-platform utility that syncs files across multiple computers and securely backs up your files online. Dropbox is indispensable in my own workflow. I store all my R code, Perl scripts, and working manuscripts in my Dropbox. You can also share folders on your computer with other Dropbox users, which makes coauthoring a paper and sharing manuscript files a trivial task.