In my previous post I showed you how to randomly split up a dataset into training and testing datasets. (Thanks to all those who emailed me or left comments letting me know that this could be done using other means.
In my previous post I showed you how to randomly split up a dataset into training and testing datasets. (Thanks to all those who emailed me or left comments letting me know that this could be done using other means.
Genetic Alliance is a nonprofit health advocacy organization that improves health through the authentic engagement of communities and individuals. This year, they are celebrating their 25th anniversary, and they're hosting a variety of events throughout the year, including monthly salons around the country and the 25th Anniversary Annual Conference in June.
Just saw the announcement of the availability of Rstudio, a new (free & open source) integrated development environment for R that works on Windows, Mac, and Linux. Judging from the screenshots, it looks like Rstudio supports syntax highlighting for Sweave &
I recently analyzed some data trying to find a model that would explain body fat distribution as predicted by several blood biomarkers. I had more predictors than samples (p>n), and I didn't have a clue which variables, interactions, or quadratic terms made biological sense to put into a model.
When I have a question I usually ask the internet before bugging my neighbor.
Had a mental block today trying to figure out how to get the indices of columns in a data frame given their names. Simple task but difficult to search Google for an answer. Thanks to jashapiro, Matt, and Vince for giving me a heads up on the which() function. The which() function returns the indices of TRUE values in a logical vector. If you're looking at the iris data: data(iris) head(iris)
Something like this probably already exists in an R package somewhere out there, but I needed a function to summarize how much missing data I have in each variable of a data frame in R. Pass a data frame to this function and for each variable it'll give you the number of missing values, the total N, and the proportion missing.
A while back I asked you what reference management software you used, and how well you liked it. I received 180 responses, and here's what you said. Out of the choices on the poll, most of you used Mendeley (30%), followed by EndNote (23%) and Zotero (15%). Out of those of you who picked "other," it was mostly Papers or Qiqqa. There were even a few brave souls managing references caveman-style, manually.
Recently I tried compiling Eigensoft on my Ubuntu 10.10 Linux system running in Virtualbox and had no success. From comments on this blog post, it looks like the newer Ubuntu distros don't have the libg2c0 and related libraries (which were a part of the gcc3) and gcc4 uses gfortran instead.
This builds on a previous post from Stephen.I was recently running a series of ANOVA analyses, and I used the aov() function because it had a few options that I preferred. Much like lm(), the function returns an object that you typically pass to summary() to view and interpret the output. It took me a bit of playing to figure out how to extract the information I needed.
After finishing the final revisions on my dissertation I was reminded of this spot-on graphical guide to what a Ph.D. is really all about. Now that I'm finished, I'm leaving Vanderbilt to start a postdoc in genetic epidemiology with Dr. Loic Le Marchand at the University of Hawaii Cancer Center. Posts may be sparse over the next few weeks, but I plan on blogging as usual once I'm set up at my postdoc.