Rogue Scholar

Biological Sciences

It's like Photoshop for Graphs

Published March 15, 2011

Thanks to links from Sarah Pendergrass, I stumbled upon Gephi, an awesome program for graph visualization and analysis. It seems rather feature-rich, with built-in connectors to database systems, extensive graph coloring, layout, and rendering features, and several analysis tools.

ggplot2, R, Visualization, Biological Sciences

Forest plots using R and ggplot2

https://doi.org/10.59350/r2d4c-f1h89

Published March 9, 2011

Author Stephen Turner

Abhijit over at Stat Bandit posted some nice code for making forest plots using ggplot2 in R. You see these a lot in meta-analyses, and there's one in the BioVU demonstration paper. The idea is simple: the x-axis shows the odds ratio (or whatever statistic you want to show), and each line is a different study, gene, SNP, phenotype, etc.
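A forest plot of this kind can also be sketched in base R graphics, as an alternative to the ggplot2 code in Abhijit's post. This is a minimal sketch, not his code: it assumes a data frame with one row per study holding an odds ratio and a 95% confidence interval, and the column names, the hypothetical `forest_plot()` helper, and the example numbers are all invented for illustration.

```r
# Hypothetical example data: odds ratios with 95% CIs for four studies
# (values invented purely for illustration, not from any real analysis)
fp <- data.frame(
  study = c("Study A", "Study B", "Study C", "Study D"),
  or    = c(1.20, 0.85, 1.50, 1.05),
  lower = c(0.95, 0.60, 1.10, 0.90),
  upper = c(1.52, 1.20, 2.05, 1.22)
)

# Hypothetical helper: one row per line, point = OR, bar = 95% CI
forest_plot <- function(d) {
  op <- par(mar = c(5, 7, 1, 1))   # wider left margin for study labels
  on.exit(par(op))                 # restore graphics settings afterwards
  y <- rev(seq_len(nrow(d)))       # first row of d is drawn at the top
  plot(d$or, y, log = "x", pch = 15,
       xlim = range(d$lower, d$upper), ylim = c(0.5, nrow(d) + 0.5),
       yaxt = "n", xlab = "Odds ratio (log scale)", ylab = "")
  segments(d$lower, y, d$upper, y) # horizontal confidence-interval bars
  abline(v = 1, lty = 2)           # dashed line of no effect (OR = 1)
  axis(2, at = y, labels = d$study, las = 1)
  invisible(y)
}
```

Call `forest_plot(fp)` at an interactive R prompt to draw it. The log scale on the x-axis keeps an odds ratio of 0.5 and one of 2 equally far from the dashed line at 1.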

Machine Learning, R, Statistics, Biological Sciences

Splitting a Dataset Revisited: Keeping Covariates Balanced Between Splits

https://doi.org/10.59350/c11ry-cnp40

Published March 8, 2011

Author Stephen Turner

In my previous post, I showed you how to randomly split a dataset into training and testing datasets. (Thanks to all those who emailed me or left comments letting me know that this could be done using other means.)
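One common way to keep a categorical covariate balanced between splits is stratified sampling: group the row indices by the covariate's levels, then sample the same fraction within each level. The sketch below does that; it is an assumption about the approach, not necessarily the method used in the post, and `stratified_split()` is a hypothetical helper. A continuous covariate would need to be binned first (for example with `cut()`).

```r
# Hypothetical helper: stratified train/test split on one categorical covariate
stratified_split <- function(df, strata, frac = 0.5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # optional, for a reproducible split
  # Row indices grouped by level of the stratifying covariate
  idx_by_level <- split(seq_len(nrow(df)), df[[strata]])
  # Sample the same fraction of rows within each level; indexing through
  # sample.int() avoids sample()'s surprise when a group has one row
  train_idx <- unlist(lapply(idx_by_level, function(i) {
    i[sample.int(length(i), size = round(frac * length(i)))]
  }), use.names = FALSE)
  in_train <- seq_len(nrow(df)) %in% train_idx
  list(train = df[in_train, , drop = FALSE],
       test  = df[!in_train, , drop = FALSE])
}

s <- stratified_split(iris, "Species", frac = 0.7, seed = 2011)
table(s$train$Species)
table(s$test$Species)
```

With `iris`, which has 50 rows per species, `frac = 0.7` puts exactly 35 of each species in the training set and 15 in the testing set.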

Announcements, Policy, Biological Sciences

Genetic Alliance Celebrates its 25th Year

https://doi.org/10.59350/9h2dm-p5y94

Published March 3, 2011

Author Stephen Turner

Genetic Alliance is a nonprofit health advocacy organization that improves health through the authentic engagement of communities and individuals. This year, they are celebrating their 25th anniversary, and they're hosting a variety of events throughout the year, including monthly salons around the country and the 25th Anniversary Annual Conference in June.

Productivity, R, Software, Biological Sciences

RStudio: New free IDE for R

https://doi.org/10.59350/9zfjs-86v77

Published February 28, 2011

Author Stephen Turner

Just saw the announcement that RStudio is available: a new, free and open-source integrated development environment for R that works on Windows, Mac, and Linux. Judging from the screenshots, it looks like RStudio supports syntax highlighting for Sweave &

Machine Learning, R, Biological Sciences

Split a Data Frame into Testing and Training Sets in R

https://doi.org/10.59350/95xka-jxb94

Published February 24, 2011

Author Stephen Turner

I recently analyzed some data, trying to find a model that would explain body fat distribution from several blood biomarkers. I had more predictors than samples (p > n), and I didn't have a clue which variables, interactions, or quadratic terms made biological sense to put into a model.
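A simple random split into training and testing sets can be sketched with `sample.int()`. The helper name `split_train_test()`, the default 50/50 split, and the use of the built-in `mtcars` data are assumptions for illustration, not the code from the post.

```r
# Hypothetical helper: randomly split a data frame into training and testing sets
split_train_test <- function(df, train_frac = 0.5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # optional, for a reproducible split
  n <- nrow(df)
  train_rows <- sample.int(n, size = floor(train_frac * n))
  in_train <- seq_len(n) %in% train_rows
  list(train = df[in_train, , drop = FALSE],
       test  = df[!in_train, , drop = FALSE])
}

parts <- split_train_test(mtcars, train_frac = 0.75, seed = 1)
nrow(parts$train)  # 24
nrow(parts$test)   # 8
```

Using a logical `in_train` vector rather than negative indexing avoids a subtle bug: if no rows were sampled, `df[-integer(0), ]` returns zero rows rather than all of them.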

R, Recommended Reading, Search, Statistics, Biological Sciences

Get all your Questions Answered

https://doi.org/10.59350/hh728-shc15

Published February 22, 2011

Author Stephen Turner

When I have a question I usually ask the internet before bugging my neighbor.

R, Biological Sciences

R: Given column name in a Data Frame, Get the Index

https://doi.org/10.59350/je8h1-1fp82

Published February 17, 2011

Author Stephen Turner

Had a mental block today trying to figure out how to get the indices of columns in a data frame given their names. It's a simple task, but a hard one to search Google for. Thanks to jashapiro, Matt, and Vince for giving me a heads-up on the which() function, which returns the indices of the TRUE values in a logical vector. If you're looking at the iris data:

data(iris)
head(iris)
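Here is a short sketch of `which()` applied to the column names of `iris`. The `%in%` variant handles several names at once, and `match()`, which was not part of the tip in the post, is an alternative that returns the indices in the order you ask for them.

```r
data(iris)

# Logical vector: TRUE where the column name matches
names(iris) == "Species"
# which() turns that into the integer index of the TRUE element(s)
which(names(iris) == "Species")                           # 5

# Several names at once: %in% gives TRUE for each matching column
which(names(iris) %in% c("Sepal.Length", "Petal.Width"))  # 1 4

# match() returns indices in the order the names are requested
match(c("Petal.Width", "Sepal.Length"), names(iris))      # 4 1
```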

RBiological Sciences

Summarize Missing Data for all Variables in a Data Frame in R

https://doi.org/10.59350/my7ne-fph65

Published February 16, 2011

Author Stephen Turner

Something like this probably already exists in an R package somewhere out there, but I needed a function to summarize how much missing data I have in each variable of a data frame in R. Pass a data frame to this function and for each variable it'll give you the number of missing values, the total N, and the proportion missing.
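A minimal version of such a function, assuming "total N" means the number of rows, might look like the sketch below. The name `summarize_missing()` is hypothetical, not the function from the post, and `airquality` is just a built-in dataset that happens to contain missing values.

```r
# Hypothetical helper: per-variable missing-data summary for a data frame
summarize_missing <- function(df) {
  n <- nrow(df)
  # Number of NA values in each column
  nmiss <- vapply(df, function(x) sum(is.na(x)), integer(1))
  data.frame(variable = names(df),
             nmiss    = nmiss,        # number of missing values
             n        = n,            # total N (rows)
             propmiss = nmiss / n,    # proportion missing
             row.names = NULL)
}

summarize_missing(airquality)
```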

Productivity, Biological Sciences

Results from Reference Management Poll

https://doi.org/10.59350/jq6v6-zzz18

Published February 15, 2011

Author Stephen Turner

A while back I asked you what reference management software you used, and how well you liked it. I received 180 responses, and here's what you said. Of the choices on the poll, the most popular was Mendeley (30%), followed by EndNote (23%) and Zotero (15%). Those of you who picked "other" mostly used Papers or Qiqqa. There were even a few brave souls managing references caveman-style, manually.

Biological Sciences

Shellfish for Parallel PCA on GWAS data (Alternative to Eigenstrat)

https://doi.org/10.59350/s8yjz-w8e68

Published February 11, 2011

Author Stephen Turner

Recently I tried compiling Eigensoft on my Ubuntu 10.10 Linux system running in VirtualBox, with no success. From the comments on this blog post, it looks like newer Ubuntu releases don't include libg2c0 and related libraries (which were part of gcc 3), and gcc 4 uses gfortran instead.

Getting Genetics Done
