
At our most recent R user group meeting we were delighted to have presentations from Mark Lawson and Steve Hoang, both bioinformaticians at Hemoshear. All of the code used in both demos is in our Meetup’s GitHub repo.
At our most recent R user group meeting we were delighted to have presentations from Mark Lawson and Steve Hoang, both bioinformaticians at Hemoshear. All of the code used in both demos is in our Meetup’s GitHub repo.
Anscombe’s quartet comprises four datasets that have nearly identical simple statistical properties, yet appear very different when graphed. Each dataset consists of eleven ( x , y ) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties. Let’s load and view the data.
Current Opinion in Microbiology recently published a special issue in genomics. In an excellent editorial overview, “Genomics: The era of genomically-enabled microbiology”, Neil Hall and Jay Hinton give an overview of the state of the field in microbial genomics, summarize recent contributions, and give a great synopsis of each of the reviews in this issue.
Joanna Zhao’s and Jenny Bryan’s R graph catalog is meant to be a complement to the physical book, Creating More Effective Graphs, but it’s a really nice gallery in its own right. The catalog shows a series of different data visualizations, all made with R and ggplot2. Click on any of the plots and you get the R code necessary to generate the data and produce the plot.
I have a noteworthy blogs tag on this blog that I sort of forgot about, and haven't used in years. But I started reading one recently that's definitely qualified for the distinction. The Microbiome Digest is written by Elisabeth Bik, a scientist studying the microbiome at Stanford.
I recently learned about the microbenchmark package while browsing through Hadley’s advanced R programming book. I’ve done some quick benchmarking using system.time() in a for loop and taking the average, but the microbenchmark function in the microbenchmark package makes this much easier.
A colleague needed some help getting Illumina BeadArray gene expression data loaded into R for data analysis with limma. Hopefully whoever ran your arrays can export the data as text files formatted as described in the code below. If so, you can import those text files directly using the beadarray package.
Last week I ran a one-day workshop on RNA-seq data analysis in the UVA Health Sciences Library. I set up an AWS public EC2 image with all the necessary software installed.
Sometimes you need to run some UNIX command on a file but only want to operate on the body of the file, not the header. Create a file called body somewhere in your $PATH, make it executable, and add this to it: #!/bin/bash IFS= read -r header printf '%s\n' "$header" eval $@ Now, when you need to run something but ignore the header, use the body command first.
I talked a little bit about tidy data my recent post about dplyr, but you should really go check out Hadley’s paper on the subject. R expects inputs to data analysis procedures to be in a tidy format, but the model output objects that you get back aren’t always tidy. The reshape2, tidyr, and dplyr are meant to take data frames, munge them around, and return a data frame.
TL;DR? We started an R Users group, awesome community, huge turnout at first meeting, lots of potential. --- I've sat through many hours of meetings where faculty lament the fact that their trainees (and the faculty themselves!) are woefully ill-prepared for our brave new world of computing- and data-intensive science.