Rogue Scholar

ggplot2, R, Biological Sciences

R + ggplot2 Graph Catalog

Published February 3, 2015

Joanna Zhao and Jenny Bryan’s R graph catalog is meant to complement the physical book Creating More Effective Graphs, but it’s a really nice gallery in its own right. The catalog shows a series of different data visualizations, all made with R and ggplot2. Click on any of the plots and you get the R code needed to generate the data and produce the plot.

Noteworthy Blogs, Biological Sciences

Microbiome Digest Blog

https://doi.org/10.59350/ezw0k-gt934

Published January 20, 2015

Author Stephen Turner

I have a "noteworthy blogs" tag on this blog that I had sort of forgotten about and haven't used in years. But I recently started reading a blog that definitely qualifies for the distinction. The Microbiome Digest is written by Elisabeth Bik, a scientist studying the microbiome at Stanford.

R, Biological Sciences

Using the microbenchmark package to compare the execution time of R expressions

https://doi.org/10.59350/8ryxb-4sq10

Published January 14, 2015

Author Stephen Turner

I recently learned about the microbenchmark package while browsing through Hadley’s Advanced R programming book. I’ve previously done quick benchmarking by calling system.time() in a for loop and averaging the results, but the microbenchmark() function in the microbenchmark package makes this much easier.

R, Biological Sciences

Importing Illumina BeadArray data into R

https://doi.org/10.59350/9f7y5-d7754

Published December 8, 2014

Author Stephen Turner

A colleague needed some help getting Illumina BeadArray gene expression data loaded into R for analysis with limma. Hopefully whoever ran your arrays can export the data as text files in the format the full post's code expects. If so, you can import those text files directly using the beadarray package.

R, RNA-Seq, Tutorials, Biological Sciences

RNA-seq Data Analysis Course Materials

https://doi.org/10.59350/avz01-j3p68

Published November 20, 2014

Author Stephen Turner

Last week I ran a one-day workshop on RNA-seq data analysis at the UVA Health Sciences Library. I set up a public AWS EC2 image with all the necessary software installed.

Linux, Quicktip, Biological Sciences

Operate on the body of a file but not the header

https://doi.org/10.59350/1g5zw-fqs69

Published October 14, 2014

Author Stephen Turner

Sometimes you need to run a UNIX command on a file but only want it to operate on the body of the file, not the header. Create a file called body somewhere in your $PATH, make it executable, and add this to it:

    #!/bin/bash
    # Read the first line (the header) and print it unchanged...
    IFS= read -r header
    printf '%s\n' "$header"
    # ...then run the given command on the remaining lines of stdin.
    "$@"

Now, when you need to run something but leave the header alone, put body in front of the command.
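For a quick try without creating a file on your $PATH, the same idea can be written as a shell function. This is a sketch: the function form, the toy table, and the sort call are illustrative assumptions, not part of the original tip.

```shell
#!/usr/bin/env bash
# body: print the header line unchanged, then hand the rest of stdin
# to whatever command follows (here defined as a function, not a script).
body() {
    IFS= read -r header
    printf '%s\n' "$header"
    "$@"
}

# Hypothetical example: numerically sort a tab-separated table by its
# second column while keeping the header line on top.
printf 'name\tscore\ncarol\t3\nalice\t1\nbob\t2\n' | body sort -k2,2n
```

This works because, when reading from a pipe, the shell's read builtin consumes input one byte at a time, so it stops right after the first newline and leaves every remaining line for the wrapped command.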

R, Statistics, Biological Sciences

R package to convert statistical analysis objects to tidy data frames

https://doi.org/10.59350/drvqc-8pz69

Published September 16, 2014

Author Stephen Turner

I talked a little bit about tidy data in my recent post about dplyr, but you should really go check out Hadley’s paper on the subject. R's data analysis functions generally expect their inputs in a tidy format, but the model output objects you get back aren’t always tidy. The reshape2, tidyr, and dplyr packages are designed to take data frames, munge them around, and return a data frame.

R, Biological Sciences

UVA / Charlottesville R Meetup

https://doi.org/10.59350/3n6tp-r5209

Published September 11, 2014

Author Stephen Turner

TL;DR: We started an R users group. Awesome community, huge turnout at the first meeting, lots of potential.

I've sat through many hours of meetings where faculty lament the fact that their trainees (and the faculty themselves!) are woefully ill-prepared for our brave new world of computing- and data-intensive science.

Linux, Biological Sciences

GNU datamash

https://doi.org/10.59350/wz0wk-83m36

Published September 10, 2014

Author Stephen Turner

GNU datamash is a command-line utility that offers simple calculations (e.g. count, sum, min, max, mean, stdev, string coalescing) as well as a rich set of statistical functions, to quickly assess information in textual input files or from a UNIX pipe.

Data Science, R, Tutorials, Biological Sciences

Do your "data janitor work" like a boss with dplyr

https://doi.org/10.59350/7ez9c-r9c09

Published August 20, 2014

Author Stephen Turner

Data “janitor-work”

The New York Times recently ran a piece on wrangling and cleaning data: “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.” Whether you call it “janitor-work,” wrangling/munging, cleaning/cleansing/scrubbing, tidying, or something else, the article is worth a read (even though it implicitly denigrates the important work that your housekeeping staff does). It’s one of the few “Big Data” pieces that …

Bioinformatics, R, RNA-Seq, Tutorials, Biological Sciences

Introduction to R for Life Scientists: Course Materials

https://doi.org/10.59350/rhr57-chq53

Published July 7, 2014

Author Stephen Turner

Last week I taught a three-hour introduction to R workshop for life scientists at UVA's Health Sciences Library. I broke the workshop into three sections. In the first half hour or so, I presented slides giving an overview of R and why R is so awesome. During this session I emphasized reproducible research and demonstrated using knitr + rmarkdown in RStudio to produce a PDF that can easily be recompiled when the data are updated.

Getting Genetics Done
