
Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
R, Biological Sciences
Author Stephen Turner

Reposted from https://blog.stephenturner.us/p/duckdb-vs-dplyr-vs-base-r. TL;DR: For a very simple analysis (means by group on 100M rows), DuckDB was 125x faster than base R and 28x faster than readr+dplyr, without having to read data from disk into memory. The duckplyr package wraps DuckDB's analytical query processing techniques in a dplyr-compatible API.

R, Biological Sciences
Author Stephen Turner

This is reposted from my newsletter, where I'll be posting from now on: https://blog.stephenturner.us/p/biorecap-r-package-for-summarizing-biorxiv-preprints-local-llm. TL;DR: I wrote an R package that summarizes recent bioRxiv preprints using a locally running LLM via Ollama+ollamar, and produces a summary HTML report from a parameterized RMarkdown template.

Announcements, R, Recommended Reading, Biological Sciences
Author Stephen Turner

My new blog/newsletter ("Paired Ends") is now at blog.stephenturner.us. I'll be posting semi-regular updates and literature highlights in bioinformatics, computational biology, and data science, along with the occasional post on programming.

Conferences, R, Biological Sciences
Author Stephen Turner

The first-ever RStudio conference was held January 11-14, 2017 in Orlando, FL. For anyone else like me who spends hours each working day staring into an RStudio session, the conference was truly excellent. The speaker lineup was diverse and covered lots of areas related to development in R, including the tidyverse, the RStudio IDE, Shiny, htmlwidgets, and authoring with RMarkdown.

Recommended Reading, Biological Sciences
Author Stephen Turner

I recently stumbled across this collection of computational biology primers in Nature Biotechnology. Many of these are old, but they're still great resources to get a fundamental understanding of the topic. Here they are in no particular order.

Biological Sciences
Author Stephen Turner

I came across this awesome gist explaining how to syntax highlight code in Keynote. The same trick works for PowerPoint. Mac only. Install Homebrew if you don’t have it already, then run brew install highlight. Next, run highlight -O rtf myfile.ext | pbcopy, which renders the file as syntax-highlighted text in RTF output format and copies the result to the system clipboard. Paste into Keynote or PowerPoint.

R, Web Apps, Biological Sciences
Author Stephen Turner

How many reads do I need? What's my sequencing depth? These are questions I get all the time. How much sequence data you need to hit a target depth of coverage, and the inverse, what coverage depth a given amount of sequencing yields, are both easy to calculate with some basic algebra. Given one or the other, plus the genome size and read length/configuration, you can calculate the missing value.
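
As a rough illustration of that algebra, here is a minimal Python sketch. It assumes the usual simplification that mean depth equals total sequenced bases divided by genome size, and that for paired-end data the read count means read pairs. The function names and the 3 Gb genome size are hypothetical, chosen only for the example; this is not the code behind the original post.

```python
import math


def coverage_depth(n_reads, read_length, genome_size, paired=False):
    """Mean depth = total sequenced bases / genome size.

    For paired-end data, n_reads is assumed to count read pairs,
    so each one contributes two reads of read_length bases.
    """
    ends = 2 if paired else 1
    return n_reads * read_length * ends / genome_size


def reads_needed(target_depth, read_length, genome_size, paired=False):
    """Invert the formula above, rounding up to a whole read (or pair)."""
    ends = 2 if paired else 1
    return math.ceil(target_depth * genome_size / (read_length * ends))


# Hypothetical example: 30x over a 3 Gb genome with 2x150 bp reads.
genome = 3_000_000_000
pairs = reads_needed(30, 150, genome, paired=True)
print(pairs)                                            # 300000000
print(coverage_depth(pairs, 150, genome, paired=True))  # 30.0
```

This ignores real-world losses such as duplicates, unmapped reads, and uneven coverage, so in practice the number of reads needed is usually somewhat higher than the calculation suggests.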