Rogue Scholar

R, Biological Sciences

Python for R users (repost)

Published October 22, 2024

Author Stephen Turner

This is reposted from the original at https://blog.stephenturner.us/p/python-for-r-users. A Google search for “R vs Python” returns thousands of hits across sites like Reddit, IBM, DataCamp, Coursera, Kaggle, and many others.

R, Biological Sciences

Use nanoparquet instead of readr/CSV

https://doi.org/10.59350/cxsdt-7ky66

Published October 8, 2024

Author Stephen Turner

This is reposted from the original at https://blog.stephenturner.us/p/use-nanoparquet-instead-of-readr-csv. Parquet is interoperable between Python and R, fast to read and write, works well with databases, and stores complex data types (e.g., tibble listcols). Use it instead of CSV. Many pros, few (no?) cons. Yesterday I wrote about base R vs. dplyr vs. duckdb for a simple summary analysis.
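To make the CSV side of that comparison concrete, here is a minimal Python sketch using only the standard library (it does not use nanoparquet or Parquet at all). A list-valued column, loosely analogous to a tibble listcol, does not survive a CSV round trip: it comes back as a plain string, and so does the integer column. The column names and values are made up for illustration.

```python
import csv
import io

# Hypothetical example data: an integer column and a list-valued column,
# loosely analogous to a tibble with a listcol.
rows = [
    {"id": 1, "genes": ["BRCA1", "TP53"]},
    {"id": 2, "genes": ["EGFR"]},
]

def csv_round_trip(records):
    """Write records to an in-memory CSV, then read them back."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "genes"])
    writer.writeheader()
    # Non-string values are converted with str() on the way out.
    writer.writerows(records)
    buf.seek(0)
    return list(csv.DictReader(buf))

restored = csv_round_trip(rows)
for original, back in zip(rows, restored):
    # Every value comes back as text: the int and the list types are lost.
    print(type(original["id"]).__name__, "->", type(back["id"]).__name__)
    print(type(original["genes"]).__name__, "->", repr(back["genes"]))
```

Parquet, by contrast, records column types in the file's schema, so the reader does not have to guess or re-parse them from text.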

R, Biological Sciences

DuckDB vs dplyr vs base R

https://doi.org/10.59350/2knmq-dqw34

Published October 8, 2024

Author Stephen Turner

Reposted from https://blog.stephenturner.us/p/duckdb-vs-dplyr-vs-base-r. TL;DR: For a very simple analysis (means by group on 100M rows), duckdb was 125x faster than base R and 28x faster than readr+dplyr, without having to read data from disk into memory. The duckplyr package wraps DuckDB's analytical query processing techniques in a dplyr-compatible API.
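The benchmarked analysis boils down to a single aggregate query. As a rough sketch of its shape only, here is the same "means by group" computation in Python, with the standard library's `sqlite3` module standing in for DuckDB. This is not the benchmark code, SQLite will not reproduce DuckDB's speed, and the table name `t` and columns `grp`/`value` are hypothetical.

```python
import sqlite3

# Hypothetical toy data: (group, value) pairs. The post used 100M rows;
# this is just enough to show the shape of the query.
data = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)]

con = sqlite3.connect(":memory:")  # SQLite stands in for DuckDB here
con.execute("CREATE TABLE t (grp TEXT, value REAL)")
con.executemany("INSERT INTO t VALUES (?, ?)", data)

# Means by group, expressed as one GROUP BY aggregate.
query = "SELECT grp, AVG(value) AS mean_value FROM t GROUP BY grp ORDER BY grp"
means = dict(con.execute(query).fetchall())
con.close()

print(means)
```

DuckDB accepts essentially the same GROUP BY SQL, and it can run such a query directly against a file on disk rather than a table that was loaded into memory first.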

R, Biological Sciences

Create a free Llama 3.1 405B-powered chatbot on any GitHub repo in 1 minute (cross-posted from Paired Ends)

https://doi.org/10.59350/5hh2d-tnq35

Published September 11, 2024

Author Stephen Turner

This blog has moved. This is reposted from Paired Ends: https://blog.stephenturner.us/p/create-a-free-llama-405b-llm-chatbot-github-repo-huggingface. Llama 3.1 405B is the first open-source LLM on par with frontier models GPT-4o and Claude 3.5 Sonnet.

R, Biological Sciences

PLANES: Plausibility Analysis of Epidemiological Signals

https://doi.org/10.59350/62v39-a9m88

Published September 3, 2024

Author Stephen Turner

This blog has moved. This is reposted from Paired Ends: https://blog.stephenturner.us/p/planes-plausibility-analysis-of-epidemiological-signals-rplanes-r-package. PLANES provides a set of methods for evaluating the plausibility of epidemiological signals and forecasts.

R, Biological Sciences

biorecap: an R package for summarizing bioRxiv preprints with a local LLM

https://doi.org/10.59350/a9pcd-7nw25

Published August 24, 2024

Author Stephen Turner

This is reposted from my newsletter, where I'll be posting from now on: https://blog.stephenturner.us/p/biorecap-r-package-for-summarizing-biorxiv-preprints-local-llm. TL;DR: I wrote an R package that summarizes recent bioRxiv preprints using a locally running LLM via Ollama+ollamar, and produces a summary HTML report from a parameterized RMarkdown template.
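biorecap does this from R via ollamar. For readers coming from Python, here is a minimal standard-library sketch of the underlying idea, assuming a local Ollama server on its default port (11434) and its `/api/generate` endpoint. The model name `llama3.1`, the prompt wording, and the helper names `build_request` and `summarize` are assumptions for illustration; this is not how biorecap or ollamar are implemented.

```python
import json
import urllib.request

# Assumes Ollama's local server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, model="llama3.1"):
    """Build a non-streaming generate request (hypothetical helper).

    The default model name is an assumption: use any model you have pulled.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def summarize(abstract, model="llama3.1"):
    """Ask the local model for a one-sentence summary (needs Ollama running)."""
    prompt = "Summarize this preprint abstract in one sentence:\n\n" + abstract
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With stream set to False, the generated text is in the "response" field.
    return body["response"]
```

Call `summarize(text)` with Ollama running and the model pulled (e.g., `ollama pull llama3.1`). Setting `stream` to false returns one JSON object instead of a stream of partial chunks, which keeps the client code simple.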

R, Biological Sciences

Use R to prompt a local LLM with ollamar

https://doi.org/10.59350/trg64-4jr22

Published August 14, 2024

Author Stephen Turner

This is reposted from the original article:

Announcements, R, Recommended Reading, Biological Sciences

Moving to blog.stephenturner.us (Paired Ends)

https://doi.org/10.59350/cqffk-s8361

Published July 30, 2024

Author Stephen Turner

My new blog/newsletter ("Paired Ends") is now at blog.stephenturner.us. I'll be posting semi-regular updates and literature highlights in bioinformatics, computational biology, and data science, along with the occasional post on programming.

R, Recommended Reading, Twitter, Biological Sciences

Staying Current in Bioinformatics & Genomics: 2017 Edition

https://doi.org/10.59350/7951a-61m72

Published February 1, 2017

Author Stephen Turner

A while back I wrote this post about how I stay current in bioinformatics & genomics. That was nearly five years ago. A lot has changed since then. A few links are dead.

Conferences, R, Biological Sciences

RStudio Conference 2017 Recap

https://doi.org/10.59350/49m7x-11q77

Published January 14, 2017

Author Stephen Turner

The first-ever RStudio conference was held January 11-14, 2017 in Orlando, FL. For anyone else like me who spends hours each working day staring into an RStudio session, the conference was truly excellent. The speaker lineup was diverse and covered lots of areas related to development in R, including the tidyverse, the RStudio IDE, Shiny, htmlwidgets, and authoring with RMarkdown.

Primers in computational biology

https://doi.org/10.59350/3sj7t-vtg26

Published September 19, 2016

Author Stephen Turner

I recently stumbled across this collection of computational biology primers in Nature Biotechnology. Many of these are old, but they're still great resources to get a fundamental understanding of the topic. Here they are in no particular order.

Getting Genetics Done