This is reposted from the original at https://blog.stephenturner.us/p/python-for-r-users. A Google search for “R vs Python” returns thousands of hits across sites like Reddit, IBM, DataCamp, Coursera, Kaggle, and many others.
This is reposted from the original at https://blog.stephenturner.us/p/use-nanoparquet-instead-of-readr-csv. Parquet is interoperable between Python and R, fast to read and write, works well with databases, and stores complex data types (e.g., tibble listcols). Use it instead of CSV. Many pros, few (no?) cons. Yesterday I wrote about base R vs. dplyr vs. duckdb for a simple summary analysis.
Reposted from https://blog.stephenturner.us/p/duckdb-vs-dplyr-vs-base-r. TL;DR: For a very simple analysis (means by group on 100M rows), duckdb was 125x faster than base R, and 28x faster than readr+dplyr, without having to read data from disk into memory. The duckplyr package wraps DuckDB's analytical query processing techniques in a dplyr-compatible API.
This blog has moved. This is reposted from Paired Ends: https://blog.stephenturner.us/p/create-a-free-llama-405b-llm-chatbot-github-repo-huggingface. Llama 3.1 405B is the first open-source LLM on par with frontier models GPT-4o and Claude 3.5 Sonnet.
This blog has moved. This is reposted from Paired Ends: https://blog.stephenturner.us/p/planes-plausibility-analysis-of-epidemiological-signals-rplanes-r-package. PLANES provides a set of methods for evaluating the plausibility of epidemiological signals and forecasts.
This is re-posted from my newsletter, where I'll be posting from now on: https://blog.stephenturner.us/p/biorecap-r-package-for-summarizing-biorxiv-preprints-local-llm. TL;DR: I wrote an R package that summarizes recent bioRxiv preprints using a locally running LLM via Ollama+ollamar, and produces a summary HTML report from a parameterized RMarkdown template.
This is reposted from the original article:
My new blog/newsletter ("Paired Ends") is now at blog.stephenturner.us. I'll be posting semi-regular updates and literature highlights in bioinformatics, computational biology, and data science, along with the occasional post on programming.
A while back I wrote this post about how I stay current in bioinformatics & genomics. That was nearly five years ago. A lot has changed since then. A few links are dead.
The first ever RStudio conference was held January 11-14, 2017 in Orlando, FL. For anyone else like me who spends hours each working day staring into an RStudio session, the conference was truly excellent. The speaker lineup was diverse and covered lots of areas related to development in R, including the tidyverse, the RStudio IDE, Shiny, htmlwidgets, and authoring with RMarkdown.
I recently stumbled across this collection of computational biology primers in Nature Biotechnology. Many of these are old, but they're still great resources to get a fundamental understanding of the topic. Here they are in no particular order.