
Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
R, Biological Sciences

Reposted from the original at https://blog.stephenturner.us/p/github-repo-to-text-for-llm-input.

If you use ChatGPT, Claude, or even a local model through Ollama or HuggingFace Assistants, you know that the chat interface makes it hard to feed in an entire repo such as a Python or R package, because functions, tests, and other code are scattered across many files.
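The core idea is to flatten the repo into one text file with a header before each file, so the model sees every function and test in a single paste. Below is a minimal base-R sketch of that idea. It is not the tool the original post describes, and `repo_to_text()` and its default file pattern are hypothetical choices.

```r
# Hypothetical helper: flatten a repository into one text string for pasting
# into an LLM chat. Each file is preceded by a "--- path ---" header line.
repo_to_text <- function(path,
                         pattern = "\\.(R|Rmd|py|md)$|^(DESCRIPTION|NAMESPACE)$") {
  # Relative paths of matching files; the pattern is tested against file names,
  # and hidden files are skipped because all.files defaults to FALSE
  files <- sort(list.files(path, pattern = pattern, recursive = TRUE))
  chunks <- vapply(files, function(f) {
    body <- readLines(file.path(path, f), warn = FALSE)
    paste(c(paste0("--- ", f, " ---"), body, ""), collapse = "\n")
  }, character(1))
  paste(chunks, collapse = "\n")
}

# Example: write the flattened repo to a single file you can upload or paste
# writeLines(repo_to_text("path/to/mypackage"), "mypackage.txt")
```

The file-name pattern keeps data files and binaries out of the prompt. For a large package, you would likely narrow it further so the text fits in the model's context window.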


Reposted from https://blog.stephenturner.us/p/tech-im-thankful-for-2024.

Data science and bioinformatics tech I'm thankful for in 2024: tidyverse, RStudio, Positron, Bluesky, blogs, Quarto, bioRxiv, LLMs for code, Ollama, Seqera Containers, StackOverflow, ... It’s a short week here in the US. As I reflect on the tools that shape modern bioinformatics and data science, it’s striking to see how far we’ve come in the 20 years I’ve


This is reposted from the original at https://blog.stephenturner.us/p/expand-your-bluesky-network-with-r.

I’m encouraging everyone I know online to join the scientific community on Bluesky (see my earlier post, "Bluesky for Science," Nov 16). In that post I link to several starter packs: lists of accounts posting about a topic that you can follow individually or all at once to start filling out your network. I started

Author Stephen Turner

This is reposted from the original at https://blog.stephenturner.us/p/use-nanoparquet-instead-of-readr-csv.

Parquet is interoperable between Python and R, fast to read and write, works well with databases, and stores complex data types (e.g., tibble list-columns). Use it instead of CSV: many pros, few (no?) cons. Yesterday I wrote about base R vs. dplyr vs. duckdb for a simple summary analysis.
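One concrete difference is that CSV is plain text, so column types have to be guessed again on every read, while Parquet stores a typed schema with the data. The sketch below round-trips a small, made-up data frame through both formats using `nanoparquet::write_parquet()` and `nanoparquet::read_parquet()`, assuming nanoparquet is installed. It contrasts with base `read.csv()`. readr's `read_csv()` would correctly guess an ISO date column like this one, but it still has to guess.

```r
# Made-up example data: a character, a Date, and an integer column
df <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  day = as.Date(c("2024-11-01", "2024-11-02", "2024-11-03")),
  reads = c(1200L, 3400L, 560L)
)

# CSV stores everything as text, so types are re-guessed on read
roundtrip_csv <- function(x) {
  path <- tempfile(fileext = ".csv")
  write.csv(x, path, row.names = FALSE)
  read.csv(path)
}

# Parquet stores the column types alongside the data
roundtrip_parquet <- function(x) {
  path <- tempfile(fileext = ".parquet")
  nanoparquet::write_parquet(x, path)
  nanoparquet::read_parquet(path)
}

str(roundtrip_csv(df))        # 'day' comes back as character
if (requireNamespace("nanoparquet", quietly = TRUE)) {
  str(roundtrip_parquet(df))  # 'day' comes back as Date
}
```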

Author Stephen Turner

Reposted from https://blog.stephenturner.us/p/duckdb-vs-dplyr-vs-base-r.

TL;DR: For a very simple analysis (means by group on 100M rows), duckdb was 125x faster than base R and 28x faster than readr+dplyr, without having to read the data from disk into memory. The duckplyr package wraps DuckDB's analytical query processing in a dplyr-compatible API.
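The key difference is where the work happens. Base R reads the whole file into memory and then aggregates it, while DuckDB can run SQL directly against the file on disk and return only the small summary. Below is a rough sketch of both approaches, assuming the duckdb and DBI packages are installed. The file, column names, and 1,000-row size are made up, so this illustrates the approach, not the benchmark timings. The duckplyr route is not shown.

```r
# Made-up example data written to a temporary CSV: a group label and a value
set.seed(42)
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(grp = sample(c("a", "b", "c"), 1000, replace = TRUE),
             x = rnorm(1000)),
  csv_path, row.names = FALSE
)

# Base R: read the whole file into memory, then compute means by group
means_base <- function(path) {
  df <- read.csv(path)
  aggregate(x ~ grp, data = df, FUN = mean)
}

# DuckDB: query the CSV in place with SQL; only the grouped result
# is returned to R as a data frame
means_duckdb <- function(path) {
  con <- DBI::dbConnect(duckdb::duckdb())
  on.exit(DBI::dbDisconnect(con))
  DBI::dbGetQuery(con, sprintf(
    "SELECT grp, avg(x) AS x FROM read_csv_auto('%s') GROUP BY grp ORDER BY grp",
    path
  ))
}

means_base(csv_path)
# means_duckdb(csv_path)  # requires the duckdb and DBI packages
```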

Author Stephen Turner

This is reposted from my newsletter, where I'll be posting from now on: https://blog.stephenturner.us/p/biorecap-r-package-for-summarizing-biorxiv-preprints-local-llm

TL;DR: I wrote an R package that summarizes recent bioRxiv preprints using a locally running LLM via Ollama+ollamar, and produces a summary HTML report from a parameterized RMarkdown template.
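The excerpt says biorecap talks to Ollama through the ollamar package. The hypothetical sketch below shows the same loop without relying on ollamar's function signatures. It builds one prompt from preprint titles and abstracts, then posts it to Ollama's local REST endpoint (`/api/generate` on the default port 11434) with httr2. `build_prompt()`, `ask_ollama()`, and the model name `llama3.2` are assumptions, not biorecap's API.

```r
# Hypothetical helper: combine preprint titles and abstracts into one prompt
build_prompt <- function(titles, abstracts, n_sentences = 2) {
  stopifnot(length(titles) == length(abstracts))
  items <- paste0("Title: ", titles, "\nAbstract: ", abstracts)
  paste0("Summarize each of the following bioRxiv preprints in ", n_sentences,
         " sentences.\n\n", paste(items, collapse = "\n\n"))
}

# Hypothetical helper: send the prompt to a locally running Ollama server
# through its REST API (POST /api/generate, default port 11434) using httr2.
# With stream = FALSE, Ollama returns one JSON object whose "response" field
# holds the generated text. The model name is an assumption; use any model
# you have pulled locally.
ask_ollama <- function(prompt, model = "llama3.2") {
  req <- httr2::request("http://localhost:11434/api/generate")
  req <- httr2::req_body_json(req, list(model = model, prompt = prompt,
                                        stream = FALSE))
  resp <- httr2::req_perform(req)
  httr2::resp_body_json(resp)$response
}
```

The returned text could then be passed into a parameterized R Markdown template with `rmarkdown::render()` and its `params` argument to produce an HTML report.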