
Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
R, Biology, English

Reposted from the original at https://blog.stephenturner.us/p/github-repo-to-text-for-llm-input.

If you use ChatGPT, Claude, or even some local model through Ollama or HuggingFace Assistants, you'll know that the chat interface makes it challenging to feed in an entire repo like a Python or R package, because functions, tests, etc. can be scattered across many files throughout a repo.
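The excerpt does not show which tool the full post uses, so the following is only a minimal sketch of the general idea: walk a local checkout of a repo and concatenate its text files into one string that can be pasted into a chat window. The function name `repo_to_text`, the file-suffix list, the skipped directories, and the `=====` header format are all assumptions made for illustration, not anything taken from the post.

```python
from pathlib import Path

# Assumed defaults (hypothetical): which suffixes count as source text,
# and which directories are never worth sending to a model.
TEXT_SUFFIXES = {".py", ".R", ".r", ".Rmd", ".md", ".toml", ".txt"}
SKIP_DIRS = {".git", "__pycache__", "node_modules"}


def repo_to_text(root, suffixes=TEXT_SUFFIXES, skip_dirs=SKIP_DIRS):
    """Concatenate a repo's text files into one string, one path header per file."""
    root = Path(root)
    chunks = []
    for path in sorted(root.rglob("*")):
        # Keep only regular files with a suffix we treat as text.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root)
        # Skip anything that lives under an excluded directory.
        if any(part in skip_dirs for part in rel.parts):
            continue
        body = path.read_text(encoding="utf-8", errors="replace")
        # The header tells the model which file the following text came from.
        chunks.append(f"===== {rel.as_posix()} =====\n{body.rstrip()}\n")
    return "\n".join(chunks)
```

Calling `repo_to_text("path/to/checkout")` returns a single string. For a large repo it is worth checking the length against the model's context window before pasting.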

R, Biology, English

Reposted from https://blog.stephenturner.us/p/tech-im-thankful-for-2024.

Data science and bioinformatics tech I'm thankful for in 2024: tidyverse, RStudio, Positron, Bluesky, blogs, Quarto, bioRxiv, LLMs for code, Ollama, Seqera Containers, StackOverflow, ...

It's a short week here in the US. As I reflect on the tools that shape modern bioinformatics and data science, it's striking to see how far we've come in the 20 years I've

R, Biology, English

This is reposted from the original at https://blog.stephenturner.us/p/expand-your-bluesky-network-with-r.

I'm encouraging everyone I know online to join the scientific community on Bluesky (see "Bluesky for Science", Stephen Turner, Nov 16). In that post I link to several starter packs: lists of accounts posting about a topic that you can follow individually or all at once to start filling out your network. I started

R, Biology, English
Author: Stephen Turner

This is reposted from the original at https://blog.stephenturner.us/p/use-nanoparquet-instead-of-readr-csv.

Parquet is interoperable between Python and R, fast to read and write, works well with databases, and stores complex data types (e.g., tibble listcols). Use it instead of CSV. Many pros, few (no?) cons. Yesterday I wrote about base R vs. dplyr vs. duckdb for a simple summary analysis.

R, Biology, English
Author: Stephen Turner

Reposted from https://blog.stephenturner.us/p/duckdb-vs-dplyr-vs-base-r.

TL;DR: For a very simple analysis (means by group on 100M rows), duckdb was 125x faster than base R, and 28x faster than readr+dplyr, without having to read data from disk into memory. The duckplyr package wraps DuckDB's analytical query processing techniques in a dplyr-compatible API.

R, Biology, English
Author: Stephen Turner

This is re-posted from my newsletter, where I'll be posting from now on: https://blog.stephenturner.us/p/biorecap-r-package-for-summarizing-biorxiv-preprints-local-llm

TL;DR: I wrote an R package that summarizes recent bioRxiv preprints using a locally running LLM via Ollama+ollamar, and produces a summary HTML report from a parameterized RMarkdown template.