
Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
R, Biological Sciences
Author: Stephen Turner

Reposted from Paired Ends at https://blog.stephenturner.us/p/llm-translate-documentation.

The lang package overrides the ? and help() functions in your R session so that help pages are translated. The translated help page appears in the help pane in RStudio or Positron. The package can also translate your Roxygen documentation.

R, Biological Sciences
Author: Stephen Turner

Reposted from the original at https://blog.stephenturner.us/p/github-repo-to-text-for-llm-input.

If you use ChatGPT, Claude, or a local model through Ollama or HuggingFace Assistants, you’ll know that the chat interface makes it hard to feed in an entire repository, such as a Python or R package, because functions, tests, and other code are scattered across many files.

R, Biological Sciences
Author: Stephen Turner

Reposted from https://blog.stephenturner.us/p/tech-im-thankful-for-2024.

Data science and bioinformatics tech I'm thankful for in 2024: tidyverse, RStudio, Positron, Bluesky, blogs, Quarto, bioRxiv, LLMs for code, Ollama, Seqera Containers, Stack Overflow, ...

It’s a short week here in the US. As I reflect on the tools that shape modern bioinformatics and data science, it’s striking to see how far we’ve come in the 20 years I’ve

R, Biological Sciences
Author: Stephen Turner

This is reposted from the original at https://blog.stephenturner.us/p/expand-your-bluesky-network-with-r.

I’m encouraging everyone I know online to join the scientific community on Bluesky (see my earlier post, “Bluesky for Science,” Nov 16). In that post I link to several starter packs: lists of accounts posting about a topic that you can follow individually or all at once to start filling out your network. I started

R, Biological Sciences
Author: Stephen Turner

Reposted from the original at https://blog.stephenturner.us/p/python-cli-click-cookiecutter.

In the spirit of Learning in Public, I wanted an excuse to explore (1) click for creating command-line interfaces, (2) Cookiecutter project templates, and (3) modern tools in the Python packaging ecosystem.

R, Biological Sciences
Author: Stephen Turner

This is reposted from the original at https://blog.stephenturner.us/p/python-for-r-users.

A Google search for “R vs Python” returns thousands of hits across sites like Reddit, IBM, DataCamp, Coursera, Kaggle, and many others.

R, Biological Sciences
Author: Stephen Turner

This is reposted from the original at https://blog.stephenturner.us/p/use-nanoparquet-instead-of-readr-csv.

Parquet is interoperable between Python and R, fast to read and write, works well with databases, and stores complex data types (e.g., tibble list-columns). Use it instead of CSV. Many pros, few (no?) cons. Yesterday I wrote about base R vs. dplyr vs. duckdb for a simple summary analysis.

R, Biological Sciences
Author: Stephen Turner

Reposted from https://blog.stephenturner.us/p/duckdb-vs-dplyr-vs-base-r.

TL;DR: For a very simple analysis (means by group on 100M rows), duckdb was 125x faster than base R and 28x faster than readr + dplyr, without having to read data from disk into memory. The duckplyr package wraps DuckDB's analytical query processing techniques in a dplyr-compatible API.
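The benchmark task itself is a grouped mean. As a small, runnable stand-in for the shape of SQL an engine like DuckDB executes for it, here is the same GROUP BY/AVG query run through Python's built-in sqlite3 module on a five-row toy table. This is SQLite, not DuckDB; the table and column names are hypothetical, and nothing here reflects the timings in the post.

```python
import sqlite3

# Toy data standing in for the 100M-row table; names are hypothetical.
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE measurements (grp TEXT, value REAL)")
con.executemany("INSERT INTO measurements VALUES (?, ?)", rows)

# Mean of `value` within each group, ordered for a predictable result
query = """
SELECT grp, AVG(value) AS mean_value
FROM measurements
GROUP BY grp
ORDER BY grp
"""
result = con.execute(query).fetchall()
con.close()
print(result)  # [('a', 2.0), ('b', 20.0)]
```

In DuckDB the FROM clause can name a file directly (for example, `FROM read_parquet('data.parquet')`), which is how it avoids a separate step of loading the data into an in-memory table first.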