Rogue Scholar

Python, Biology, English

uv, part 2: building and publishing packages

Published March 24, 2025

Author Stephen Turner

This is part 2 of a series on uv. Other posts in this series: uv, part 1: running scripts and tools; this post; uv, part 3: Python in R with reticulate; coming soon… Last year I wrote a post on creating a Python command line application with Click using a cookiecutter template, building with setuptools and publishing with twine.

Papers, Biology, English

Weekly Recap (March 2025, part 1)

https://doi.org/10.59350/7wyhw-a5141

Published March 21, 2025

Author Stephen Turner

Here we are most of the way through March and I’m just getting around to my first “weekly” recap. It’s been a busy month — I’m writing a few papers of my own which I’ll share here when they’re published, and I took some much needed R&R in Portugal where I traded my stack of research papers for some escapist sci-fi. But I’m back now and making my way through a deep backlog.

TIL, AI, Biology, English

GUIs for Local LLMs with RAG

https://doi.org/10.59350/zr8m8-6mt66

Published March 14, 2025

Author Stephen Turner

Anyone reading this newsletter has surely used the frontier models like ChatGPT, Claude, and Gemini. I’ve written a few posts about using local models but haven’t really talked much about the tools I use to directly interact with these models. Those previous posts interact with local models using tools like ellmer in R or my own biorecap package which interacts with a locally running Ollama server.

Python, Biology, English

uv, part 1: running scripts and tools

https://doi.org/10.59350/wsspr-d7d78

Published March 3, 2025

Author Stephen Turner

This is part 1 of a series on uv. Other posts in this series: this post; uv, part 2: building and publishing packages; uv, part 3: Python in R with reticulate; coming soon… Lately I’ve heard a lot of great things about uv, an extremely fast Python package and project manager, written in Rust.

Papers, Biology, English

Weekly Recap (Feb 2025, part 3)

https://doi.org/10.59350/1r432-pfm44

Published February 27, 2025

Author Stephen Turner

This week’s recap highlights Verkko2 for T2T genome assembly, the GPN-MSA DNA language model trained on multispecies alignments for variant effect prediction that outperforms other methods like CADD, ESM-1b, phyloP, phastCons, nucleotide transformer, and HyenaDNA, fast orthology inference with FastOMA, a foundation model of transcription across cell types, and engineering CRISPR-Cas PAM sites using deep learning.

Papers, Biology, English

Weekly Recap (Feb 2025, part 2)

https://doi.org/10.59350/zgaw8-kew63

Published February 20, 2025

Author Stephen Turner

It’s been a few weeks since I wrote a recap about what I’m reading. It’s been difficult watching helplessly as the institutions and financial infrastructure underpinning my profession are being systematically and irreversibly dismantled, with brilliant scientists I know personally having their careers destroyed and lives upturned.

R, TIL, Biology, English

Exploring the bioRxiv API with R, httr2, rvest, tidytext, and Datawrapper

https://doi.org/10.59350/wztpw-wey45

Published February 10, 2025

Author Stephen Turner

Last year I wrote a post describing an R package I put together that fetches recent bioRxiv preprints from a given subject and summarizes them in a couple of sentences using a local LLM running through Ollama. That tool has a limitation: it uses the bioRxiv RSS feed to pull recent paper titles and abstracts, and the RSS feeds currently provide only the 30 most recent preprints in each subject area.

Papers, Biology, English

Weekly Recap (Feb 2025, part 1)

https://doi.org/10.59350/2k420-s3t63

Published February 7, 2025

Author Stephen Turner

We’re 6 weeks into the new year and I’m still catching up on papers from my late 2024 backlog.

Biology, English

Technical blogging for growth and learning

https://doi.org/10.59350/bqnfd-7p249

Published January 25, 2025

Author Stephen Turner

I’ve been blogging about genetics, statistics, computational biology, data science, and science in general for over 15 years. I published my first blog post on Getting Genetics Done in 2009, and started this blog last year after taking a few years off. Science blogging has significantly contributed to my personal and professional growth as a scientist. Writing takes time and effort — time I could have spent elsewhere.

Papers, Biology, English

Weekly Recap (Jan 2025 part 3)

https://doi.org/10.59350/jpeem-e5849

Published January 24, 2025

Author Stephen Turner

I’m still catching up on papers from my late 2024 backlog. This week’s recap highlights a browser application for visualizing pathogen dispersal, a DNA language model evaluation benchmark on regulatory DNA, regularized ensemble polygenic risk prediction with GWAS summary statistics, multimodal analysis of RNA-seq data for complex trait genetics, and a deep dive on blastp’s E-value.

AI, Biology, English

AI in data science education

https://doi.org/10.59350/gp7b7-zw139

Published January 15, 2025

Author Stephen Turner

Something a little different for this week’s recap. I’ve been thinking a lot lately about the practice of data science education in this era of widely available (and really good!) LLMs for code. Commentary at the top based on my own data science teaching experience, with a deep dive into a few recent papers below.

Paired Ends
