Rogue Scholar

AI · Biology · English

AI in data science education

Published January 15, 2025

Author: Stephen Turner

Something a little different for this week’s recap. I’ve been thinking a lot lately about the practice of data science education in this era of widely available (and really good!) LLMs for code. I open with commentary based on my own experience teaching data science, followed by a deep dive into a few recent papers.

TIL · AI · Biology · English

Write code in unfamiliar territory with AI

https://doi.org/10.59350/rwch9-g2z95

Published January 12, 2025

Author: Stephen Turner

The majority of developers use LLMs to help write code, present company included. When I’m working in languages I know well, they're fantastic at handling the grunt work: generating boilerplate, suggesting completions, and writing tedious tests and documentation.

Papers · Biology · English

Weekly Recap (Jan 2025 part 2)

https://doi.org/10.59350/fj15k-d4j12

Published January 10, 2025

Author: Stephen Turner

I'm still catching up on papers from my late 2024 backlog. This week’s recap highlights autonomous microbial sensors for detecting TNT in soil, genome size estimation from long reads, STABIX for indexing and compressing GWAS summary statistics, and Clair3-RNA for deep learning-based small variant calling on long-read RNA-seq data.

AI · Biology · English

Gene Info Custom GPT

https://doi.org/10.59350/a48aw-1sq69

Published January 6, 2025

Author: Stephen Turner

OpenAI introduced the ability to create custom GPTs back in November 2023. I wanted to try creating one, and in the spirit of learning in public, this post describes how I made it. But first, what does it do? The Gene Info custom GPT takes a list of human gene symbols as input.

Papers · Biology · English

Weekly Recap (Jan 2025, part 1)

https://doi.org/10.59350/2zjt7-tqb76

Published January 3, 2025

Author: Stephen Turner

Happy New Year! I’m still catching up on papers from my late 2024 backlog.

R · AI · Biology · English

Bluesky conversation analysis with local and frontier LLMs with R/Tidyverse

https://doi.org/10.59350/rzc7w-qkb06

Published December 30, 2024

Author: Stephen Turner

Background (Bluesky, atrrr, local LLMs): I’ve written a few posts lately about Bluesky. The first, Bluesky for Science, was about Bluesky as a home for Science Twitter expats after the mass eXodus; another covered using the atrrr package to expand your Bluesky network. I’ve also spent some time looking at R packages that provide an interface to Ollama.

AI · Biology · English

The Enlightenment Conservatory

https://doi.org/10.59350/94aaz-m9h32

Published December 23, 2024

Author: Stephen Turner

I had good intentions to give NaNoWriMo a try this year but didn’t get very far. Instead I gave OpenAI’s Creative Writing Coach GPT a try for a (very) short story I had in mind, inspired by my frustration trying to access closed-access research articles for a review article I’m preparing.

Biology · English

What I'm reading: de-extinction edition

https://doi.org/10.59350/892gr-q1e17

Published December 21, 2024

Author: Stephen Turner

The Baader–Meinhof phenomenon (aka the frequency illusion) is the name for that thing that happens when you buy a new car, and suddenly you notice that same model car everywhere you drive.

Papers · Biology · English

Weekly Recap (Dec 2024, part 3)

https://doi.org/10.59350/p4rme-k4119

Published December 20, 2024

Author: Stephen Turner

This week’s recap highlights the Evo model for sequence modeling and design, biomedical discovery with AI agents, improving bioinformatics software quality through teamwork, a new tool from Brent Pedersen and Aaron Quinlan (vcfexpress) for filtering and formatting VCFs with Lua expressions, a new paper about the NHGRI-EBI GWAS Catalog, and a review paper on designing and engineering synthetic genomes.

AI · Biology · English

Video to audio to transcript to summary using local AI: whisperfile and llama3.3

https://doi.org/10.59350/bjvsq-cqg11

Published December 18, 2024

Author: Stephen Turner

A few days ago I wrote about translating R package help documentation using a local LLM (e.g. llama3.x)… …when Mick Watson commented: I was already thinking of wiring up something like this using local AI models — something to summarize podcasts, conference recordings, etc. The relatively new (as of this writing) Gemini 2.0 Flash model will do this for you for YouTube videos. But what if you wanted to do this offline using a local LLM?
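A minimal Python sketch of the two ends of that pipeline, assuming ffmpeg is installed and an Ollama server is running locally with a `llama3.3` model pulled. The ffmpeg audio options and Ollama's `/api/generate` endpoint are standard, but the function names and prompt wording are hypothetical, and the transcription step is left as a placeholder because whisperfile's exact command-line flags aren't given here.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a server on the default port)
OLLAMA_URL = "http://localhost:11434/api/generate"


def ffmpeg_extract_audio_cmd(video_path: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that extracts 16 kHz mono 16-bit PCM audio,
    the input format whisper.cpp-based tools typically expect."""
    return [
        "ffmpeg", "-i", video_path,
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to mono
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM WAV
        wav_path,
    ]


def summary_payload(transcript: str, model: str = "llama3.3") -> dict:
    """Build a non-streaming request body for Ollama's /api/generate.
    The prompt text is a hypothetical example."""
    prompt = "Summarize the following transcript in a few bullet points:\n\n" + transcript
    return {"model": model, "prompt": prompt, "stream": False}


def summarize(transcript: str, model: str = "llama3.3") -> str:
    """POST the transcript to the local Ollama server and return the reply text."""
    body = json.dumps(summary_payload(transcript, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # Non-streaming responses carry the generated text in "response"
        return json.loads(resp.read())["response"]


# Usage (requires ffmpeg, whisperfile, and a running Ollama server):
#   import subprocess
#   subprocess.run(ffmpeg_extract_audio_cmd("talk.mp4", "talk.wav"), check=True)
#   transcript = ...  # placeholder: transcribe talk.wav with whisperfile
#   print(summarize(transcript))
```

Keeping the command and payload builders separate from the calls that shell out or hit the network makes each step easy to check on its own before wiring everything together.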

TIL · AI · Biology · English

Turn any webpage into markdown for LLM-friendly input

https://doi.org/10.59350/g0y96-dwq81

Published December 16, 2024

Author: Stephen Turner

Last week I posted about a web app that turns a GitHub repo into a single text file for LLM-friendly input. This is great for capturing text from a GitHub repo, but what about an arbitrary website or PDF? I was catching up on Simon Willison’s newsletter and read about an app he made with Claude artifacts that uses the Jina Reader API to generate Markdown from a website. You don’t need to use the API to do this.
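Jina Reader can also be used without an API key by prefixing the target URL with `https://r.jina.ai/`. A minimal Python sketch of that pattern follows; the function names and the User-Agent string are hypothetical, and whether this prefix trick is exactly the approach the full post describes is an assumption.

```python
import urllib.request

# Prefixing a page URL with this endpoint returns a Markdown rendering of the page
JINA_READER_PREFIX = "https://r.jina.ai/"


def reader_url(page_url: str) -> str:
    """Return the Jina Reader URL for an absolute http(s) page URL."""
    if not page_url.startswith(("http://", "https://")):
        raise ValueError("expected an absolute http(s) URL")
    return JINA_READER_PREFIX + page_url


def fetch_markdown(page_url: str, timeout: float = 30.0) -> str:
    """Fetch the Markdown rendering of a page (requires network access)."""
    req = urllib.request.Request(
        reader_url(page_url),
        headers={"User-Agent": "markdown-sketch/0.1"},  # hypothetical UA string
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (requires network access):
#   print(fetch_markdown("https://example.com")[:500])
```

The same prefixed URL also works pasted straight into a browser, which is handy for a quick one-off copy into an LLM prompt.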

Paired Ends