
Closing my browser tabs: papers, blogs, news stories, YouTube videos, tutorials, etc.
Tablero Infovestigación: Digital research is a detailed, meticulous, and robust process, characterized by using high-quality digital information sources and applying computing methods, tools, and techniques to collect, analyze, and use information available online, following a method that includes designing singular and relevant questions, discovering data, and analyzing and discussing the results,
This week’s recap highlights analysis of human de novo mutation rates from a four-generation pedigree reference, how LLMs internalize scientific literature and citation practices, the py_ped_sim forward pedigree and genetic simulator for complex family pedigree analysis, and a review on predicting gene expression from DNA sequence using deep learning models like Enformer and Borzoi.
I recently wrote a piece about leaving academia for biotech. I left academia for industry in 2019. I spent four years at a consulting firm before joining Colossal Biosciences. This week I’m returning to the University of Virginia School of Data Science as a tenured associate professor and dean of research. The transition from academia to industry can be tricky, but it’s also increasingly common.
This week’s recap highlights nanoMDBG for metagenome assembly from nanopore reads, the SCassist AI-based workflow for single-cell analysis, discovery and characterization of GxE and GxG effects in a vertebrate model, the PIGEON framework for estimating gene-environment interaction for polygenic traits, and long-read alignment with multi-level parallelism.
This year ISMB2025 and BOSC were in Liverpool, and we embraced the location with our biggest birthday ever (alongside some Beatles-themed adventures). Read more in the conference write-up.
I’ve written a lot about Ollama here. Ollama lets you run open-weight models like Llama, Gemma, Mistral, Qwen, DeepSeek, etc. on your own computer. You don’t have to pay for a frontier model like ChatGPT, Claude, or Gemini, and all the inputs and outputs stay on your computer, minimizing any privacy and security concerns. Until recently Ollama was a command-line only tool.
I liked Steve Krouse’s essay, “Vibe code is legacy code.” It helped crystallize some half-baked thoughts I have on vibe coding. Here’s an excerpt. Maintainability and vibe are inversely correlated. I’ve been using GitHub Copilot and chatbots for code for years, and I’ve written about them a lot here.
The Genomic Standards Consortium (GSC) recently convened for the GSC25 meeting in Cambridge, UK, bringing together leading researchers, data scientists, and genomics professionals from around the world. Held from July 28 to August 1, GSC25 marked a significant milestone – celebrating two decades of advancing genomic data standards while charting the course for the next 20 years.
This week’s recap highlights Variant-EFFECTS for rewriting regulatory DNA to dissect and reprogram gene expression, zero-shot evaluation revealing the limitations of single-cell foundation models, EcoWeaver for large-scale prediction of gene functional associations from coevolutionary signals, and how assemblies of long-read metagenomes suffer from diverse errors.
I have been involved in the open science movement for nearly 20 years now. By now, it seems to me the problems have been clearly recognized and formulated, the experts agree on the necessary technical solutions (replacing the journals) and the funding is available.