Rogue Scholar

Papers, Biology, English

Weekly Recap (Nov 2024, part 2)

Published 15 November 2024

Author Stephen Turner

This week’s recap highlights an AI agent for automated multi-omic analysis (AutoBA), rapid species-level metagenome profiling and containment (sylph), a review on genome-wide association analysis beyond SNPs, private information leakage from scRNA-seq count matrices, and a method to “unlearn” viral knowledge in protein language models as a means to develop safe PLM-based variant effect analysis (PROEDIT). Others that caught my attention include

TIL, Python, Biology, English

Build a Python CLI with Click+Cookiecutter

https://doi.org/10.59350/zsq9k-6xs17

Published 10 November 2024

Author Stephen Turner

In the spirit of Learning in Public, I wanted an excuse to explore (1) click for creating command line interfaces, (2) Cookiecutter project templates, and (3) modern tools in the Python packaging ecosystem. If you’re primarily an R developer like me, I recently wrote about resources for getting better at Python for R users.

Papers, Biology, English

Weekly Recap (Nov 2024, part 1)

https://doi.org/10.59350/4a13p-nky04

Published 8 November 2024

Author Stephen Turner

This week's recap highlights a new pipeline for metagenome quality assessment and taxonomic annotation (MAGFlow &

Nextflow, Biology, English

Nextflow Summit Barcelona 2024

https://doi.org/10.59350/xkn7n-zep24

Published 4 November 2024

Author Stephen Turner

I just returned from a week in Barcelona, where I attended the Nextflow Summit and nf-core hackathon, and I can hardly contain my excitement for the near-term future of bioinformatics, computational biology, and open science in general.

Papers, Biology, English

Weekly Recap (Oct 2024, part 4)

https://doi.org/10.59350/xhh6j-z5b67

Published 25 October 2024

Author Stephen Turner

This week’s recap highlights protein design with RoseTTAFold, surveillance with wastewater sequencing, T2T human genomes, Vitessce for visualization of multimodal spatial single-cell data, and Taxometer for taxonomic classification of metagenomics contigs.

R, Python, Biology, English

Python for R users

https://doi.org/10.59350/nw1ga-79906

Published 21 October 2024

Author Stephen Turner

A Google search for “R vs Python” returns thousands of hits across sites like Reddit, IBM, Datacamp, Coursera, Kaggle, and many others.

Papers, Biology, English

Weekly Recap (Oct 2024, part 3)

https://doi.org/10.59350/dy8w5-g3p74

Published 18 October 2024

Author Stephen Turner

This week’s recap highlights a new Nextflow workflow for calculating polygenic scores with adjustments for genetic ancestry, a paper demonstrating that whole-exome sequencing plus imputation on more samples is more powerful than whole-genome sequencing for finding trait-associated variants, a new deep-learning-based splice site predictor that improves spliced alignments, a new method for accurate community profiling of large metagenomic datasets, and

Papers, AI, Biology, English

Inciteful+Zotero to find relevant literature

https://doi.org/10.59350/p027a-93g67

Published 14 October 2024

Author Stephen Turner

I am in the middle of writing a review/perspectives paper, one that I’m confident will be exciting once we get it published. Some sections of the review cover subject matter at the outer periphery of my expertise.

Papers, Biology, English

Weekly Recap (Oct 2024, part 2)

https://doi.org/10.59350/fxnyy-wng24

Published 11 October 2024

Author Stephen Turner

This week’s recap highlights a new method for gene-level alignment of single-cell trajectories, an R package for integrating gene and protein identifiers across biological sequence databases, characterization of SVs across humans and apes, universal prediction of cellular phenotypes, a method to quantify cell state heritability versus plasticity and infer cell state transition with single cell data, and a new AI-driven, natural language-oriented

R, TIL, Biology, English

Use nanoparquet instead of readr/CSV

https://doi.org/10.59350/mtjg1-vf107

Published 8 October 2024

Author Stephen Turner

Yesterday I wrote about base R vs. dplyr vs. duckdb for a simple summary analysis. In that post I simulated a dataset with 100 million rows and wrote it to disk as CSV. I then benchmarked how long it took to read the data in and compute a simple grouped mean. One thing I didn’t do there was separate the time it took to read the data into memory (for base R and dplyr) from the time it took to compute the actual summary.
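To make the distinction between read time and summary time concrete, here is a minimal sketch in Python using only the standard library. It is not the benchmark from the post, which used R with base R, dplyr, and duckdb on 100 million rows. The helper names (`simulate_csv`, `read_rows`, `grouped_mean`, `timed`), the column names, and the 10,000-row size are all assumptions chosen for illustration.

```python
import csv
import io
import random
import time
from collections import defaultdict


def simulate_csv(n_rows, seed=0):
    """Return CSV text with a 'group' column and a numeric 'value' column.

    Hypothetical stand-in for the simulated dataset; kept in memory
    rather than written to disk so the sketch is self-contained.
    """
    rng = random.Random(seed)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["group", "value"])
    for _ in range(n_rows):
        writer.writerow([rng.choice("abcde"), rng.random()])
    return buf.getvalue()


def read_rows(csv_text):
    """Step 1: parse the CSV into memory as (group, value) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["group"], float(row["value"])) for row in reader]


def grouped_mean(rows):
    """Step 2: compute the mean of value within each group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in rows:
        totals[group] += value
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Time the two steps separately instead of as one combined measurement.
csv_text = simulate_csv(10_000)
rows, read_seconds = timed(read_rows, csv_text)
means, summary_seconds = timed(grouped_mean, rows)
print(f"read: {read_seconds:.4f}s, summarize: {summary_seconds:.4f}s")
```

Reporting the two timings side by side shows how much of a combined benchmark is parsing cost rather than computation. The absolute numbers from this toy example say nothing about the R results.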

R, TIL, Biology, English

DuckDB vs dplyr vs base R

https://doi.org/10.59350/b4eds-n3h83

Published 7 October 2024

Author Stephen Turner

TL;DR: For a very simple analysis (means by group on 100M rows), duckdb was 125x faster than base R and 28x faster than readr+dplyr, without having to read data from disk into memory. The duckplyr package wraps DuckDB's analytical query processing techniques in a dplyr-compatible API. Learn more at duckdb.org/docs/api/r and duckplyr.tidyverse.org. I wanted to see for myself what the fuss was about with DuckDB.

Paired Ends
