
Reposted from the original at https://blog.stephenturner.us/p/bluesky-analysis-claude-llama-tidyverse.
Reposted from Paired Ends at https://blog.stephenturner.us/p/llm-translate-documentation. --- The lang package overrides the ? and help() functions in your R session. The translated help page will appear in the help pane in RStudio or Positron. It can also translate your Roxygen documentation.
Reposted from the original at https://blog.stephenturner.us/p/github-repo-to-text-for-llm-input. --- If you use ChatGPT, Claude, or even some local model through Ollama or HuggingFace Assistants, you’ll know that the chat interface makes it challenging to feed in an entire repo like a Python or R package, because functions, tests, etc. can be scattered across many files throughout a repo.
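One way to work around files being scattered across a repo is to flatten a local checkout into a single block of text that can be pasted into a chat window. The sketch below is only an illustration, not the tool or method from the original post: the function name `repo_to_text`, the list of file suffixes, and the header format are all assumptions made for this example.

```python
from pathlib import Path

# Assumption: which file types count as "source" is a guess for illustration.
TEXT_SUFFIXES = {".py", ".R", ".md", ".toml", ".txt"}


def repo_to_text(root, suffixes=TEXT_SUFFIXES):
    """Concatenate matching files under `root` into one string.

    Each file is preceded by a header line with its path relative to `root`,
    so the model can tell where one file ends and the next begins.
    """
    root = Path(root)
    parts = []
    for path in sorted(root.rglob("*")):
        # Keep only regular files with one of the listed suffixes.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root)
        # Skip anything inside hidden directories such as .git.
        if any(part.startswith(".") for part in rel.parts):
            continue
        body = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f"===== {rel.as_posix()} =====\n{body}")
    return "\n\n".join(parts)
```

Calling `repo_to_text("path/to/checkout")` returns one string that can be copied into a prompt. Large repos may exceed a model's context window, so narrowing `suffixes` is a reasonable first adjustment.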
Reposted from https://blog.stephenturner.us/p/tech-im-thankful-for-2024. Data science and bioinformatics tech I'm thankful for in 2024: tidyverse, RStudio, Positron, Bluesky, blogs, Quarto, bioRxiv, LLMs for code, Ollama, Seqera Containers, StackOverflow, ... It’s a short week here in the US. As I reflect on the tools that shape modern bioinformatics and data science, it’s striking to see how far we’ve come in the 20 years I’ve
This is reposted from the original at https://blog.stephenturner.us/p/expand-your-bluesky-network-with-r. --- I’m encouraging everyone I know online to join the scientific community on Bluesky. In my earlier post, Bluesky for Science, I link to several starter packs: lists of accounts posting about a topic that you can follow individually or all at once to start filling out your network. I started
Reposted from the original at https://blog.stephenturner.us/p/python-cli-click-cookiecutter. --- In the spirit of Learning in Public, I wanted an excuse to explore (1) click for creating command line interfaces, (2) Cookiecutter project templates, and (3) modern tools in the Python packaging ecosystem.
This is reposted from the original at https://blog.stephenturner.us/p/python-for-r-users. --- A Google search for “R vs Python” returns thousands of hits across sites like Reddit, IBM, Datacamp, Coursera, Kaggle, and many others.
This is reposted from the original at https://blog.stephenturner.us/p/use-nanoparquet-instead-of-readr-csv. Parquet is interoperable between Python and R, fast to read and write, works well with databases, and stores complex data types (e.g., tibble listcols). Use it instead of CSV. Many pros, few (no?) cons. Yesterday I wrote about base R vs. dplyr vs. duckdb for a simple summary analysis.
Reposted from https://blog.stephenturner.us/p/duckdb-vs-dplyr-vs-base-r. TL;DR: For a very simple analysis (means by group on 100M rows), duckdb was 125x faster than base R, and 28x faster than readr+dplyr, without having to read data from disk into memory. The duckplyr package wraps DuckDB's analytical query processing techniques in a dplyr-compatible API.
This blog has moved. This is reposted from Paired Ends: https://blog.stephenturner.us/p/create-a-free-llama-405b-llm-chatbot-github-repo-huggingface. Llama 3.1 405B is the first open-source LLM on par with frontier models GPT-4o and Claude 3.5 Sonnet.
This blog has moved. This is reposted from Paired Ends: https://blog.stephenturner.us/p/planes-plausibility-analysis-of-epidemiological-signals-rplanes-r-package. PLANES provides a set of methods for evaluating the plausibility of epidemiological signals and forecasts.