Image processing is one of the core focus areas of rOpenSci. Over the last few months we have released several major upgrades to core packages in our imaging suite, including magick, tesseract, and av. This post highlights a few cool new features.
citecorp is a new (hit CRAN in late August) R package for working with data from the OpenCitations Corpus (OCC). OpenCitations, run by David Shotton and Silvio Peroni, houses the OCC, an open repository of scholarly citation data under the very open CC0 license.
The UCSC Xena platform provides an unprecedented resource for public omics data from big projects like The Cancer Genome Atlas (TCGA). However, it is hard for users to incorporate multiple datasets or data types, integrate the selected data with popular analysis tools or homebrewed code, and reproduce analysis procedures.
Teaching collaborative software development
In the University of British Columbia’s Master of Data Science program, one of the courses we teach is Collaborative Software Development, DSCI 524. In this course we focus on teaching how to apply practices from collaborative software development in data science workflows.
The free online book Open Forensic Science in R was created to foster open science practices in the forensic science community.
rOpenSci HQ
rOpenSci received a $678K award from the Sloan Foundation to expand Software Peer Review.
We are hiring for a new position in statistical software testing and peer review.
Join our next Community Call on Reproducible Workflows at Scale with drake September 24th.
Videos, speakers’ slides, resources and collaborative notes from our Community Calls on Involving Multilingual Communities and Reproducible Research with R are posted.
Introduction
Large quantities of freely available data are revolutionizing the world of ecological research. Open data maximizes the opportunities to perform comparative analyses and meta-analyses. Such synthesis efforts will increasingly exploit “population data”, which we define here as time series of population abundance.
Ambitious workflows in R, such as machine learning analyses, can be difficult to manage. A single round of computation can take several hours to complete, and routine updates to the code and data tend to invalidate hard-earned results. You can enhance the maintainability, hygiene, speed, scale, and reproducibility of such projects with the drake R package.
Are you passionate about statistical methods and software? If so, we would love for you to join our team to dig deep into the world of statistical software packages. You’ll develop standards for evaluating and reviewing statistical tools, publish, and work closely with an international team of experts to set up a new software review system.
The grainchanger package provides functionality for data aggregation to a coarser resolution via moving-window or direct methods.
Why do we need new methods for data aggregation?
As landscape ecologists and macroecologists, we often need to aggregate data in order to harmonise datasets. In doing so, we often lose a lot of information about the spatial structure and environmental heterogeneity of data measured at finer resolution.
We’re delighted to announce that we have received new funding from the Alfred P. Sloan Foundation. The $678K grant, awarded through the Foundation’s Data & Computational Research program, will be used to expand our efforts in software peer review.