Rogue Scholar

Natural Sciences · English

High-Precision Content Classification Using Hierarchy

Published July 15, 2022

Content classification is the most fundamental form of holistic content understanding. It helps make your resources findable (F2) and connects them to other resources (I3). Content understanding represents each piece of content in the index. Relevance of content is a function of query and content understanding. Query understanding represents each search query as a search intent.

Natural Sciences · English

Taxonomy Pruning for Query Classification

https://doi.org/10.59350/21fc5-yam52

Published July 14, 2022

Author Donny Winston

When providing a search interface (F4), you can improve precision significantly by classifying a user’s query, assuming you are able to classify your content. If you have a category taxonomy and labeled queries, you can train a classifier to dynamically assign a category to each query.
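To make the idea of training on labeled queries concrete, here is a minimal sketch in Python. It is not the post's method: it swaps in a tiny multinomial naive Bayes classifier with add-one smoothing, treats the taxonomy as a flat list of categories, and uses made-up queries and category names (`electronics`, `footwear`) purely for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled queries: (query text, category from a flat taxonomy).
LABELED_QUERIES = [
    ("usb cable", "electronics"),
    ("hdmi cable", "electronics"),
    ("running shoes", "footwear"),
    ("leather boots", "footwear"),
]


def train(labeled_queries):
    """Count tokens per category and queries per category."""
    token_counts = defaultdict(Counter)
    category_counts = Counter()
    vocabulary = set()
    for query, category in labeled_queries:
        tokens = query.lower().split()
        token_counts[category].update(tokens)
        category_counts[category] += 1
        vocabulary.update(tokens)
    return token_counts, category_counts, vocabulary


def classify(query, model):
    """Return the category with the highest naive Bayes log-score."""
    token_counts, category_counts, vocabulary = model
    total_queries = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n_queries in category_counts.items():
        # Log prior: share of training queries in this category.
        score = math.log(n_queries / total_queries)
        # Add-one smoothing so unseen tokens do not zero out a category.
        denominator = sum(token_counts[category].values()) + len(vocabulary)
        for token in query.lower().split():
            score += math.log((token_counts[category][token] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


model = train(LABELED_QUERIES)
print(classify("cable for hdmi", model))
print(classify("winter boots", model))
```

In a real search interface, the predicted category would then be used to filter or boost content that carries the same category label.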

Natural Sciences · English

An Objective Function for Code Refactoring

https://doi.org/10.59350/bn0w1-6e078

Published July 8, 2022

Author Donny Winston

Have you ever set an objective function for code refactoring, where, for every proposed total change (e.g. a reviewable pull request), you seek to maximize the change in this function? An example:¹

\[ \frac{\log_2(pct_{LOC\ tested}) \cdot pct_{importables\ documented} \cdot pct_{LOC\ nostate}}{n_{LOC}} \]

Good (numerator stuff): Percent Lines of Code (LOC) covered by a test.
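As a rough illustration, the example objective function in the excerpt above can be evaluated before and after a proposed change. The sketch below is an assumption-laden reading, not the author's code: the class and function names are hypothetical, and percentages are assumed to be on a 0 to 100 scale so that the logarithm is positive for anything above 1%.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CodebaseMetrics:
    """Hypothetical container for the four quantities in the formula."""
    pct_loc_tested: float              # percent of LOC covered by a test (0-100, assumed)
    pct_importables_documented: float  # percent of importables documented (0-100, assumed)
    pct_loc_nostate: float             # the formula's pct_{LOC nostate} term (0-100, assumed)
    n_loc: int                         # total lines of code


def objective(m: CodebaseMetrics) -> float:
    """log2(pct tested) * pct documented * pct nostate / number of LOC."""
    if m.pct_loc_tested <= 0 or m.n_loc <= 0:
        raise ValueError("pct_loc_tested and n_loc must be positive")
    numerator = (
        math.log2(m.pct_loc_tested)
        * m.pct_importables_documented
        * m.pct_loc_nostate
    )
    return numerator / m.n_loc


def change_in_objective(before: CodebaseMetrics, after: CodebaseMetrics) -> float:
    """The quantity a proposed change (e.g. a pull request) tries to maximize."""
    return objective(after) - objective(before)


# Made-up numbers: a change that deletes 250 lines and leaves the percentages alone.
before = CodebaseMetrics(64, 50, 25, 1000)
after = CodebaseMetrics(64, 50, 25, 750)
print(change_in_objective(before, after))  # 2.5
```

With these made-up numbers, removing code alone raises the score, because the line count sits in the denominator.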

Natural Sciences · English

Complexity Is Carbon

https://doi.org/10.59350/43j8b-bbc42

Published July 6, 2022

Author Donny Winston

Some energy infrastructure emits carbon. Some data infrastructure emits complexity. There is essential carbon emission, like humans exhaling CO₂. And there is incidental, non-essential carbon emission, like humans burning fossil fuels. There is essential complexity in data (and software code), like that pertaining to modeling your subject matter and your application domain.

Natural Sciences · English

These Are All Just Persistent URLs, No?

https://doi.org/10.59350/79h77-2fs87

Published July 5, 2022

Author Donny Winston

I am beginning to walk through each question of the FAIR Implementation Profile (FIP) Ontology. My goal is to construct and share a populated model of people’s articulations – aka declarations – of choices they’ve made or challenges they face in addressing each question, as well as the considerations they associate with any such choice or challenge.

Natural Sciences · English

The ARK System of Persistent Identifiers (PIDs)

https://doi.org/10.59350/vfzqz-hvs88

Published July 1, 2022

Author Donny Winston

The Archival Resource Key (ARK) system is an alternative to the Handle system to satisfy FAIR’s F1 Principle. Similar to the Handle system, naming authority for ARKs is distributed by allotting prefixes.

Natural Sciences · English

The Handle System of Persistent Identifiers

https://doi.org/10.59350/mnpnn-a4279

Published June 30, 2022

Author Donny Winston

The Handle system is a popular choice for the assignment and resolution of globally unique, persistent identifiers. Governance is centralized with the DONA Foundation, and administration is distributed among so-called Credentialed Multi-Primary Administrators (MPAs), of which there are currently nine. You’ve likely heard of at least one MPA: the International DOI Foundation. Each MPA is assigned a number.
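To illustrate how a handle carries its naming authority, here is a small Python sketch that splits a handle into prefix and suffix. It assumes the number before the first dot of the prefix is the one assigned to the MPA, as with DOI prefixes that begin with 10. The `Handle` type and `parse_handle` function are hypothetical helpers, not part of any Handle system library.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    """Hypothetical parsed view of a handle string."""
    prefix: str      # naming-authority part, before the first "/"
    suffix: str      # local name, after the first "/"
    mpa_number: str  # leading segment of the prefix (assumed to be the MPA's number)


def parse_handle(handle: str) -> Handle:
    """Split 'prefix/suffix'; the suffix may itself contain slashes."""
    prefix, separator, suffix = handle.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    mpa_number = prefix.split(".", 1)[0]
    return Handle(prefix, suffix, mpa_number)


# The DOI of this post, without the https://doi.org/ resolver part.
parsed = parse_handle("10.59350/mnpnn-a4279")
print(parsed.prefix, parsed.suffix, parsed.mpa_number)
```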

Natural Sciences · English

What globally unique, persistent, resolvable identifiers do you use for datasets?

https://doi.org/10.59350/6c6ts-5h009

Published June 29, 2022

Author Donny Winston

What globally unique, persistent, resolvable identifiers do you use for datasets? I want to know about either (a) a challenge you’re facing, and what you’ve tried; or (b) a choice you recently made, and how it’s going. Context: For each question of the FAIR Implementation Profile (FIP) Ontology, I want to collect and discuss folks’ choices and challenges on my podcast, Machine-Centric Science.

Natural Sciences · English

Findability → Known-Item Search, Discoverability → Exploratory Search?

https://doi.org/10.59350/917v6-y5171

Published June 28, 2022

Author Donny Winston

I keep confusing findability and discoverability. It seems that findability is often equated to known-item search, and discoverability to exploratory search. Known-item search is compatible with “instant search”, aka search-as-you-type interfaces. Exploratory search is compatible with “autocomplete” (incl. re-spelling, infix matching, synonym substitution, etc.) interfaces.

Natural Sciences · English

Is an Ontology 'better' than a Relational Data Model?

https://doi.org/10.59350/77x21-ep589

Published June 27, 2022

Author Donny Winston

Is an ontology “better” than a relational data model? “More expressive power” doesn’t always mean “better”. However, ontologies allow you to ratchet up power while keeping logic in data structures.

Natural Sciences · English

Leave Beacons in Code

https://doi.org/10.59350/tkbh2-p5c04

Published June 24, 2022

Author Donny Winston

Leave beacons in your code. I would have avoided a silly error if a variable named xgb_train_data had been named, for example, xgb_train_data_filepath instead. When you can’t leave globally unique, persistent, resolvable identifiers (GUPRIs), mind your beacons. References: F. Hermans, The Programmer’s Brain: What Every Programmer Needs to Know About Cognition, pp. 28-30. Shelter Island, NY: Manning, 2021.
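A short Python sketch of the naming point: when the path and the loaded data carry different, telling names, it is harder to pass one where the other is expected. The file contents and the `load_rows` helper are hypothetical. Only the two variable names come from the post.

```python
import csv
import tempfile
from pathlib import Path
from typing import List


def load_rows(filepath: Path) -> List[List[str]]:
    """Read a CSV file into a list of rows (hypothetical helper)."""
    with filepath.open(newline="") as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    # The "_filepath" suffix is the beacon: this name holds a path, not data.
    xgb_train_data_filepath = Path(tmp) / "train.csv"
    xgb_train_data_filepath.write_text("feature,label\n0.5,1\n0.1,0\n")

    # The unsuffixed name is reserved for the loaded rows themselves.
    xgb_train_data = load_rows(xgb_train_data_filepath)

n_rows = len(xgb_train_data)
print(n_rows)
```

Had both values shared the name `xgb_train_data`, a reader could not tell from the call site whether a function was receiving a path or the rows.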