What makes data reliable? A recent article 1 outlines five properties. It’s “clean” – formatted uniformly, conforming to certain rules/schema, etc.
Organizational capabilities can be divided into three categories: resources, processes, and priorities. Resources are what you use to achieve an outcome, processes are how you achieve it, and priorities are why. Understanding capabilities in this way can aid in strategy not only across a large organization but also within units, and even for individuals. 1 Resources are tangible assets.
How do you source data relevant for some analysis? Once you “have” the data, how do you feed it to the analytic task? Traditional enterprise data integration joins paths across a handful of silos for a handful of specific analytic tasks. In data science, however, neither the set of relevant silos nor the set of relevant analytic tasks is both small and well-defined.
I was reminded of the importance of approachable, low-barrier-to-entry tools for data management by Monica Granados and Lily Zhao in their presentation of the Frictionless Data toolkit. 1 They showcased the use of a browser-based interface 2 for a simple yet valuable task: associating title and description metadata with potentially cryptic column header names in a CSV file, and exporting that metadata together with the raw data.
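The same task can be scripted with the `frictionless` Python package that underlies the toolkit. A minimal sketch — the file `data.csv` and the column name `tmax_c` are hypothetical examples, not from the presentation:

```python
# Attach title/description metadata to a cryptic CSV column header
# with the frictionless package (pip install frictionless).
# "data.csv" and the column "tmax_c" are hypothetical examples.
from frictionless import describe

resource = describe("data.csv")  # infer a schema from the raw file

# Pair the cryptic header with human-readable metadata.
field = resource.schema.get_field("tmax_c")
field.title = "Maximum temperature"
field.description = "Daily maximum air temperature, in degrees Celsius."

# Export the metadata alongside a pointer to the raw data.
resource.to_yaml("data.resource.yaml")
```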
Laws are rules that a particular community recognizes as regulating the actions of its members. From this definition, Serena Peruzzo detailed how she sought to use tools from Natural Language Processing (NLP) to “find a representation of the rules that makes them more accessible and understandable.” 1 One proposed use case is to identify and highlight ambiguities.
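As a toy illustration of that use case — not Peruzzo’s method — one could scan legal text for cue words that often signal vagueness. The cue list below is my own invention:

```python
# Flag sentences containing ambiguity-prone terms with spaCy.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

# Illustrative cue words; a real system would need a curated lexicon.
AMBIGUITY_CUES = {"may", "reasonable", "promptly", "material"}

def flag_ambiguous_sentences(text: str) -> list[str]:
    """Return sentences that contain any ambiguity cue word."""
    doc = nlp(text)
    return [
        sent.text
        for sent in doc.sents
        if any(tok.lower_ in AMBIGUITY_CUES for tok in sent)
    ]

print(flag_ambiguous_sentences(
    "The tenant may make reasonable repairs. Rent is due on the first."
))  # -> ['The tenant may make reasonable repairs.']
```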
In an episode of the CoRecursive podcast, 1 Sam Ritchie uses the phrase “portal abstraction” to describe how the use of a particular term can open a portal – a gateway – to a world of relevant prior art. He discusses issues in analytics. One issue is distributing summative calculations over data both as batches and in real time, specialized for “big” and “fast” data, respectively.
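One body of prior art such a term can open onto is abstract algebra: if a summary forms a monoid – an associative combine operation with an identity element – the same aggregation code serves both the batch and real-time paths. A minimal sketch, with names of my own choosing:

```python
# Sketch: a (count, total) summary forms a monoid, so one combine
# function serves batch folds and incremental streaming updates alike.
from functools import reduce

IDENTITY = (0, 0)  # (count, total)

def summarize(x: float) -> tuple[int, float]:
    """Lift one observation into the summary space."""
    return (1, x)

def combine(a: tuple[int, float], b: tuple[int, float]) -> tuple[int, float]:
    """Merge two summaries; associative, with IDENTITY as its unit."""
    return (a[0] + b[0], a[1] + b[1])

data = [2.0, 4.0, 6.0]

# Batch: fold over the whole dataset at once.
batch = reduce(combine, map(summarize, data), IDENTITY)

# Streaming: fold the same combine over events as they arrive.
state = IDENTITY
for event in data:
    state = combine(state, summarize(event))

assert batch == state == (3, 12.0)  # identical answers either way
```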
As part of her introduction to ontology engineering, 1 Prof. Maria Keet has a slide depicting ontology as a layer apart from conceptual data models (figure: “Conceptual data models vs. ontologies”). I like this visualization of various project-specific conceptual models and their associated implementations in databases and codebases.
There’s a Python library called monty that supports a convention for moving between JSON objects and Python class instances. The major components are a mix-in class, MSONable, along with subclasses of json.encoder.JSONEncoder and json.decoder.JSONDecoder. An appropriate JSON object will have two special keys: @module and @class.
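A minimal round-trip sketch — the Point class here is illustrative, not part of monty:

```python
import json
from monty.json import MSONable, MontyEncoder, MontyDecoder

# An illustrative class; the MSONable mix-in supplies default
# as_dict/from_dict methods based on the __init__ signature.
class Point(MSONable):
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

# Encoding stamps the JSON object with @module and @class.
doc = json.dumps(Point(1.0, 2.0), cls=MontyEncoder)
# e.g. {"@module": "__main__", "@class": "Point", "x": 1.0, "y": 2.0}

# Decoding uses those keys to re-import the class and rebuild the instance.
point = json.loads(doc, cls=MontyDecoder)
assert isinstance(point, Point) and point.x == 1.0
```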
I’ve heard the phrase “just the tip of the iceberg” used as a positive phrase when revealing value of which an audience might not have been previously aware. In the context of disseminating scientific data, this tip might be a publication. A reader sees paragraphs and figures that describe and show data. A Supplemental Information section might link to a much greater volume of data – the rest of the iceberg.
An effective data unification system is like having an ATM network for your data. Instead of having to get to a branch (and only during open hours), wait in line, and talk to a teller, even for simple withdrawals and balance inquiries, with an ATM you can do many common operations on-demand and quickly.
I was fascinated as I skimmed over a publication detailing a network analysis of crystalline materials. 1 It’s “Open Access”, meaning I don’t have to pay to read the text and figures. There was one figure that had neat data-driven graphics for two separate materials, one stacked on top of the other. I wondered what the graphics would look like for some of the other 22,600 materials considered by the study.