Donny Winston
Made as simple as possible, but not simpler.
Natural Sciences
Published

Some queries can be optimized; others can’t. This is due to the structure of the query itself, which in turn is due to the structure of the underlying data. “Sargable” is database-speak for Search ARGument ABLE. A query is sargable if it can be optimized, e.g. by creating and using an index. It’s a good idea to write sargable queries.
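
As a rough illustration of the difference, here is a minimal sketch using Python's built-in `sqlite3` module. The `events` table, its columns, and the index name are made up for this example. Both queries return the same rows, but only the first compares the indexed column directly, so only the first lets the planner search the index; wrapping the column in `substr()` forces a scan.

```python
import sqlite3

# Hypothetical table and index, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, created TEXT, payload TEXT)"
)
conn.execute("CREATE INDEX idx_events_created ON events (created)")
conn.executemany(
    "INSERT INTO events (created, payload) VALUES (?, ?)",
    [(f"20{year:02d}-01-15", "x") for year in range(10, 25)],
)


def plan(sql):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # row[3] is the detail column


# Sargable: the indexed column is compared directly against constants.
sargable = (
    "SELECT * FROM events "
    "WHERE created >= '2020-01-01' AND created < '2021-01-01'"
)
# Not sargable: the column is wrapped in a function call.
non_sargable = "SELECT * FROM events WHERE substr(created, 1, 4) = '2020'"

sargable_plan = plan(sargable)
non_sargable_plan = plan(non_sargable)

print(sargable_plan)      # expected to mention SEARCH and idx_events_created
print(non_sargable_plan)  # expected to be a SCAN of the table
```

The exact plan wording varies between SQLite versions, so treat the printed text as indicative rather than a stable interface.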

Open Science and Open Data need not be all-or-nothing efforts. You can curate and nurture a “core” composed of fully open, public data. Surrounding that, you can have a layer open to your organization, but not the public. And surrounding that, you can have a layer open to your team, but not the broader organization.

For the Jewish holiday of Passover, guidance is provided as to how to explain its interface when onboarding various identified end users. In particular, four types of children are identified. The so-called “wise” child asks about features and how to accomplish tasks in the ideal manner.

Data-driven programming means that you change the logic of a program by changing data rather than code. A good example is statistical spam filtering. The conventional approach is to continually update and maintain code that describes relevant patterns to match and that implements conditional logic based on elaborate pattern-matching.
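
To make the contrast concrete, here is a minimal sketch of a statistical filter: a tiny naive Bayes scorer with add-one smoothing. This is one common way to do statistical filtering, not necessarily the method the post has in mind. The function names and the training messages are invented for illustration. The point is that the scoring code never changes; only the labeled examples do.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label from (label, text) pairs."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for label, text in examples:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals


def spam_score(model, text):
    """Log-odds that text is spam; a positive score leans spam."""
    counts, totals = model
    vocab = set(counts["spam"]) | set(counts["ham"])
    n_spam = sum(counts["spam"].values())
    n_ham = sum(counts["ham"].values())
    score = math.log(totals["spam"] / totals["ham"])  # prior odds
    for word in text.lower().split():
        # Add-one smoothing keeps unseen words from zeroing a probability.
        p_spam = (counts["spam"][word] + 1) / (n_spam + len(vocab))
        p_ham = (counts["ham"][word] + 1) / (n_ham + len(vocab))
        score += math.log(p_spam / p_ham)
    return score


# Two made-up training sets: same code, different data.
model_a = train([
    ("spam", "win money now"),
    ("spam", "cheap pills now"),
    ("ham", "meeting notes attached"),
    ("ham", "lunch tomorrow"),
])
model_b = train([
    ("spam", "cheap pills now"),
    ("spam", "cheap pills offer"),
    ("ham", "win money report"),
    ("ham", "win money stats"),
])

score_a = spam_score(model_a, "win money")  # leans spam under model_a
score_b = spam_score(model_b, "win money")  # leans ham under model_b
```

The same message gets opposite verdicts under the two models, and nothing in `spam_score` was edited to make that happen.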

Word games are a technique for troubleshooting problem statements. They are usually cheaper than unwanted solutions. They expose ways that well-intentioned problem solvers trip over a misunderstood word, a misplaced comma, or ambiguous syntax. In the dictionary game, you make a list of a dictionary’s meanings for each word in the original sentence. Then, you try to apply each of those meanings in turn.
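
The dictionary game is mechanical enough to sketch in a few lines. In the sketch below, the sentence and the glosses are illustrative stand-ins written for this example, not entries from a real dictionary, and `dictionary_game` is a hypothetical helper name. It substitutes one meaning at a time and leaves the rest of the sentence alone.

```python
def dictionary_game(sentence, meanings):
    """Yield one reading per (word, meaning) pair, changing one word at a time."""
    words = sentence.split()
    for position, word in enumerate(words):
        for meaning in meanings.get(word.lower(), []):
            reading = list(words)
            reading[position] = f"{word} [{meaning}]"
            yield " ".join(reading)


# Hypothetical glosses for illustration only.
glosses = {
    "had": ["owned", "ate", "gave birth to"],
    "lamb": ["young sheep", "meat of a sheep"],
}

readings = list(dictionary_game("Mary had a little lamb", glosses))
for reading in readings:
    print(reading)
```

Each printed line is a candidate interpretation to check with whoever wrote the problem statement.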

You’ve collected a sample of important questions that you want a data system to answer. How do you ensure the system will be able to answer those particular questions? You might have the questions “in mind” as you design and develop a data model, process recorded observations to fit that model, write queries against that model, and process query results into readable answers.

Homology deals with substructure similarity. For example, the structures of concern may be gene sequences – structures with clear reification as physical arrangements of atoms. An example technique for evaluating such structural homology among genes is k-mer search.
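
As a minimal sketch of the idea behind k-mer search, the code below breaks each sequence into its overlapping substrings of length k and compares the resulting sets with Jaccard similarity. The sequences are made up, the helper names are hypothetical, and Jaccard is just one simple way to score shared k-mers; real tools use more elaborate indexing and scoring.

```python
def kmers(sequence, k):
    """Return the set of all overlapping substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a, b):
    """Shared fraction of two sets: |a & b| / |a | b|, or 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up sequences for illustration.
seq_1 = "ACGTACGT"
seq_2 = "ACGTTCGT"

shared = kmers(seq_1, 3) & kmers(seq_2, 3)
similarity = jaccard(kmers(seq_1, 3), kmers(seq_2, 3))

print(sorted(shared))
print(round(similarity, 3))
```

A higher score means the two sequences share more substructure at the chosen k; the choice of k trades sensitivity against specificity.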

A scientific database cannot be everything to everyone. Jim Gray came up with the “20 queries” heuristic. What are the 20 most important questions the researchers want the data system to answer? Five questions are not enough to see a broader pattern, and 100 questions would dilute focus. Also, the relative information in queries ranked by importance is likely to be logarithmic – a “long tail” distribution.