MatematicaIngleseHugo

Math ∩ Programming

Recent content on Math ∩ Programming
Pagina inizialeRSS ForaggioMastodon
language
MatematicaInglese
Pubblicato
Autore Jeremy Kun

There’s a well-understood phenomenon in machine learning called overfitting. The idea is best shown by a graph: overfitting Let me explain. The vertical axis represents the error of a hypothesis. The horizontal axis represents the complexity of the hypothesis. The blue curve represents the error of a machine learning algorithm’s output on its training data, and the red curve represents the generalization of that hypothesis to the real world.

MatematicaInglese
Pubblicato
Autore Jeremy Kun

It’s about time we got back to computational topology. Previously in this series we endured a lightning tour of the fundamental group and homology, then we saw how to compute the homology of a simplicial complex using linear algebra. What we really want to do is talk about the inherent shape of data.

MatematicaInglese
Pubblicato
Autore Jeremy Kun

In 2014 the White House commissioned a 90-day study that culminated in a report (pdf) on the state of “big data” and related technologies. The authors give many recommendations, including this central warning. Warning: algorithms can facilitate illegal discrimination! Here’s a not-so-imaginary example of the problem. A bank wants people to take loans with high interest rates, and it also serves ads for these loans.

MatematicaInglese
Pubblicato
Autore Jeremy Kun

A while back we featured a post about why learning mathematics can be hard for programmers, and I claimed a major issue was not understanding the basic methods of proof (the lingua franca between intuition and rigorous mathematics). I boiled these down to the “basic four,” direct implication, contrapositive, contradiction, and induction. But in mathematics there is an ever growing supply of proof methods.

MatematicaInglese
Pubblicato
Autore Jeremy Kun

When addressing the question of what it means for an algorithm to learn, one can imagine many different models, and there are quite a few. This invariably raises the question of which models are “the same” and which are “different,” along with a precise description of how we’re comparing models.

MatematicaInglese
Pubblicato
Autore Jeremy Kun

A while back Peter Norvig posted a wonderful pair of articles about regex golf. The idea behind regex golf is to come up with the shortest possible regular expression that matches one given list of strings, but not the other. “Regex Golf,” by Randall Munroe. In the first article, Norvig runs a basic algorithm to recreate and improve the results from the comic, and in the second he beefs it up with some improved search heuristics.

MatematicaInglese
Pubblicato
Autore Jeremy Kun

I have a little secret: I don’t like the terminology, notation, and style of writing in statistics. I find it unnecessarily complicated. This shows up when trying to read about Markov Chain Monte Carlo methods. Take, for example, the abstract to the Markov Chain Monte Carlo article in the Encyclopedia of Biostatistics. Markov chain Monte Carlo (MCMC) is a technique for estimating by simulation the expectation of a statistic in a complex model.

MatematicaInglese
Pubblicato
Autore Jeremy Kun

Last time we defined the Hamming code. We also saw that it meets the Hamming bound, which is a measure of how densely a code can be packed inside an ambient space and still maintain a given distance. This time we’ll define the Reed-Solomon code which optimizes a different bound called the Singleton bound, and then generalize them to a larger class of codes called Reed-Muller codes.

MatematicaInglese
Pubblicato
Autore Jeremy Kun

Problem: Given a massive data stream of $ n$ values in $ \{ 1, 2, \dots, m \}$ and the guarantee that one value occurs more than $ n/2$ times in the stream, determine exactly which value does so. Solution: (in Python) def majority(stream): held = next(stream) counter = 1 for item in stream: if item == held: counter += 1 elif counter == 0: held = item counter = 1 else: counter -= 1 return held Discussion: Let’s prove correctness.

MatematicaInglese
Pubblicato
Autore Jeremy Kun

Or how to detect and correct errors Last time we made a quick tour through the main theorems of Claude Shannon, which essentially solved the following two problems about communicating over a digital channel. What is the best encoding for information when you are guaranteed that your communication channel is error free? Are there any encoding schemes that can recover from random noise introduced during transmission?