Information Theory with TensorFlow (adhiraiyan.org)
108 points by behnamoh on Dec 15, 2019 | 9 comments



I feel there is some great unification of Information Theory, classical signal processing, and controls with machine learning. Fundamentally they are different angles of applying the concepts of dynamical systems to design widgets that do something useful. Things like this quote from the top:

> Information theory is a branch of applied mathematics that revolves around quantifying how much information is present in a signal.

greatly underappreciate the value of the theory as it applies to things like ML. One day I think we'll have a unifying theory that lets engineers design ML systems with fewer ad-hoc methods and a firmer theoretical basis, the way we design filters and controls today. The issue is the nonlinearity of the networks, but I think we'll find a way there through category theory and topology.
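
To make the "quantifying how much information is present in a signal" part concrete, here is a minimal sketch (plain NumPy, my own toy example, not from the article) that estimates the Shannon entropy of a discrete signal from its empirical symbol frequencies:

    import numpy as np

    def empirical_entropy(signal):
        # Estimate H(X) in bits from the empirical symbol distribution.
        _, counts = np.unique(signal, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    rng = np.random.default_rng(0)
    fair = rng.integers(0, 2, size=10_000)                  # fair coin
    biased = rng.choice([0, 1], p=[0.9, 0.1], size=10_000)  # biased coin
    print(empirical_entropy(fair))    # ~1.0 bits per symbol
    print(empirical_entropy(biased))  # ~0.47 bits per symbol

A fair coin flip carries a full bit per symbol; a predictable (biased) one carries less, which is the sense in which the theory quantifies information.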


The main conceptual novelty that modern machine learning brings is the addition of computational complexity into the mix: from information theory's question

   what is learnable?
to

   what is learnable in poly-time?
(or similar resource constraints). This was pioneered, as far as I am aware, in Valiant's A Theory of the Learnable (please correct me if I'm wrong; I'm not an ML/AI historian). Interestingly, we see a similar evolution in Shannon's thinking about cryptography: from what is secure information-theoretically, i.e. against computationally unbounded adversaries, to what is secure against a poly-time-restricted adversary?
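
To make "learnable in poly-time" slightly more concrete (my gloss of the PAC framing that grew out of that line of work, not a quote from Valiant): for a finite hypothesis class H in the realizable case, roughly

   m >= (1/eps) * (ln|H| + ln(1/delta))

samples suffice to find a hypothesis with error at most eps with probability at least 1 - delta, and "efficiently learnable" additionally demands an algorithm whose running time is polynomial in 1/eps, 1/delta, and the problem size.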


This is very close to the thesis of MacKay's Information Theory, Inference, and Learning Algorithms textbook: https://www.inference.org.uk/itprnn/book.pdf



The great G.-C. Rota quipped:

   Probability theory = combinatorics divided by n 
In this vein let me add:

   Information theory = log(probability theory)
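
To unpack the quip a bit (my own gloss): writing h(x) = -log p(x) for the information content of an outcome, the multiplicative identities of probability become the additive identities of information, e.g.

   p(x, y) = p(x) p(y | x)         <->  h(x, y) = h(x) + h(y | x)
   p(x, y) = p(x) p(y)  (indep.)   <->  h(x, y) = h(x) + h(y)

and entropy H(X) is just the expected value of h(X).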


The significant thing is not that we're arbitrarily applying the log function to probabilities. It is that there's a relationship between expected code length and the entropy of a source. It is surprising, to me at least, that the length of a thing is related to its probability. From that relationship come a number of fascinating connections to many other fields. Check out Cover and Thomas' "Elements of Information Theory" for a very approachable introduction to all these connections.
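
A small sketch of that relationship (my own example, using Shannon code lengths ceil(-log2 p) rather than a real Huffman code):

    import math

    # An arbitrary source over four symbols.
    p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

    entropy = -sum(q * math.log2(q) for q in p.values())
    # Shannon code: give each symbol a codeword of length ceil(-log2 p).
    expected_length = sum(q * math.ceil(-math.log2(q)) for q in p.values())

    print(entropy)          # 1.75 bits
    print(expected_length)  # 1.75 bits here; in general H <= E[length] < H + 1

The more improbable a symbol, the longer its codeword has to be, and the expected code length can never beat the entropy of the source.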


Isn't that exactly what Occam's Razor teaches us?

"Entities should not be multiplied without necessity."


Yes, info theory is a formalization of Occam's razor.
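
One concrete form of that (the MDL reading, my gloss): prefer the hypothesis H that minimizes the total description length

   L(H) + L(D | H)

i.e. the bits to describe the hypothesis plus the bits to describe the data given the hypothesis. A "simpler" entity is literally one with a shorter code, so needless entities cost you bits.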


Information Theory with Coq[1]. Formal approach.

[1] https://github.com/affeldt-aist/infotheo



