I feel there is some great unification of information theory, classical signal processing, and controls with machine learning. Fundamentally, they are different angles on applying the concepts of dynamical systems to design widgets that do something useful. Quotes like this one from the top:
> Information theory is a branch of applied mathematics that revolves around quantifying how much information is present in a signal.
greatly underappreciate the value of the theory as it applies to things like ML. One day I think we'll have a unifying theory that lets engineers design ML systems with fewer ad-hoc methods and more theoretical basis, the way we design filters and controllers today. The obstacle is the nonlinearity of the networks, but I think we'll find a way there through category theory and topology.
The main conceptual novelty that modern machine learning brings is the addition of computational complexity to the mix: from information theory's question *what is learnable?* to *what is learnable in poly-time?*
(or under similar resource constraints). This was pioneered, as far as I am aware, in Valiant's "A Theory of the Learnable" (please correct me if I'm wrong; I'm not an ML/AI historian). Interestingly, we see a similar evolution in Shannon's thinking about cryptography: from "what is secure information-theoretically, i.e., against computationally unbounded adversaries?" to "what is secure against a poly-time-bounded adversary?"
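For concreteness, here is Valiant's criterion in its standard textbook paraphrase (my summary, not a quote from his paper):

```latex
% PAC learnability, paraphrased in the standard textbook form (not Valiant's
% original wording): a concept class C is learnable if some algorithm A, for
% every target c in C, every distribution D on inputs, and all epsilon, delta
% in (0,1), when given m i.i.d. examples (x, c(x)) with x ~ D, outputs a
% hypothesis h satisfying
\[
  \Pr\bigl[\,\mathrm{err}_D(h) \le \varepsilon\,\bigr] \ge 1 - \delta,
  \qquad
  \mathrm{err}_D(h) \;=\; \Pr_{x \sim D}\bigl[\,h(x) \ne c(x)\,\bigr].
\]
% The complexity-theoretic twist is the word "efficient": efficient PAC
% learning additionally demands that m and the running time of A be
% polynomial in 1/epsilon, 1/delta, and the size of the problem instance.
```

The information-theoretic question ("is there enough signal in the samples?") is still in there; the polynomial bound on time and sample count is the new ingredient.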
The significant thing is not that we're arbitrarily applying the log function to probabilities. It is that there is a relationship between expected code length and the entropy of a source. It is surprising, to me at least, that the optimal description length of a thing is related to its probability. From that relationship flow a number of fascinating connections to many other fields. Check out Cover and Thomas's "Elements of Information Theory" for a very approachable introduction to all of these connections.
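To make that concrete, here's a minimal sketch using a toy distribution of my own (not from the article): for an optimal prefix code, each codeword length is about -log2 p(x), so the expected length lands within one bit of the entropy H(X), and matches it exactly when the probabilities are powers of 1/2:

```python
import heapq
from math import log2

# Toy source distribution (hypothetical example, chosen to be dyadic).
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Shannon entropy: H(X) = -sum_x p(x) * log2 p(x)
H = -sum(px * log2(px) for px in p.values())

# Build an optimal prefix (Huffman) code with a min-heap.
# Heap entries are (probability, tiebreaker, {symbol: codeword-so-far}).
heap = [(px, i, {s: ""}) for i, (s, px) in enumerate(p.items())]
heapq.heapify(heap)
tiebreak = len(heap)
while len(heap) > 1:
    p0, _, c0 = heapq.heappop(heap)  # the two least probable subtrees...
    p1, _, c1 = heapq.heappop(heap)
    merged = {s: "0" + w for s, w in c0.items()}        # ...get merged, with
    merged.update({s: "1" + w for s, w in c1.items()})  # 0/1 prefixed on
    heapq.heappush(heap, (p0 + p1, tiebreak, merged))
    tiebreak += 1
code = heap[0][2]

# Expected code length L = sum_x p(x) * len(codeword(x)).
# Shannon's bound for an optimal prefix code: H <= L < H + 1.
L = sum(p[s] * len(w) for s, w in code.items())
print(f"entropy H = {H:.3f} bits, expected Huffman length L = {L:.3f} bits")
# Dyadic probabilities make every codeword length exactly -log2 p(x),
# so L equals H here: 1.75 bits in both cases.
```

That "length equals negative log probability" identity is the hinge: it is what lets code lengths, likelihoods, and compression all talk to each other.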