
Entropy. Sometimes you read that it's a measure of randomness; sometimes, information. Aren't randomness and information opposites?



In my opinion, saying entropy is a measure of randomness is confusing at best and wrong at worst.

Entropy is the amount of information it takes to describe a system. That is, how many bits it takes to "encode" all possible states of the system.

For example, say I had to communicate the result of 100 (fair) coin flips to you. This requires 100 bits of information as each of the 100 bit vectors is equally likely.

If I were to complicate things by adding an unfair coin, I would need fewer than 100 bits, since the unfair coin's outcomes would not be equally likely. In the extreme case where 1 of the 100 coins is completely unfair and always turns up heads, for example, I only need to send 99 bits, as we both already know the result of flipping that coin.

The shorthand of calling it a "measure of randomness" probably comes from the problem setup. For the 100 coin case, we could say (in my opinion, incorrectly) that flipping 100 fair coins is "more random" than flipping 99 fair coins with one bad penny that always comes up heads.
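To make that concrete, here's a quick sketch in Python (my own illustration, not from the paper) computing the bits needed for the setups above:

    from math import log2

    def coin_entropy(p_heads):
        # Shannon entropy of one flip with P(heads) = p_heads, in bits.
        if p_heads in (0.0, 1.0):
            return 0.0  # outcome is certain, so no bits are needed
        p_tails = 1.0 - p_heads
        return -(p_heads * log2(p_heads) + p_tails * log2(p_tails))

    # 100 fair coins: independent flips, so the bits just add up.
    print(100 * coin_entropy(0.5))                      # 100.0 bits

    # 99 fair coins plus one coin that always lands heads.
    print(99 * coin_entropy(0.5) + coin_entropy(1.0))   # 99.0 bits

    # A merely biased coin sits in between, e.g. P(heads) = 0.9:
    print(99 * coin_entropy(0.5) + coin_entropy(0.9))   # ~99.47 bits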

Shannon's original paper is extremely accessible and I encourage everyone to read it [1]. If you'll permit self-promotion, I made a condensed blog post about the derivations that you can also read, though it's really Shannon's paper without most of the text [2].

[1] http://people.math.harvard.edu/~ctm/home/text/others/shannon...

[2] https://mechaelephant.com/dev/Shannon-Entropy/


Information is actually about _reduction_ in entropy. Roughly speaking, entropy measures the amount of uncertainty about some event you might try to predict. Now, if you observe some new fact that has high (mutual) information with the event, it means the new fact has significantly reduced your uncertainty about the outcome of the event. In this sense, entropy measures the maximum amount of information you could possibly learn about the outcome of some uncertain event. An interesting corollary here is that the entropy of an event also puts an upper bound on the amount of information it can convey about _any_ other event.
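To put a number on "reduction in entropy": here's a small sketch (toy probabilities I made up purely for illustration) computing I(X;Y) = H(X) - H(X|Y), the mutual information between an event X and an observation Y:

    from math import log2

    def H(probs):
        # Shannon entropy in bits of a probability distribution.
        return -sum(p * log2(p) for p in probs if p > 0)

    # Toy joint distribution P(X, Y); rows are values of X, columns of Y.
    joint = [[0.4, 0.1],
             [0.1, 0.4]]

    p_x = [sum(row) for row in joint]        # marginal of X: [0.5, 0.5]
    p_y = [sum(col) for col in zip(*joint)]  # marginal of Y: [0.5, 0.5]

    H_x = H(p_x)                             # 1.0 bit of uncertainty about X

    # H(X|Y) = sum over y of P(y) * H(X | Y=y)
    H_x_given_y = sum(
        py * H([joint[x][y] / py for x in range(len(joint))])
        for y, py in enumerate(p_y)
    )                                        # ~0.72 bits left after observing Y

    print(H_x - H_x_given_y)                 # ~0.28 bits of mutual information

Note that the mutual information can never exceed H(X), which is the corollary above.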

I think one frequent source of confusion is the difference between "randomness" and "uncertainty" in colloquial versus formal usage. Entropy and randomness in the formal sense don't have a strong connotation that the uncertainty is intrinsic and irreducible. In the colloquial sense, I feel like there's often an implication that the uncertainty can't be avoided.


I would recommend the substitution s/randomness/uncertainty/ since it seems to be the more useful concept. With that substitution, the equivalence between the two ways of thinking becomes clearer: the uncertainty you have before learning the value of a bit is equal to the information you gain when learning its value.

Let's use the analogy of a remote observation post with a soldier sending hourly reports:

    0 ≝ we're not being attacked
    1 ≝ we're being attacked!
Instead of thinking of a particular message x, you have to think of the distribution of messages this soldier sends, which we can model as a random variable X. For example, in peacetime the message will be 0 99.99% of the time, while in wartime, with active conflict, it could be 50-50.

The entropy, denoted H(X), measures how uncertain central command is about the message before receiving it, or equivalently, the information they gain after receiving it. The peacetime messages contain virtually no information (very low entropy), while the wartime 50-50-probability messages contain H(X)=1 bit each.

Another useful way to think about information is to ask "how easy would it be to guess the message" instead of receiving it. In peacetime you could just assume the message is 0 and you'd be right 99.99% of the time. In wartime it would be much harder to guess---hence the intuitive notion that wartime messages carry more information.
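For anyone who wants the numbers, here's a quick sketch of the binary entropy for the two regimes:

    from math import log2

    def binary_entropy(p):
        # Entropy in bits of a yes/no message where P(1) = p.
        if p in (0.0, 1.0):
            return 0.0
        return -(p * log2(p) + (1 - p) * log2(1 - p))

    print(binary_entropy(0.0001))  # peacetime: ~0.0015 bits per report
    print(binary_entropy(0.5))     # wartime:   1.0 bit per report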


Another useful, related topic to understand: Hamiltonian Mechanics (https://en.wikipedia.org/wiki/Hamiltonian_mechanics), a whole other way to express what Newton described with his physics, but in purely energetic terms.

Entropy is usually poorly taught; there are really three entropies that get conflated. There's the statistical-mechanics entropy, which is the math describing the random distribution of ideal particles. There's Shannon's entropy, which describes randomness in strings of characters. And there's classical entropy, which describes the fraction of irreversible/unrecoverable losses to heat as a system transfers potential energy into other kinds of energy on its way toward equilibrium with its surroundings, or the "dead state" (a reference state of absolute entropy).

These are all named with the same word, and while they have some relation with each other, they are each different enough that there should be unique names for all three, IMO.
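To make the family resemblance (and the difference) concrete for the first two: the statistical-mechanics (Gibbs) entropy S = -k_B Σ p ln p and Shannon's H = -Σ p log2 p, applied to the same distribution, differ only by the constant factor k_B ln 2. A quick sketch (the classical dS = δQ/T entropy doesn't reduce to a one-liner like this):

    from math import log, log2

    K_B = 1.380649e-23  # Boltzmann constant, J/K

    def shannon_bits(probs):
        # Shannon entropy H = -sum(p * log2 p), in bits.
        return -sum(p * log2(p) for p in probs if p > 0)

    def gibbs_entropy(probs):
        # Statistical-mechanics (Gibbs) entropy S = -k_B * sum(p * ln p), in J/K.
        return -K_B * sum(p * log(p) for p in probs if p > 0)

    probs = [0.5, 0.25, 0.25]
    print(shannon_bits(probs))                    # 1.5 bits
    print(gibbs_entropy(probs) / (K_B * log(2)))  # 1.5 -- same number, rescaled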


Entropy is measured in bits (the exact same bits we talk about in computing), which can help unify the ideas of information and randomness:

Which can contain more information: a 1.44 MB floppy disk or a 1 TB hard disk?

Which password is more random (i.e. harder to guess): one that can be stored in only 1 byte of memory or one that takes 64 bytes?

Information theory deals with determining exactly how many bits it would take to encode a given problem.
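A quick sketch of the password comparison, assuming each byte is drawn uniformly and independently at random (that assumption is doing all the work here):

    from math import log2

    def password_entropy_bits(num_symbols, alphabet_size):
        # Entropy of a password of num_symbols characters, each chosen
        # uniformly and independently from an alphabet of alphabet_size.
        return num_symbols * log2(alphabet_size)

    print(password_entropy_bits(1, 256))    # 1 random byte:    8 bits
    print(password_entropy_bits(64, 256))   # 64 random bytes: 512 bits

An attacker needs on the order of 2^entropy guesses to enumerate the possibilities, which is why more bits of entropy means harder to guess.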


Let me give an analogy and then a solution to your paradox.

Temperature. Sometimes you read that it's a measure of warmth; sometimes cold. Aren't hot and cold opposites?

Yes, hot and cold are opposites, but in a way they give the same kind of information. That's also true for information and randomness. Specifically, little randomness means more (certain) information.


The more randomness, the more information bits you need to encode the observed outcome. But I can see where your dissonance comes from: you probably parsed "information" as "information I already have about the system", not "information I need to describe the system state".
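One way to see this: an optimal code assigns an outcome of probability p roughly -log2(p) bits, so the more surprising the outcome, the more bits it takes to report it. A tiny sketch:

    from math import log2

    def code_length_bits(p):
        # Ideal code length, in bits, for an outcome with probability p.
        return -log2(p)

    print(code_length_bits(0.5))     # 1 bit: a fair coin flip
    print(code_length_bits(0.9999))  # ~0.00014 bits: an outcome you already expected
    print(code_length_bits(0.0001))  # ~13.3 bits: a big surprise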


Judging by the answers to your question (most of them wrong or containing serious misunderstandings), it seems you hit the nail on the head.


The way I think of entropy is like the capacity of a channel: the channel can be filled with less information than its capacity, but not more.



