Hacker News new | past | comments | ask | show | jobs | submit login

It does not allow infinite string compression, because you obviously have to store the original data. For hashes of good length and quality, the theory stands. I would even use MD5 and say the theory stands for non-critical every day uses given there is no attacker.



> It does not allow infinite string compression, because you obviously have to store the original data.

Nope. Say you have a hash function `h` which guarantees a unique, 128-bit output for any input. Then `h` is a function which compress any string into 128-bits:

If `y = h(x)` then it is trivial for me to write a program that will reconstruct `x` from `y`. I will simply iterate `x` through the possible input strings (which I can do because the set of strings is countable) until I find one that satisfies `h(x) == y`. Impractical, yes, but allowed by the theory, and that means the theory is invalid.


By your very own definition, "In theory" means "we have a model that makes good predictions in some circumstances."

If I let "some circumstances" be "non critical use with no attacker" and the model be "very very very very very very very very very very very very very low probability of collision when applied on pre-existing files", I don't see any problem.

The problem is that you were commenting the article using an absurdly geek point of view, interpreting each sentence as it had a very strict mathematical meaning.

In the real world, peoples sometime do misuse the language a little with the resulting sentences being easier to understand for everybody. That is generally not a call for a smart people to point out that when translated word by word (i.e. when poorly translated) using quantifiers, the statement is mathematically false. Especially when everybody on earth understand that use of "in theory" in the first place.

If Jeff wanted to make a mathematical article, he probably had written mathematical statements in the first place.

If every time you see the world "theory" you are feeling you should correct the one using it by telling him about corner cases he very probably already knows about their existence in the first place, you will loose a lot of time for a lot of non-constructive comments.


That's false. Nobody is ever saying that h(x) is unique, only probability that h(x) = y for any y should be about 1/size of y choice space.


"A given hash uniquely represents a file, or any arbitrary collection of data. At least in theory."


If the theory doesn't stand in all cases, then it doesn't stand. It should be "in practice".




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: