I did some quick research a while ago, and some of the online hash calculators that come up in Google are affiliated with one of the online hash crackers. That was the main reason I made my own hash calculator page: so I could create hashes easily without worrying about the preimages being remembered. So you are right to be concerned.
I tried to think of a way you could test it for yourself, but for everything I thought of, I also thought of a way I could easily pass the test while still adding the preimages to the database. So for now you'll have to trust that I'm not, or use a different hash calculator.
Here's the source code for that checksums page if you want to run it on your own:
The download using Mega looks interesting. Seems that it is using the HTML5 FileSystem API to first download the file to a temporary location, without even showing a browser download dialog to the user.
In case anyone is interested, the reason it does that is that the file is encrypted on the server and decrypted client-side in JavaScript. The decryption key is in the part of the link after the # (the URL fragment).
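To illustrate why putting the key after the # matters: a browser does not include the fragment in the HTTP request it sends, so the server only sees the path and query. Here is a minimal Python sketch of that split. The URL and the function name `split_link` are made up for illustration and are not Mega's actual link format, and of course a page's own JavaScript can still read the fragment and do whatever it wants with it.

```python
from urllib.parse import urlsplit


def split_link(url):
    """Return (request_target, fragment) for a URL.

    request_target is the path plus query that an HTTP client puts on the
    request line; the fragment stays on the client side.
    """
    parts = urlsplit(url)
    request_target = parts.path or "/"
    if parts.query:
        request_target += "?" + parts.query
    return request_target, parts.fragment


if __name__ == "__main__":
    # Hypothetical link: the part after '#' stands in for a decryption key.
    target, fragment = split_link("https://example.com/file?id=42#secret-key")
    print("sent to server:", target)
    print("kept in browser:", fragment)
```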
I can say that when I saw that, I cancelled and just started torrenting it. It is a large file, and I would rather know where it is going and keep it off my SSD system drive.
I am downloading the file now. I was a little surprised to see that you went with gzip. Given the target audience it is not that hard to imagine that the end user will have access to bzip2 or xz. With the size of the file why choose gzip? Hopefully `gzip -9` was used. I am curious about how much smaller xz/bzip2 will be. I will update this post once the file is downloaded.
I actually didn't compare gzip and bzip2 before uploading, and I probably should have. I did use `gzip -9`. I'm compressing it with `bzip2 -9` now to see if it gets any smaller. I'll post a reply here when I know.
Use xz. Debian, GNOME, Arch Linux, Gentoo, and Fedora have all switched to xz, and for good reason. I expect xz will offer a 20-30% reduction in file size. That will add up quickly.
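If anyone wants to try the comparison on their own data before committing to a format, here is a rough sketch using Python's standard gzip, bz2, and lzma modules. The function name `compressed_sizes` and the synthetic sample are my own, and the xz preset is 6 (the xz default) rather than 9 to keep memory use modest. Numbers from a toy sample say nothing about the actual wordlist, so feed it a real slice of the file.

```python
import bz2
import gzip
import lzma


def compressed_sizes(data):
    """Return the size in bytes of data under each compressor."""
    return {
        "raw": len(data),
        "gzip -9": len(gzip.compress(data, compresslevel=9)),
        "bzip2 -9": len(bz2.compress(data, compresslevel=9)),
        "xz -6": len(lzma.compress(data, preset=6)),
    }


if __name__ == "__main__":
    # Synthetic stand-in for a wordlist; replace with bytes read from a real file.
    sample = "\n".join(f"password{i}" for i in range(20000)).encode("utf-8")
    sizes = compressed_sizes(sample)
    for name, size in sizes.items():
        print(f"{name:>9}: {size:>8} bytes ({size / sizes['raw']:.1%} of raw)")
```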
My download has slowed to a crawl around 90% completion. I am getting about 3 KB/s. When I read that your list was not sorted case-sensitively, I wondered what difference that might make.
I did a little experiment with the american-wordlist-insane wordlist and sorting in the interim. I used msort, which does a case-sensitive sort [1], and sort, which does not. Here are the results:
What I've found common among publicly available dictionaries is the lack of space characters and I've seen a couple of write-ups where people actively strip space characters when creating dictionaries.
Because of this, almost all my passwords contain a space character, and so far it has yet to cause me any problems.
I have a decent server doing pretty much nothing, sitting on a link with a couple hundred Mbps of available bandwidth. If I can find a command-line BitTorrent client and figure it out, I'll help seed as well.
https://defuse.ca/checksums.htm#checksums
I know the site says it doesn't record the information, but is there any more assurance beyond that?
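One way to sidestep the trust question is to compute the hash locally so the preimage never leaves your machine. A minimal sketch using Python's standard hashlib module follows. The function name `local_hashes`, the choice of algorithms, and the sample string are just for illustration and have nothing to do with how the checksums page itself is implemented.

```python
import hashlib


def local_hashes(text, encoding="utf-8"):
    """Hash text with a few common algorithms, entirely on this machine."""
    data = text.encode(encoding)
    return {
        name: hashlib.new(name, data).hexdigest()
        for name in ("md5", "sha1", "sha256")
    }


if __name__ == "__main__":
    # Sample input only; nothing here is sent over the network.
    for name, digest in local_hashes("correct horse battery staple").items():
        print(f"{name}: {digest}")
```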