> Someone can just hash every number from 000-00-0000 to 999-99-9999 and figure out mine from that.
That's what salts are for, right? It wouldn't be too hard to issue a very large, known, public salt alongside each SSN.
> And of course none of the data brokers have much reason to make opt-outs work well, in the absence of legislation and strict enforcement - it's in their commercial interests to say they "can't stop your data reappearing"
If the salt is public, what’s the point? You can get all the salts, combine them with every possible ssn, and you’re back where you were before.
No, the point of a salt is that it doesn't need to be hidden. It's designed for a scenario where e.g. your database is hacked and the salts are visible as plaintext: https://en.wikipedia.org/wiki/Salt_(cryptography)
Since the salts are random, unique to each SSN and long: a) you'll find no existing rainbow table that contains the correct plaintext for your SSN hash and b) each SSN now requires its own bruteforcing that is unhelpful for any of the other SSNs
Combine that with a very expensive hashing method like PBKDF2 (I'm sure there's something better by now) and you've made it pretty dang hard for non-state actors to bruteforce a significant chunk of SSNs. There are also peppers, which involve storing some additional global secrets on HSMs.
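To make the salt-plus-slow-hash idea concrete, here is a minimal sketch using PBKDF2 from Python's standard library. The iteration count, the 16-byte salt length, and the function names `hash_ssn` and `new_record` are assumptions picked for illustration, not anything proposed in this thread.

```python
import hashlib
import os

# Assumed work factor; a real deployment would tune this to its hardware.
ITERATIONS = 600_000


def hash_ssn(ssn: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 32-byte digest from an SSN and a per-record salt with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", ssn.encode("utf-8"), salt, iterations)


def new_record(ssn: str, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Generate a fresh random salt and return (salt, digest) for storage."""
    salt = os.urandom(16)  # assumed salt length: 16 random bytes
    return salt, hash_ssn(ssn, salt, iterations)


if __name__ == "__main__":
    salt_a, digest_a = new_record("123-45-6789")
    salt_b, digest_b = new_record("123-45-6789")
    # Same SSN, different salts: the stored digests do not match each other,
    # so a precomputed table for one record is useless for the other.
    print(digest_a != digest_b)
    # Anyone holding the salt can still recompute and verify the digest.
    print(hash_ssn("123-45-6789", salt_a) == digest_a)
```

Note what this does and does not buy you: it defeats precomputed tables and forces per-record work, but the input space is still only the set of possible SSNs.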
I'm sure the crypto nerds have like a dozen better methods than what I can come up with but the point is this is not a feasibility issue.
I’m sorry but it’s not that simple. You can’t just say add salt, here are the benefits of salt, problem solved.
In a password database, salt is not secret because the password combined with it is secret and can be anything. Even if you know the salt for a particular user, in order to crack that user, you need to start hashing all possible passwords combined with that salt. If a user picks a dumb password like password123, then they are not safe if the salt leaks. Other users with password=password123 will not be immediately apparent because other users have different salts. You would have to try password123 combined with each user’s salt to identify all the users with password123.
You said “It wouldn't be too hard to issue a very large, known, public salt alongside each SSN.” That means there should be some theoretical service where you pass it an ssn and get back the salt, right? So what have you gained? Any attacker with an ssn can get the salt, and nothing was gained. Or if attackers don’t have ssns they can just ask for all the salts. The mapping from ssn to salt is public, so they know 000-00-0000 has salt1, 000-00-0001 has salt2, etc, so you haven’t increased the number of hashes attackers have to do to do whatever it is they want to do.
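The objection above can be sketched in a few lines. This is a toy, under loud assumptions: it uses a single fast SHA-256 instead of a slow KDF, and it shrinks the search to the last four digits (10,000 candidates) so it runs instantly. The names `salted_hash`, `brute_force`, and `candidates_last4` are made up for the example.

```python
import hashlib
from typing import Iterable, Optional


def salted_hash(ssn: str, salt: bytes) -> bytes:
    """Toy salted hash: SHA-256 over salt || SSN (assumed construction)."""
    return hashlib.sha256(salt + ssn.encode("utf-8")).digest()


def brute_force(target: bytes, salt: bytes, candidates: Iterable[str]) -> Optional[str]:
    """Try every candidate SSN against one known salt; return the match, if any."""
    for ssn in candidates:
        if salted_hash(ssn, salt) == target:
            return ssn
    return None


def candidates_last4(prefix: str) -> Iterable[str]:
    """Reduced keyspace for the demo: only the last four digits are unknown."""
    return (f"{prefix}{n:04d}" for n in range(10_000))


# The salt is public, so the attacker simply plugs it in.
public_salt = b"public-salt-for-this-record"
leaked_hash = salted_hash("123-45-6789", public_salt)

recovered = brute_force(leaked_hash, public_salt, candidates_last4("123-45-"))
print(recovered)  # prints 123-45-6789
```

Knowing the salt means the attacker does exactly one hash per candidate SSN, the same count as with no salt at all. The salt only stops the work from being reused across records.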
You’re right about commercial interests being at play. That’s why we don’t have laws like GDPR in the USA. Crypto nerds have thought about this long and hard and if it was that easy we wouldn’t need stupidly complex laws like GDPR. They would “just add salt.” Or other services would “just add salt” instead of relying on more complex and expensive forms of identity verification and protection.
You don’t need to be a crypto nerd to try to describe a flow where having a public known salt per ssn helps with privacy. You do not need to be a crypto nerd to design secure one way hash functions that would plug into that flow.
Yep, you are right, complete brain fart on my end. Of course it doesn't work if it's required for the salt to be publicly mappable to the SSN, since that just circumvents the whole thing. I just didn't understand what you were saying in your earlier message.
"all the salts" * "all the SSNs" becomes a very big number. With a large enough but still reasonably sized salt, you can engineer it so that hashing all combinations takes an amount of time greater than the age of the universe even if you use all the computers in the world.
All the salts * all the ssns is a very large set, but it’s irrelevant because in the above scenario each ssn has a public, well-known salt. You don’t have to test each salt against each possible ssn because the mapping from one to the other is known.
Even if such a service doesn’t exist, and you just have a list of all the salts without knowing which ssn they map to, you’re just hand waving how hard it will be to hash the entire salt*ssn set.
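For a sense of scale, here is a rough count of the hashes needed in each scenario discussed above. The number of issued salts (500 million) and the 128-bit salt size are assumptions for illustration; only the 10^9 SSN space follows from the 000-00-0000 to 999-99-9999 format.

```python
# Size of the SSN space: nine decimal digits.
SSN_SPACE = 10**9

# Assumption: roughly this many salts have been issued, one per SSN holder.
ISSUED_SALTS = 5 * 10**8

# Assumption: salts are 128 random bits.
SALT_BITS = 128

# Scenario 1: the SSN -> salt mapping is public.
# A full table needs one hash per SSN, each with its own published salt.
known_mapping = SSN_SPACE

# Scenario 2: the list of salts is public but not which SSN each belongs to.
# Every issued salt has to be tried against every possible SSN.
unknown_mapping = ISSUED_SALTS * SSN_SPACE

# Scenario 3: salts are not published at all and must be guessed blindly.
blind_guessing = (2**SALT_BITS) * SSN_SPACE

for label, count in [
    ("known mapping", known_mapping),
    ("unknown mapping, known salt list", unknown_mapping),
    ("blind salt guessing", blind_guessing),
]:
    print(f"{label}: about {count:.1e} hashes")
```

Only the third scenario gets anywhere near "age of the universe" territory, and it is the one where the salt is effectively a secret rather than a public value.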
Hashing a salt+ssn can’t take too long because data brokers need to be doing it frequently in order to verify identities.
In this report, https://files.consumerfinance.gov/f/documents/cfpb_consumer-..., it says the volume of credit card marketing mail is in the hundreds of millions of pieces per month. Can we assume that each piece of mail is roughly associated with one instance of hashing a salt+ssn? Given that number, how expensive (in terms of time, compute cycles, whatever) can it possibly be to hash a salt+ssn? If we make it too expensive, expensive enough to support your “age of the universe” claims, credit markets would grind to a halt.
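The tradeoff can be put into back-of-envelope numbers. This sketch assumes 300 million verifications per month (a stand-in for "hundreds of millions"), one hash per verification, a 30-day month, and an attacker who knows the salt and tries all 10^9 SSNs on a single core. The function names and the per-hash costs tried are illustrative, not measured.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumed 30-day month
SECONDS_PER_YEAR = 365 * 24 * 3600
SSN_SPACE = 10**9                    # 000-00-0000 through 999-99-9999

# Assumption: stand-in for "hundreds of millions" of verifications per month.
VERIFICATIONS_PER_MONTH = 300_000_000


def broker_cores_needed(verifications_per_month: int, seconds_per_hash: float) -> float:
    """Cores running nonstop to keep up with legitimate verification volume."""
    return verifications_per_month * seconds_per_hash / SECONDS_PER_MONTH


def attacker_core_years(seconds_per_hash: float, candidates: int = SSN_SPACE) -> float:
    """Worst-case single-core time to try every SSN against one known salt."""
    return candidates * seconds_per_hash / SECONDS_PER_YEAR


for seconds_per_hash in (0.001, 0.1, 10.0):
    cores = broker_cores_needed(VERIFICATIONS_PER_MONTH, seconds_per_hash)
    years = attacker_core_years(seconds_per_hash)
    print(
        f"{seconds_per_hash:>6} s/hash: brokers need ~{cores:,.1f} cores, "
        f"attacker needs ~{years:,.2f} core-years per SSN"
    )
```

Under these assumptions, any per-hash cost that legitimate users can afford at that volume leaves a single SSN within reach of a modest cluster, since the attacker's work is capped by the 10^9 keyspace rather than by the salt length.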
I’m quite familiar with how a salt works. One might say deeply familiar since I have worked on auth services for very large, very secure organizations.
Poster above me just said “add salt” and waved their hands without describing anything concrete, like just saying some magic words can solve hard problems.
> And of course none of the data brokers have much reason to make opt-outs work well, in the absence of legislation and strict enforcement - it's in their commercial interests to say they "can't stop your data reappearing"
This is the actual reason, IMHO.