Hacker News new | past | comments | ask | show | jobs | submit login

They could store a hash.



Which would never work because real life data is messy so the hashes would not match. Even something as simple as SSN + DOB runs into loads of potential formatting and data entry issues you'll have to perfectly solve before such a system could work, and even that makes assumptions as to what data will be available from each dataset. Some may be only name and address. Some may include DoB, but the person might have lied about their DoB when filling out the form. The people entering it might have misspelled their name. It might be a person who put in a fake SSN because they're an illegal immigrant without a real one. Data correlation in the real world is a nightmare.

When you tell a data broker to delete all of the data about you, how can you be sure they get ALL of the data about you, including the ones where your name is misspelled or the DoB is wrong or it lists and old address or something? Even worse if someone comes around later and discovers the orphan data when adding new data about you and fixes the glitch, effectively undoing the data delete.

It's a catch-22 that if you want them to not collect data about you they need a full profile on you in order to be able to reject new data. A profile that they will need to keep up-to-date, which is what they were doing already.


> Even something as simple as SSN + DOB runs into loads of potential formatting and data entry issues you'll have to perfectly solve

You don’t have to solve it perfectly to be an improvement.

Also this is BS. Not every bit of data is perfectly formatted and structured but both of your examples are structured data. You can 100% reliably and deterministically hash this data.

There’s so much in your argument that can be replied with “imperfect is better than status quo”. If you give someone the wrong DOB, it’s “not you” anyways, at least let me scrub my real data even if the entry is imperfect for some people or some records.


> You don’t have to solve it perfectly to be an improvement.

https://en.wikipedia.org/wiki/Nirvana_fallacy


> You don’t have to solve it perfectly to be an improvement.

They don't want to solve your problem. You aren't their customer. They want to comply with the letter of the request in as much as it covers their own butt in terms of regulatory requirements and/or political optics.


The “solution” mentioned is political. A requirement that data on an individual is properly deleted when presented with the data would be “good”. A requirement that captures every nuance of mistakes would be “perfect”.

Hashing a birthday and SSN is deterministic. We could deterministically keep that data deleted. This would be better than we have today, and could be done reliably and affordably.

The companies can easily be required (by law) to implement the “good” solution. Everyone complaining it’s not “perfect” is stopping “good”.


People switch digits in their SSN.


Then it’s different… Better vs perfect.

If you can’t get your SSN right, you can’t expect a company to delete it.


There's a trivial way to not re-add data that was removed: don't do it without user opt-in, whom admittedly you have access to ask at the moment of data collection. If you don't have the ability to ask users to opt in, you probably shouldn't be collecting the data anyway, with very few exceptions like criminal records.

edit for clarity: by criminal records, I mean for the official management of them, not for scraping their content.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: