Have I Been Facebooked? (haveibeenfacebooked.com)
384 points by mendelmaleh on April 5, 2021 | 219 comments



"Currently, we don't know if Facebook has fixed the vulnerability since the company hasn't released any statement regarding the breach."

"This is old data that was previously reported on in 2019. We found and fixed this issue in August 2019" - FB


The data might be from an old breach, but this kind of data (phone number, date of birth, full name, etc.) is unlikely to change very often, so it is more than likely still current for a large proportion of those who have been exposed.


That is a dishonest statement, as a lot of data is recent, and they know that and are almost pretending like it is some kind of feature.


It seems like the creators are heading towards a libel lawsuit if they keep that up on their site.


I don't understand libel laws very well. If it's based on that quote, could you explain a bit why that would be?


I deleted my (outdated) phone number from facebook years ago and it's still part of the leak, with my name and gender in it. I did not replace the phone number with another phone number. Really says something about what delete means for fb.


Yep, my guess is they just pop in a flag that says "not current" or "former" or something. Think about what someone could do with this data though. They can unfreeze your credit report. Apply for a loan or credit card or mortgage. All they need is your name, DOB, SSN (or just last 4), and the last 3 addresses you resided at.


How many years ago did you delete your phone number? This is not a recent dump of data; it has just recently been made more widely available. I believe the original leak was some time in 2019.


I moved out of the country mid 2018, and that was around the time I removed the number from every service I had it with. I tried to migrate to 2FA and authy since my phone number was likely going to change frequently. I also tried to switch over to twilio for services that required a number, but a lot of short codes reject it, which really sucks for someone who needs to change phone numbers a lot because of travels.


That's illegal by European standards, I think... I wonder if somebody has the money and the time to bring this to court. It would be necessary.


Apparently the Irish Data Protection Commission is going to bust Facebook.

Facebook leak: Irish regulator probes 'old' data dump: https://www.bbc.com/news/technology-56639081

I hope https://NOYB.eu is aware of this breach.

Of course Facebook, or even FAANG, for that matter, would keep all of the data that they hoarded from EU citizens, illegally. It goes along well with the Silicon Valley mentality (I am culturally American and I am from the west coast of the US, so I understand what is going on here. I also hold EU citizenship...)

The number one rule is "don't get caught". The "move fast and break things" mantra still holds well for Facebook. So, no surprise that they were sloppy with things, and got caught.

If you want to check if you were caught up in being Facebooked (data leaked), the download links to the data dumps are here: https://archive.is/MZqak

I am furious at the moment because I found my (fraternal) twin brother's info in the data dump files. :-(


> Apparently the Irish Data Protection Commission is going to bust Facebook.

I doubt it. They spent 7 years on a case against Facebook doing the absolute minimum necessary. NOYB are suing the Irish DPC in an attempt to get them to do their jobs. It’s a mess. Irish DPC apparently only investigated 83 GDPR cases and over 4000 “concluded without inquiry”. They only made 11 data protection decisions last year, compared to 600 by Spain whose data protection branch has a similar budget.

https://noyb.eu/en/dpc-cancels-parliamentary-hearing-eu-us-t...

https://noyb.eu/sites/default/files/2021-03/Letter%20Max%20S...

https://noyb.eu/en/facebooks-gdpr-bypass-reaches-austrian-su...


True. NOYB is basically the one doing all of the dirty work.

I donate to NOYB because they are actually the ones making sure this stuff is enforced.

There are great articles on the Financial Times about GDPR, tech regulation, emerging technologies, etc., that are on the spot. I remember one stating that all of the regulatory agencies for data protection in all of the respective EU countries were understaffed. It gave a great visual.


The EU does, they have often started anti-trust lawsuits so I don't see why they wouldn't start a GDPR / privacy violation lawsuit.


I had an idea. The last time I deleted my Facebook account, it was a 6-7 year old legacy thing, and I ended up manually deleting stuff first (photos, videos, contacts, calendar entries), then waiting for months. This was back when "shadow profiles" had appeared in the news. I figured that if I outright deleted the account, it might be kept in a "deleted because not coming back" DB, whereas if I deleted things piece by piece, maybe the system would treat it as "okay.. routine stuff.. delete".

Apparently both my ideas were wrong, but the good thing is that I don't use any Facebook property, don't use WhatsApp or Instagram, and am a hermit. I had Telegram since 2015, but since the Signal/WhatsApp thing happened, I stopped using it. :-/


It's not illegal to soft delete your pictures, and it may be illegal not to in isolated cases where a court preservation order is in effect. It is, however, illegal (by the GDPR) to not hard delete your account on request.


I deleted mine from Facebook years ago, and it periodically still tries to entice me to verify my number with it prefilled.

With that said my details do not appear to have leaked.


same with me. it has my number from deleted facebook account


The same here :/


> Really says something about what delete means for fb.

Since the storage of data is so cheap, any company will archive data, for future profit.

Why did you believe any data would be deleted in the first place? Were you counting on the government taking action if it found out? Has there been any case like this in the past?

I find it surprising even programmers believe their data will be deleted by the company.

Most people, even programmers, believe a company will delete their data. What's your background? Are you a coder?


> Why did you believe any data will be deleted in the first place?

Ethics? Morals? That is what we would expect from "delete my account".


In the EU there's the GDPR and the right to be forgotten. If I forbid a company to store my data, they're obligated by law to remove it from their servers.


Not quite, only if the basis for processing is consent.


Also not quite. Chapter 3 Article 17 covers the right to be forgotten.

1(b) is 'withdrawal of consent', but note also 1(c), which refers to article 21, which allows subjects to object, and you should (as a controller) have 'compelling legitimate grounds' to continue processing (which is a higher standard than 'legitimate interests', which can be the basis for processing the data in the first place).



I get different results from this and HIBP


If you read the bottom of the post, the complete phone number database has not been loaded into HIBP yet. The results will likely change once it's done.


To be clear: I was listed on HIBP but not on this tool


Same here, my phone shows in HIBP but not in this tool


This seems to be more reliable. The obfuscated name and other info matched my own number.


My number was leaked (checked the dump myself) but I don't show up on this site. Seems like there are some bugs to work out


I'm not surprised, the data dump was an ugly mess of inconsistently encoded data in inconsistent formats with "delimiters" that often appear in the data itself.

Cleaning that up is a serious effort and requires operations on huge files that are very difficult for most software to deal with.


I don't understand how a serious effort would be required, even if the chosen delimiter being present within the data is an issue, the phone number is the first field.

I can get all the phone numbers myself with a simple `cat * | cut -d ":" -f 1`


That's the ID number you just grabbed. The phone number is the second field :)

If that's literally all you want, yes, it's not that hard. But a non-trivial number of people decided to put commas or colons in their names and other nonsense like that, there are lots of commas in the hometown or location fields which makes parsing those a pain, etc.
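
For the simple case, here is a minimal Python sketch of the same extraction, assuming colon-delimited lines shaped like `id:phone:...` as described here. The function name and the sample line are made up for illustration and are not taken from the dump.

```python
from typing import Optional


def phone_from_line(line: str, delimiter: str = ":") -> Optional[str]:
    """Return the second field (the phone number) of a delimited record.

    Assumes the layout is id<delim>phone<delim>rest. Only the first two
    delimiters are split on, so stray delimiters inside names or
    locations later in the line cannot shift the phone field.
    """
    parts = line.rstrip("\n").split(delimiter, 2)
    if len(parts) < 2:
        return None  # malformed line: no phone field at all
    phone = parts[1].strip()
    return phone or None


# Hypothetical record with an extra colon inside the name fields.
sample = "100001:15550000000:Some:Guy:male"
print(phone_from_line(sample))
```

Getting the phone number out is the easy part; the stray delimiters only start to hurt once you need the fields after it.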


Aha, we must be looking at different data then; possibly someone's already done much of the corrections on the version I'm looking at.


Possible, but there are also different files with different schemas, so it's hard to even say that.

The only ones that actually define the data are the 9 or so CSV files that have a header like:

id,phone,first_name,last_name,email,birthday,gender,locale,hometown,location,link

Those are what I looked at and those are super annoying because several have commas in both the first & last name. I don't know why, but a handful of people listed their names as some, guy, some, guy which I assume should be split into firstname: some, guy and lastname: some, guy. Then a lot of people have None for a birthday, some have something like May 8, and others have something like May 8, 1990. Both locale & hometown can be either None, or have several commas in them.

I had to reformat all that data and validate that each field made sense to parse it. There are helpful "Location" and "link" markers in the CSV but it's still super annoying to parse this stuff.


Also be careful, some of these docs have BOMs that screw up parsing tools (even iconv crapped out on one of the files, Qatar I think it was) and the encoding is all over the place. At least the phone number is ASCII, but the names may be UTF-8 (with or without BOM), UTF-16-le or...
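
A rough sketch of how one might sniff the BOM before decoding, in Python. This is an assumption about a workable approach, not what anyone in this thread actually ran; the helper name and the UTF-8 fallback are hypothetical choices, and files with no BOM in some other encoding will still need manual handling.

```python
import codecs

# Longer BOMs are checked first: the UTF-32-LE BOM begins with the
# same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode bytes using the BOM if one is present, else the fallback.

    Undecodable bytes are replaced rather than raising, so one bad
    record does not abort a whole file.
    """
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding, errors="replace")
    return raw.decode(fallback, errors="replace")
```

Stripping the BOM explicitly also keeps it from ending up glued to the first field of the first record.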


Likewise.

That said, the actual files are in fairly shabby shape and quite tedious to clean for DB import. They may have missed thousands of records.


I haven't had the chance to check the dump but I am sure Facebook had my phone number. I'm surprised this site says my number wasn't leaked.


If "Who can look you up using the phone number you provided?" setting wasn't set to "Everyone" in your privacy settings, then your phone number wouldn't be visible to the scraping campaign that was the source of this.


Thanks for saying this. Seems like most people have a misunderstanding of where the data came from.

To clarify: in 2019 you could enter a phone number into Facebook search and it would show you whichever profile was associated with the number if it was set to public.

The “hackers” set up a script to go through every number sequentially: 15550000000, 15550000001, etc.

I would be very surprised if the original data doesn't contain a LOT more information: basically everything that can be found by publicly viewing your profile page.


where can i find the dump, I've wanted to check my data as well


I found it here https://archive.is/MZqak


Search fbleaks on telegram


I tried and get no results.



can confirm.


+1. Found my pal's German number in the dump but not on this website.


I am a co-author of the site. We are already aware of your concerns about giving out your phone number. The source code is free and reviewable on Github. We know it's not possible to verify what's running on a server but we hope it adds a level of trust. We are currently hashing all phone numbers so we don't have to deal with them anymore. We will keep you updated.


Hash the phone number in the browser before sending it to your server. That way it is at least possible to verify via devtools what is being sent.

Heck, even allow a prehashed phone number to be entered.


I understand the frustration, but they have the data and hence a rainbow table. Sending a pre-hashed phone number is the same as sending an unhashed one, unless the worry is a man in the middle, who is just as likely to get the data.

The only way to check without giving up personal info is to get the data and look locally, or perhaps search for so many phone numbers that yours is buried in the haystack.


That only holds for numbers already part of the dump. If you submit an unknown phone number that is not contained in the leak, hashing it before sending it will increase data privacy.


So a rainbow table of just 2.9 billion numbers covers the USA phone set. So I think searching for a specific number, clear or hashed, is roughly the same exposure. Maybe the right way to search without disclosure is really to filter: instead of putting in your full 10 digits, you just put in 7 or 8 and it returns a list of the matching numbers for you to see. Then you visually scan to find your number among the returned results (at most 1,000 candidates for 7 digits, 100 for 8).

(Assuming the hacked numbers are public already - but I don’t love this either)

What’s a secure way of searching without disclosure by either party? (Non troll question)


I think HIBP implements it like this: you hash your email/phone number and send only a prefix of the hash to the server. The server responds with a list of hashes matching the prefix. Now you can check if your hash is in the list. If so, you have been pwned. This way the server never knows which email you are requesting since it only ever sees a part of the hash.
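
A minimal local sketch of that prefix scheme in Python, with no network involved. As far as I know this is the range-query idea behind HIBP's Pwned Passwords search; the sketch does not assume the email or phone search works this way. The SHA-1 choice, the 5-character prefix, and both function names are assumptions for illustration.

```python
import hashlib

PREFIX_LEN = 5  # hex characters revealed to the server (assumed value)


def sha1_hex(value: str) -> str:
    """Uppercase hex SHA-1 of the value."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest().upper()


def server_range_lookup(leaked_hashes, prefix):
    """Hypothetical server side: suffixes of all leaked hashes sharing the prefix."""
    return sorted(h[PREFIX_LEN:] for h in leaked_hashes if h.startswith(prefix))


def client_is_leaked(value, lookup) -> bool:
    """Client side: send only the prefix, compare suffixes locally."""
    digest = sha1_hex(value)
    prefix, suffix = digest[:PREFIX_LEN], digest[PREFIX_LEN:]
    return suffix in lookup(prefix)


# Fictional 555 numbers standing in for leaked records.
leaked = {sha1_hex("15550000001"), sha1_hex("15550000002")}
lookup = lambda prefix: server_range_lookup(leaked, prefix)
print(client_is_leaked("15550000001", lookup))
```

The server only ever sees the prefix, so it learns a bucket rather than a value. How much that hides depends on how many plausible inputs share a bucket, which is a weaker guarantee for phone numbers than for passwords.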


> What’s a secure way of searching without disclosure by either party?

Download the original data set. US records are around a GB file.


That's what we're planning to do. Thank you


> Your phone number has not been found in the leak. This is good, but you should be worried anyway.

I, too, am wary of Facebook but to other people, without reasoning provided, this sounds like FUD. Maybe at least link to an article explaining why they should be concerned anyway.


checking the code i don't see where it's being hashed in the browser before checking the backend yet


Facebook should email those affected... surely they know who was compromised or not. Shouldn't have to use random sites for this. Why has there been no communication from them?


> Why has there been no communication from them?

It's not like they care even a bit. And they wouldn't win any goodwill from it. If you have been zucked, you've been zucked, that is it.


Zuck: Yeah so if you ever need info about anyone at Harvard

Zuck: Just ask.

Zuck: I have over 4,000 emails, pictures, addresses, SNS

[Redacted Friend's Name]: What? How'd you manage that one?

Zuck: People just submitted it.

Zuck: I don't know why.

Zuck: They "trust me"

Zuck: Dumb f**s.


[flagged]


I'm not saying that employees are always exempt from responsibility, but your brute-force approach is tyrannical, scary... Just imagine what all other employees would think if such an action would be possible, everyone would be afraid to do his job, everything would collapse. These are complex problems, there can't be an easy guillotine-like solution.


"public statement by an ex-employee rebuking these practices"

This idea sounds uncannily familiar for anyone from the former Soviet bloc.


[flagged]


[flagged]


Depends on when they joined Facebook. Early employees had no idea what it would become.


Totally. I tried to phrase it as a "cool off" period since they were last employed by the company. If someone hasn't worked there in five years, they're not involved in the present day debacle and can't vote against company practices.


I agree they don't care about their users, but the reason there isn't a communication is more likely that they have to filter all of this through their legal team first.


The GDPR would require that from them, but then again...


The leak doesn’t provide much. The backlash of sending an email will instantly be way worse than what was actually leaked. Far worse leaks happen routinely from big names. However Facebook’s negative reputation would sway so far against it, you’d think Facebook had doxxed every one.

I don’t particularly like Facebook or any big corporation FYI.


The backlash of GDPR for not informing users may be far more severe.


Sure, but if the cost of paying the fines for that is less than the expected loss of revenue from demonstrating, directly, that you can't keep user data safe (to a global audience, not just Europe), it's rational for Facebook to take the fines (of course, arguing via their lawyers as long as possible that they shouldn't be required to pay them - lawyer bills are harder to spin as an admission of guilt).

I don't like it, but it makes some level of sense from their perspective. We tend, globally, to prefer "gentle swat on the back of the hand" level penalties for companies that have behaved terribly.


> Sure, but if the cost of paying the fines for that is less than the expected loss of revenue

GDPR allows for fines of up to 4% of global revenue. Unfortunately the agencies that are supposed to enforce it are too feeble to do so.


I'm pretty sure the GDPR requires notifying affected users in addition to notifying data protection agencies. I haven't seen any public statement about the breach from Facebook, let alone received any messages from them.

Interestingly my phone number seems to have been exposed but not my e-mail address. I don't recall providing my phone number to Facebook although I do recall explicitly being asked to. I do however use WhatsApp. I wonder if there's more to this.


Can someone bcrypt all these phone numbers & emails and make that public? Share the salt and then everyone can just test their own phone number without sending it to some rando


If you bcrypt it then the site could just keep the mapping of email to hash value. Then when you do a lookup, the site would know what email you tried.

https://en.m.wikipedia.org/wiki/K-anonymity is likely a better approach to prevent that, which other similar sites like https://haveibeenpwned.com/ are already doing for the email addresses.


i mean bcrypt so ppl can download the dataset and test locally


Yeah. I'd like to self-host a wrapper around the dataset so that people who trust me (friends and family) can safely check if their number's in the breach.


What's the difference in salting if the salt is shared?

Edit: Replying again, sorry. Thinking about this some more: if the person were to concatenate the number with the first and last name or something, they could distribute the list and the person's name would be the salt. So 9195551212JohnDoe becomes $hash and the user just has to know all the pieces to test locally.
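
A minimal sketch of that, assuming SHA-256 as a stand-in for whatever `$hash` ends up being (the Python standard library has no bcrypt). The function names and the one-record "published" set are hypothetical.

```python
import hashlib


def record_token(phone: str, first: str, last: str) -> str:
    """Hash phone + first name + last name, e.g. 9195551212JohnDoe."""
    material = f"{phone}{first}{last}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# Hypothetical published list, built once from the leaked records.
published = {record_token("9195551212", "John", "Doe")}


def is_in_leak(phone: str, first: str, last: str, tokens=published) -> bool:
    """Local check: recompute the token and look it up in the list."""
    return record_token(phone, first, last) in tokens


print(is_in_leak("9195551212", "John", "Doe"))
```

The pieces have to match exactly as stored, so a different spelling of the name produces a different token and a false negative.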


Habit. You're probably right, a salt wouldn't be needed.


Well, maybe you were on to something-- your thoughts inspired me to think about this a little more- we just need a unique salt only the user would know.


still, bcrypt's design means it'll take way too many cycles to brute-force-reverse the dataset, so it should be safe to share.

One downside of a record-specific salt is nicknames e.g. john vs johnathan , or misspellings. (false negatives)


Just off the top of my head: you can hash the text with hash-1, and send a query containing a hash id bucket computed with h1%(N/1000), get 1000 responses from the server hashed with h2 function. Then we can search for our h2 inside the 1000 results without the server knowing which one we were looking at. We also can't decode the 1000 responses we got.
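
Roughly, in Python, and only as a sketch under assumptions: h1 and h2 here are SHA-256 with different prefixes, `bucket_count` plays the role of N/1000, and all names are made up. The claim that the returned h2 values can't be decoded holds only if the input space is too large to enumerate.

```python
import hashlib


def h1(value: str) -> int:
    """First hash: used only to pick a bucket."""
    digest = hashlib.sha256(b"h1:" + value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def h2(value: str) -> str:
    """Second hash: what the server stores and returns."""
    return hashlib.sha256(b"h2:" + value.encode("utf-8")).hexdigest()


def build_buckets(records, bucket_count):
    """Server side: group h2(record) by h1(record) % bucket_count."""
    buckets = {}
    for record in records:
        buckets.setdefault(h1(record) % bucket_count, set()).add(h2(record))
    return buckets


def query_bucket(buckets, bucket_id):
    """Server side: return every h2 value in the requested bucket."""
    return buckets.get(bucket_id, set())


def client_check(value, buckets, bucket_count) -> bool:
    """Client side: reveal only the bucket id, match h2 locally."""
    bucket_id = h1(value) % bucket_count
    return h2(value) in query_bucket(buckets, bucket_id)


# Fictional records and a tiny bucket count for demonstration.
records = ["15550000001", "15550000002", "15550000003"]
buckets = build_buckets(records, bucket_count=4)
print(client_check("15550000002", buckets, 4))
```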



I downloaded the files for my country the other day. There are some links on an HN page that might point to pages with the files:

https://news.ycombinator.com/item?id=26690044


You can just iterate thru all possible phone numbers. It’s not that large, which is why anyone saying they are hashing your contacts to keep them private isn’t doing a lot of good.
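
To make that concrete, a small Python sketch that recovers a number from its unsalted SHA-256 hash by enumeration. The hash function, the function names, and the fictional 555 range are all assumptions for illustration; the range is kept tiny so it runs instantly.

```python
import hashlib
from typing import Optional


def sha256_hex(number: str) -> str:
    """Unsalted SHA-256 of the digits, as a hex string."""
    return hashlib.sha256(number.encode("ascii")).hexdigest()


def recover_number(target_hash: str, start: int, stop: int) -> Optional[str]:
    """Try every number in [start, stop) until one hashes to the target."""
    for candidate in range(start, stop):
        number = str(candidate)
        if sha256_hex(number) == target_hash:
            return number
    return None


# A "private" hashed contact from a fictional 555 block.
target = sha256_hex("15550004242")
print(recover_number(target, 15550000000, 15550010000))
```

Scaling the range up to a whole country's numbering plan only changes the running time, not the approach.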


Still maybe from a legal standpoint and depending on countries it'll be better to download and locally test against a hashed set than to download the dump.

An adequately hashed set can be shown to be harmless because you can't revert it to the real data (as far as you know when downloading). Downloading the original set could be considered malicious by some governments (but I'm not sure; IANAL).


bcrypting 533M phone numbers alone on my PC would take about (edit: 185) days straight by my calculations
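
As a sanity check on that figure, a back-of-the-envelope calculation in Python. The per-hash time is not stated above; roughly 30 ms per bcrypt hash, single-threaded, is what the 185-day estimate implies, and that is an inference rather than a measurement.

```python
RECORDS = 533_000_000
SECONDS_PER_DAY = 86_400


def days_to_hash(records: int, seconds_per_hash: float) -> float:
    """Total wall-clock days to hash every record sequentially."""
    return records * seconds_per_hash / SECONDS_PER_DAY


def implied_seconds_per_hash(records: int, days: float) -> float:
    """Per-hash cost implied by a total running time in days."""
    return days * SECONDS_PER_DAY / records


print(round(implied_seconds_per_hash(RECORDS, 185), 4))
print(round(days_to_hash(RECORDS, 0.03), 1))
```

The same per-hash cost is what an attacker pays per guess, so the work factor cuts both ways.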


How does one know a site like this is not just another data harvesting site?


I put up another site for that: https://haveibeenpwnedbyhaveibeenfacebookeddotcom.com

Just enter your email and the site will tell you whether your email has been harvested by https://haveibeenfacebooked.com/




That's an easy one: yes :)


There is source code for both the front end and the back end, and the creators linked their names. So check the source code and decide whether you trust the people involved that this is actually the code that is running the site.


How do you know that the source code they published is the same code that the site uses?


You don't. Their comment literally says that. You can choose whether or not you trust the authors.


https://www.troyhunt.com/the-ethics-of-running-a-data-breach...

Other than Troy Hunt being well known and building a reputation on being a white hat security guy.

In the world of data the only thing you can trust in absolute terms is encryption. Anything else involving people involves shades of grey.


Even if they were running it with nefarious purposes, it's barely even data. All they could realistically "harvest" would be knowing that the user behind a certain IP address was curious as to whether this or that number was a telephone number in the leak, and there's nothing to guarantee that the number belongs or ever belonged to the person looking it up, or even whether it is an actual telephone number or not.

I could see the "big-data" value of that information if they managed to get a significant proportion of the population to check this website, but even that would be barely worth the effort.


I feel like I was doing the same mental gymnastics when I gave fb my phone number in order to log into messenger on my phone, and look where it got us.


what mental gymnastics? Giving your phone number to facebook associates it with your account, who you know, your browsing habits etc. This does nothing of the sort.


How did Facebook know that it was a valid phone number, how did they associate it with your identity, and how did they know that number was really yours?


It’s probably been a decade since I gave them my phone number, so I don’t remember the details. I believe they required it for the messenger app and did a 1 time SMS code for first time log in to messenger.


When they identified that my data was part of the leak, they included an obfuscated first and last name - first letter plus length of each. If they were harvesting data, they certainly appear to already possess it.


They are in the EU, which means that you are protected by the GDPR. And the website clearly states that no data is being harvested. I believe that two Italian teenagers would not want to get sued to bankruptcy :)


Not sure what’s going on but it says my number is not part of the leak, but I’ve checked myself and it is actually leaked. Just be aware that it may not be complete.


Same, for a contact's number (I'm checking on their behalf). It shows up in the raw dump [1] but not this site.

[1] https://archive.is/MZqak


Interesting that Australia seems to be completely missing


Yeah, it's almost like it got merged with Austria, given it says "Austriaia"


Australia anschlussed Austria last week as part of some covid restrictions dodge. Surprised you all didn't hear about it.


pakistan seems to be missing. Any ideas why?


maybe this website isn't even legit and just about harvesting phone numbers. I mean - who knows?


I checked someone who was part of the leak and the correct initials came up so unless that is by chance, it does seem to be querying the database. You can also read the source code [1] where it does an api call to get the data.

We can be pretty sure the results are legit but yes of course it could be running malicious code and stealing phone numbers. Check the raw data if you are worried[2].

[1] https://github.com/Fumaz/haveibeenfacebooked-api [2] https://archive.is/MZqak


This could be incomplete parsing of the dataset, then. Phone numbers don't generally follow a standard ruleset or two, which makes them harder to parse out. And if this is a collection of breaches, the structure might not be the same across them all.


Hmm I haven't given facebook a phone number. How can I check if my account is included in the leak? Haveibeenpwned doesn't include facebook in the leaks with my FB email, but I'm not sure I'm checking in the right place.


Answering myself: haveibeenpwned includes the facebook leak in the main page check, so if it doesn't show there, you're not included.

No need to give out your phone number if you registered via email.


>No need to give out your phone number if you registered via email.

AFAIK most of the records in the leak didn't have an email attached.


> if you registered via email

My concern is all the people that may have involuntarily signed up via whatsapp, with no email included but only a phone number.


Oh good point. I didn't know that was possible. Could I have another FB identity attached to the number i use for WhatsApp, you think? I signed up before it was bought. I may have even paid 0.99 or something like that, i forget.

I checked my phone # with the above site and it came up clean.


No one else wanted to try, but I had a feeling my data is breached (seems to happen every few months?)

Anyhow, my phone number had a hit and they showed my first and last initial and corresponding asterisks; seems legit.

For people saying "why enter your phone number into random site" -- not sure how much value a phone number provides without the accompanying information.


I typed my phone number, and it was not found. I'm not too surprised. I never wanted Facebook to have my phone number (and my account is deactivated, though I still use Messenger.) I always ignore prompts for phone numbers on a site/app like that (if at all possible).


From what I can see, this site sends your whole number to the backend to search for a number in the dump[0], while haveibeenpwned.com will hash the input, send only a prefix to the server and receive a list of hashes with the same prefix. If your hash is in the list, you've been pwned, but you can check without leaking your data to HIBP.

Edit: I just checked, seems like the form on the frontpage of HIBP also submits your complete email/phone number. Pretty sure I read about how you don't have to submit your personal data to validate against HIBP, not too long ago...

[0]: https://github.com/Fumaz/haveibeenfacebooked-api/blob/master...


haveibeenpwned.com does not use the k-anonymity method that you've described when searching for phone numbers: https://www.troyhunt.com/the-facebook-phone-numbers-are-now-...


Yeah should have validated that claim first. Seems like the form on hibp.com always submits your input to the server...

Still, if I had to choose between hibf.com and hibp.com, I'd lean to hibp.com since Troy is a known name in the industry and has offered this service for a long time without any complaints.


why would I enter my info on a random site though?


I think this is the really interesting legitimacy vacuum that hacks like this place us into. The official source on this is silent and may stay silent for a long time. People who wait for an official, safe answer may need to wait a long time. On the other hand, the illicit nature of the data makes it legally questionable for other organizations to step into the role of notifier.

So why enter your info on a random site? Because it may be the lowest (definitely not zero) risk way to check if your info is in the leak. If you wait or you build your own thing you may risk less, but balancing the risk with the certainty of obtaining an answer requires a real level of expertise.

Does seem like an easy way to collect phone numbers.


Why would you give it to Facebook?


Because 12-13 years ago when a lot of people signed up, it wasn't clear that they were exceptionally evil. Pre-IPO, pre-ads, back when it was just status updates and photos, it didn't strike a lot of people (myself included) as a particularly bad idea. We hadn't seen the monster social media would become when it suddenly had to improve quarterly profits.


When I signed up in 2005 it was only open to people at colleges, and your info could only be seen by people at the same college... (hence the name, facebook, which no longer makes sense).


ELI5 why facebook is exceptionally evil. (Of course I'm inviting downvotes, but please give me a comment too; ideally a reason that isn't also applicable to 'internet as a whole').


They're not much more evil than any other publicly traded, advertising supported content repackager (so the bulk of "social" media these days), but they're far worse than a lot of other places on the internet because of their power as the "default location" for a lot of people to go when bored.

Facebook, long ago, stopped being about connecting people (except that claiming this gets more people to join), and more about "keeping people on Facebook as long as they possibly could" - because that means more ad impressions, which means more money for Facebook.

They rely on every quirk in human psychology to keep people addicted ("engaged") and scrolling as long as possible. Intermittent reward, randomized ordering (the refresh throbber followed by "new" content), and driving people into toxic emotions and rabbit holes. Anger, outrage, and conspiracy rabbitholes are /great/ for keeping people on the site. They're terrible for the people involved (one could offer the handwaving parallel of a grocery store offering free heroin if you buy stuff there to keep people shopping), but profitable for Facebook.

They believe the entire internet is theirs to scrape user activity from (the "like" buttons were turned into data collection elements long ago, against the original promises made about them), so they can offer better-targeted advertisements to anyone who has a valid credit card. Foreign actor, scammer, seller-of-medical-nonsense, it's all fine - as long as they pay up properly.

And in a wide variety of cases of Facebook being fingered as directly responsible for enabling reprehensible behavior like genocides, their responses are consistently, "We are so, so sorry that you caught us doing that, and we promise to try harder not to get caught in the future." Genocide is extremely engaging, and as long as they can sell ads to people othering their neighbors and calling for violence against them, well, what's wrong?

The guiding principle of Facebook has been clearly demonstrated to be, "What's Good for Zuck is Good for Zuck!" Anything else is secondary (and mostly a concern in that if you don't do anything, people might get around to cancelling their accounts or no longer using Facebook).

I can, and do, apply these criticisms to a number of other properties on the internet, but the social media companies (companies who take user-generated content, repackage it, and algorithmically deliver it in optimally engaging order to other people, interleaved with ads) are the parts of the internet that are demonstrably ruining just about everything that people care about.


Thanks for the thorough response.

> They're not much more evil than any other publicly traded

Ok, that's basically my point, but people seem adamant that they are somehow exceptionally evil.

> content repackager

> companies who take user-generated content, repackage it

What do you mean by taking user-generated content and repackaging it? I understand that users often do this themselves, but what exactly is Facebook repackaging?

> "default location" for a lot of people to go when bored

For me this is HN

> keep people addicted

This does seem evil, if the product is causing the user harm.

> They believe the entire internet is theirs to scrape user activity

I could see how this bothers people, though it doesn't bother me that much personally. Follow-up question: do you view Fullstory as evil?

> Facebook being fingered as directly responsible for enabling reprehensible behavior

To me, this is a sad misuse of a tool. But yes, efforts should be made to limit the tool's possibility for evil usage.


> ...but people seem adamant that they are somehow exceptionally evil.

They are more directly responsible for keeping people in a mentally toxic {outraged, angry, upset} state than most other companies. Their reach far exceeds Twitter (2.5B active users vs about 180M for Twitter), so I'll consider them "more evil" in that they have a far greater reach. While I may have plenty of bones to pick with Google, Amazon, etc, they don't directly influence mental state for their own profit like Facebook does.

> What do you mean by taking user-generated content and repackaging it?

Facebook, Twitter, Instagram, Snapchat, YouTube, etc, do not (meaningfully) generate their own content (yes, you'll see an occasional Facebook blog post usually saying "We're sorry for getting caught this time..." but that's not their primary purpose). They take content that their end users generate (photos, posts, links, etc), and repackage it, reorder it, inject ads, and deliver it to other people.

This is distinct from other content producers (news sites, blogs, etc) in which the site owners/employees/etc are the primary generators of material. I write content for my blog, and while I host and deliver the occasional user comment, the primary purpose of my blog is for me to communicate my thoughts to other people. I also, after several years of experimentation, now do so in an ad-free manner, because the small returns weren't worth the hassle, and I'm increasingly opposed to an ad-supported internet, so I now self-host my content and pay my hosting fees out of pocket.

> For me this is HN

Yes, but HN doesn't matter. It is also neither ad-supported nor, to the best of my knowledge, a public company (at least the HN interface). It's exceedingly low bandwidth and I quite like it as a remnant of an old style of internet that no longer really exists.

> Follow-up question: do you view Fullstory as evil?

I don't know what Fullstory is, so have no opinion on it.

> But yes, efforts should be made to limit the tool's possibility for evil usage.

Or to respond meaningfully when it's demonstrated that the tool is being used for it. Facebook tends to do the minimum required to look like they've done something, and when someone else points out that the people the filters are aimed at have trivially bypassed them by changing the spelling of a word, Facebook throws up their hands and says, "Well, moderation is hard, we can't afford humans, and AI sucks, so... sorry!"

If you're too big to have meaningful human moderation that can understand nuance, maybe you're just too big as a forum/site/community/etc.


Thanks again for the thorough and thoughtful responses.

I started to write a number of counterpoints, then I realized it might not get us anywhere together. Perhaps it comes down to the way that I view web services: it's just a platform. Since Facebook is such a large platform, the evil parts of human nature will certainly be evident there. I'm aware that the "it's just a platform" viewpoint isn't for everyone. With that said, it's hard for me to leave that camp without carrying a lot of cognitive dissonance with me.

Is Ethereum evil because Vitalik doesn't spawn a fork every time innocent people get scammed out of their money? (Why only do it once, for the DAO hack?) Is Bitcoin evil because it allows people to evade taxes, hurting society? Do you think government should moderate the internet once AI can handle that task? I hope these aren't seen as strawman scenarios.


I'm easy enough to find elsewhere if you want to continue the debate. However:

Bad behavior tends to scale with size. I've run small community forums (I run one now), and beyond spammers, there hasn't been a moderation problem - it's just too small for people to bother making a nuisance of themselves, and they wouldn't be invited to stick around long anyway.

I would rather see a return to smaller, fragmented communities on the internet than the single centralized platforms we've settled on. They just work better. This would, sadly, require anti-monopoly agencies to do something other than sit around with their "Yes, sure, go ahead and buy that company out, whatever..." stamp.

As for some of the other stuff you're asking about, I'm not familiar enough with the details of the events to be able to offer an informed opinion - sorry.

I would like to think that the internet can self moderate in such a way that the governments are not forced to moderate things, and cannot do so even if they wish to do it, but... that seems a less likely future as time goes on.


Ads began in 2007, over 13 years ago.

I purchased my first FB ad in late summer 2007. Not sure if that’s also when it opened up.


I don't think the first few years of ads really made a big difference - Facebook couldn't really figure out how to make advertising terribly profitable while not driving people off.

Like most things, it took a few years to really take off. They IPO'd in 2012, and then really had to prove they could turn a profit (and increase it over time). I've honestly not been a heavy enough Facebook user to be able to pin the transition to a particular point in time (and it was almost certainly a long period of time, pushing for more and more), but the transition of the "Like" widget on random pages from an image icon to a data collection tool was probably a good indication of the transition.

In any case, by... oh, 2014 or so at the latest, they'd definitely started showing their true colors. Engagement Uber Alles. Because ads.


I don’t see much of a difference. It’s obvious this was going to happen. Is there any instance of a big site that doesn’t go this way? People always seem to act surprised when sites monetize heavily, as if the site was going to remain as basic as it was in its early days while growing. This is almost never the case unless the site goes paid-only, and even those have exceptions (newspaper sites vs Substack).


By accident? They used to sync contacts on Android.


Lol good point


“Has my credit card number been leaked”.com


Brought to you by the folks behind HasMyPhoneBeenLeaked.com


I have made a similar site, but just for Lithuanian numbers:

https://fbhack.lekevicius.com

All the numbers that I know for sure to be in the leak return "not found in the leak" on this site.


So, a few things.

1) There's no indication of any rate limiting here beyond a 2-second cooldown (thanks for that, grenoire), but I only tested it using Burp Intruder (Community Edition), and only on a set of numbers guaranteed to return false. If anyone wants to test a range with a known-leaked number in it, that's up to you.

2) it's very possible that if there is rate limiting, it acts invisibly.

But if there's no rate limiting, as I suspect, someone can easily just iterate through this data set and extract every number (well, until Cloudflare trips the requests). Alternatively, someone can request a large set of numbers that includes their own in order to fuzz the range their own number is in.


Why would anyone scrape this site when they could just download the leaked dataset?


If they aren't rate limiting, scraping the site may be faster.

So far all the posts I've seen on HN that link directly or indirectly to places you can actually download it are to copies on ufile.io, which limits download speed to 500 Kbps if you aren't paying.


where?


It's readily found on Google and several people have posted it in HN comments.


You can easily find it on various Telegram groups and the BitTorrent DHT.


Read the backend, two-second cooldown


I'm looking forward to the sequel, "Have I Been 'Have I Been Facebooked'ed" when it turns out this is just a data harvesting operation.

If you don't want your phone number leaked don't hand it over to a random website that pinky swears it won't keep it. It's maybe not a scam, but still...


My phone number is a 10-digit number, the first 3 of which are an area code. It gets spam calls and texts (though that miraculously decreased on November 4th...) as it is. Now, combined with browser fingerprinting, perhaps this site can tie my specific 10-digit number to some other aspects of who I am, but I'll leave that part as an exercise for a willing volunteer. I'm not terribly concerned about entering my phone number, disconnected from other identifying information, into a random web site.

Convince me why I should be!


TFA? If there is ever an exploit on phone codes, it will be massive. I removed my phone number as a TFA from most services within the last few days.


2 factor sms authentication


1 factor of 2 factor auth is completely useless without any identity to tie it to


We’re talking about the Facebook data leak that associates a number with an identity.


We’re talking about a website that allows you to enter a number to see if it’s in a leak.

People are saying you shouldn’t put a number in there because if it’s not already in there, you are leaking some kind of privileged information.

That’s not true.


If that's the case then please go ahead and post a comment containing your full phone number


If I posted it as a comment, it would be tied to my identity.


It's super easy to find your identity from just your IP. I can't remember the service that does it... maybe drift.com?

So putting your phone into a website also ties it to your identity.


Knowing the phone number doesn't mean you can get the texts that are sent to it.



The commenter is leery of entering their phone number into the article link.


Anyone can enter any phone number. What does leaking a phone number without any ties except an IP mean?


You look up my number, and then I'll look up your number. Now, we're anonymized as the tracking codes that could be employed will associate my number with you, and vice versa.

no problemo


> If you don't want your phone number leaked don't hand it over to a random website that pinky swears it won't keep it.

What's the worst they can do with it? Call me all hours of the day trying to sell me an extended factory warranty for my free Medicare brace that, by the way, has a Security Number that is under arrest by the Security Administration because it made fraudulent IRS payments with iTunes gift cards to lower my student loan payments because I didn't listen when Microsoft called about my Windows Virus?

Oh, wait, sorry, it hung up on me after dialing because their call center was full. Even if they answer, they just won't play anymore - as soon as they think you might be messing with them (or just aren't going to buy whatever they're selling), they hang up. Twenty years ago, you could have an hour long conversation with the credit card people if you were bored...

Even with spam blockers in place, that sort of garbage is the bulk of the calls my phones get. At least for my personal phone, I live in a different area code from where the phone is (insert XKCD about your area code being where you lived in 2002), so anything that's my phone's area code and not in my contact list is clearly unwelcome.

Phone numbers just aren't a large space to randomly dial looking for valid numbers if you're on a scum VOIP gateway, and clearly the scammers and spammers already have lists of what might be worth calling.


"The worst that can happen"...

In a world where

your bank thinks that SMS is good enough for 2FA...

phone number plus other info is good enough for credit reporting agencies to send out your complete file...

Variations of this theme.

Also, this dump is once more confirmation that Facebook owns lots of data about its users, and doesn't care to protect that data, or to give its users the ability to control personal information. Why should they care? They suffer no consequences for this lack.


Yup - there's another thread related to this breach. A comment led me to do a google search on my phone number.

I found FastPeopleSearch.com, which not only had my current phone number and physical address, but my previous cell phone number, land lines (from back when those were a thing), my Vonage phone number, and previous addresses, all dating about 20 years. If you know my name, you can get a lot more... thanks to these aggregators of public records.

You can attempt to have yourself removed but there are a lot of these types of sites.

Of course, if you take "a" phone number, and use the google search technique, you'll find one of those sites like FastPeopleSearch and learn a lot more about who the phone number belongs to. But presumably anyone who's trying to make use of all this information could do all that without the Facebook breach, if they automate the process.

Of course, the Facebook breach ties a bunch of information to a phone number in a, perhaps, tidy package?


I've had my phone number for... oh, 18-19 years. I assume it's utterly trivial for anyone who cares to find it, if they care to. I know it is, because I've had the occasional random call from people who wanted to give me a call for one reason or another.

And, certainly, Facebook suffers no consequences. A while back, some people noticed that if a company was in the news for a catastrophic data breach, their stock tended to climb immediately afterwards. No such thing as bad press coverage, at least in the age of the trading algorithms!


My phone number is my username. It’s not a password. It’s public.


I already get daily spam and robocalls on my phone number anyway...


exactly! They should be testing a hash of your phone number, not the number itself. Amateur hour here.


I don't think that helps, because the address space of numbers is too small so anything is reversible. My solution to this was to generate 99 extra random numbers that start with the same digits as the real one and send them all to the server. The front-end then shows just the result the user cares about but the back-end doesn't know which is which.

https://www.thenewseachday.com/private-facebook-phone-number...
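A minimal sketch of that decoy idea, not the linked site's actual code. The function name `build_query_batch`, the 6-digit shared prefix, and the sorted ordering are all assumptions made for illustration:

```python
import secrets


def build_query_batch(real_number: str, prefix_len: int = 6, decoys: int = 99) -> list[str]:
    """Hide the real number among random decoys that share its leading digits.

    Hypothetical helper: the prefix length and decoy count are illustrative.
    """
    prefix = real_number[:prefix_len]
    suffix_len = len(real_number) - prefix_len
    if 10 ** suffix_len < decoys + 1:
        raise ValueError("not enough distinct suffixes for that many decoys")

    batch = {real_number}
    while len(batch) < decoys + 1:
        # secrets gives unpredictable digits, so decoys are not guessable
        suffix = "".join(secrets.choice("0123456789") for _ in range(suffix_len))
        batch.add(prefix + suffix)

    # Sorting removes any positional hint about which entry is the real one.
    return sorted(batch)
```

The client would send the whole batch and display only the result for `real_number`. The server learns the shared prefix and a 1-in-100 guess, which is a trade-off rather than full privacy.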


There is a movie from the 80s called Wargames that you might enjoy.


Every time I see a site like this I wonder if the site is legit, or does it "match" the phone number with an IP.


Aren't telephone directories a thing anymore? At least in my country you can just search for a person online and see their phone number. Someone's phone number seems like the least sensitive PII.


In Australia, phone directories don't generally include mobile phone numbers, and with the massive shift from landline phones to mobile phones over the past decade, this means many people are completely gone from the phone directory. I suspect it may be a similar situation in other countries - the phone directory still exists, but is increasingly useless.

The view of the sensitivity of phone numbers and home address as PII has changed with this trend too.


Interesting. In Sweden, mobile phone numbers are listed in the public directories (at least for me and my friends).


Well, I can opt out of a phone book; here in Germany that has been possible for decades. I can't opt out of this leak.


In France you are asked if your phone number should be listed. I do not remember if this is opt-in or opt-out.


Usually those phone directories don't include other personal information, like gender.


They actually do in Sweden and some also include all sorts of things such as social security numbers, criminal sentences etc. Very few things in Sweden that are not public data.


The Netherlands changed that policy after WWII.


Be aware that this (currently) doesn't work for Canadians (at least the one I checked). You'll have to download the dump yourself and grep.


Where can I find it?


The previous thread on this linked the downloads.


This is the first time I've seen the UAE called ARE in a country list. I even went and asked Google, and it turns out there is in fact one ISO standard that calls it ARE. All the others, including ITU (we are talking about phone codes, after all), call it UAE. Really strange choice of naming standard for something phone-related.


Wish it supported wildcards. I'm not comfortable putting in my phone number for the exact reasons the author states.


Allowing searches with wildcards is analogous to publishing the entire database.

That being said, I think the cat is out of the bag on this one so maybe that wouldn’t be the end of the world.


>Allowing searches with wildcards is analogous to publishing the entire database.

What's the harm in publishing a list of phone numbers, without any other info attached? I can generate a list of all phone numbers in North America by iterating through all the digits.
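To put a rough number on that, here is a sketch assuming the North American Numbering Plan shape NXX-NXX-XXXX, where N is 2-9 and X is 0-9. It ignores reserved codes such as N11, so the count is an upper bound. The name `nanp_candidates` is made up for this example:

```python
from itertools import islice


def nanp_candidates():
    """Yield every syntactically plausible 10-digit NANP number, in order."""
    for area in range(200, 1000):          # area code: first digit 2-9
        for exchange in range(200, 1000):  # exchange: first digit 2-9
            for line in range(10_000):     # line number: 0000-9999
                yield f"{area}{exchange}{line:04d}"


# 800 area codes * 800 exchanges * 10,000 lines
TOTAL = 800 * 800 * 10_000

if __name__ == "__main__":
    print(f"{TOTAL:,} candidate numbers")
    print(list(islice(nanp_candidates(), 3)))
```

That is about 6.4 billion candidates, small enough to enumerate on ordinary hardware.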


A list of valid phone numbers is more valuable than a list of all possible phone numbers, for the same reason that a list of valid passwords is much more valuable than a list of all possible passwords. It saves miscreants a lot of time and effort.


I'm not sure how it works in other countries, but for the US the numbers are generated by area code, and they're constantly allocating new ones because the old ones have been used up[1]. This is with number recycling. Therefore if you randomly generate a phone number for an "old" area code you probably are going to get a valid number.

[1] https://en.wikipedia.org/wiki/List_of_North_American_Numberi...


Hmm, so that suggests that getting a phone number with a "new" area code could be a way of reducing junk phone calls.



The data is already public, to be honest.


Be aware that this site doesn't seem to be the whole story, it doesn't match me for example, but this one does: https://jstsch.com/facebook/ (NL only)

So there's some ambiguity or incompleteness somewhere.


I wish there was an "email" input. Last time I had a Facebook account was 10 years ago (probably before phone numbers were de facto identity) and I would be fascinated to learn if my old accounts were in the leak, because Facebook was supposed to fully delete those accounts :)


HIBP added the email addresses that were part of the leak:

https://haveibeenpwned.com/


Nice, thanks. Lol, I always have a slight concern entering my info into these sites because there's always that small chance that information is shared or leaked itself. Paradox, man. To check or not to check.


If you don't want to input your full phone number, you can use this tool: https://codeeverywhere.ca/apps/fb_data .

Searches use partial data from multiple fields to find matches.


Note: US and Canada only


my 2c security tips:

- I trust my browser and the site owner: send the number in clear text.

- I barely trust the site owner (if a match is found, they still learn that I checked that number):

Hash each phone number, e.g. using bcrypt, with a composed salt (e.g. site address + email on the account + phone number) so that rainbow tables are impossible to use. (This is because phone numbers are low entropy; even without rainbow tables they are, IMO, not very secure.)

Then ask the user for the hashed version in the text field. Also provide a Linux terminal-style command that computes the hash for a given salt, or redirect to a trusted online hasher service (multiple links can be provided).

Both text fields can be provided to give the user a choice.
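A rough sketch of the salted-hash step. bcrypt is not in the Python standard library, so this swaps in PBKDF2-HMAC-SHA256 from `hashlib` as a stand-in slow hash. The function name, the `|` separator, and the iteration count are assumptions:

```python
import hashlib


def hash_phone(phone: str, site: str, email: str, iterations: int = 200_000) -> str:
    """Derive a slow, salted hash of a phone number.

    The salt is composed from site address + account email + phone number,
    as suggested above. PBKDF2 stands in for bcrypt here (an assumption).
    """
    salt = f"{site}|{email}|{phone}".encode("utf-8")
    digest = hashlib.pbkdf2_hmac("sha256", phone.encode("utf-8"), salt, iterations)
    return digest.hex()


if __name__ == "__main__":
    # Example values are placeholders, not real data.
    print(hash_phone("15551234567", "example.com", "user@example.com"))
```

The site would have to compute the same value for every leaked record, which only works for records that include an email. The slow hash raises the cost of brute force but does not remove it, since the number space is small.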


That's odd. My number is in the leak, but it doesn't check on this website.


I'm a little skeptical this is accurate. Supposedly 1/3rd of my country's population is in this leak, yet not one of the 40 people I tried from my contacts list appears in the leak.


The website says no, but after downloading the whole dataset and doing a quick search using "grep -rnw" I found my current phone number as well as my grandfather's (he is also on FB). So even if the site says you're not facebooked, please check the raw data available on the pastebin archive (https://archive.is/MZqak).
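For anyone without grep handy, a rough Python equivalent of that recursive, line-numbered, whole-word search. The function name `find_number` and the directory layout are assumptions; nothing here is specific to the actual dump format:

```python
import re
from pathlib import Path


def find_number(root: str, number: str):
    """Yield (path, line_number, line) for whole-word matches under root.

    Mirrors grep's -r (recursive), -n (line numbers) and -w (whole word).
    """
    # Lookarounds reject matches that are part of a longer digit/word run.
    pattern = re.compile(rf"(?<!\w){re.escape(number)}(?!\w)")
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on oddly encoded files
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if pattern.search(line):
                    yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    # "./dump" and the number are placeholders.
    for path, lineno, line in find_number("./dump", "15551234567"):
        print(f"{path}:{lineno}:{line}")
```

The whole-word check matters: without it, a search for a number would also hit longer numbers that merely contain it.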


I am annoyed. I haven't updated my Facebook in years so most of the data is out of date and I use a separate phone line for personal correspondence, but I do still have a Facebook account for the occasional friend/family that uses messenger. This might be the final nail in the coffin for me and get me to delete my account.

Maybe I can finally get my last couple of friends to switch to Signal.


The phone numbers put into this website can be trivially reversed, despite the false sense of security the phone-number disclaimer provides: https://code.express/docs/blogs/facebooked/


haveibeenzucked is arguably a better name for a site.


anyone know of one of these sites where they don't send your phone number / email to the server? The /search endpoint phone_number param has your number.

They should instead hash your number client side and test the hash.


I never shared my phone number with FB, neither for 2FA nor anything else, yet it is in this DB. Could FB get the number via my Android FB app?


yeah not going to use any search tool where i need to enter my number ... you could just post the data by area codes..... just create a bland UI that lists all area codes ..let user click into the area code and then on the next page list all the phone numbers in that area code that have been affected.

I'd use that but not searching by phone number.


Never handed over my number when I could avoid it. I'm very suspicious about it for some reason


This site doesn't seem to include the leaked numbers from my country +356 (MT).


Do not put your number here FFS


Down as of 06APR 1715 PST. Looks like a legal warning by the Italian Gov?


This says no but haveibeenpwned says yes. Hash collision on the HIBP site?


"Is Facebook still safe to use?"

Has Facebook ever been safe to use?


HTTP 451 - Unavailable For Legal Reasons :(


Already getting spammed hourly by text messages pointing me towards URLs I should click.

AFAIK there isn't much awareness about this leak amongst most of FB's userbase: Less tech-savvy and 40yrs ++.


What I want to know is where you can find this information. Also, is it even legal for the website owner to hold stolen information?


They could get hashed copies of the information and find out whether a given phone number was present without needing to retain the original leak.
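A minimal sketch of that approach, assuming plain SHA-256 over digits-only numbers. The helper names and the normalization rule are made up for illustration, not taken from the site:

```python
import hashlib
import re


def normalize(number: str) -> str:
    """Keep only ASCII digits so formatting differences do not matter."""
    return re.sub(r"[^0-9]", "", number)


def digest(number: str) -> str:
    """SHA-256 hex digest of the normalized number."""
    return hashlib.sha256(normalize(number).encode("ascii")).hexdigest()


def build_index(leaked_numbers) -> set[str]:
    """Build the hash set once; the raw numbers can then be discarded."""
    return {digest(n) for n in leaked_numbers}


def is_present(index: set[str], number: str) -> bool:
    """Check membership without ever storing the plain number."""
    return digest(number) in index
```

As pointed out elsewhere in the thread, the phone number space is small enough that such hashes can be brute-forced. This limits what the site stores more than it protects the numbers.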


```
Facebook account ID
First name: P**
Last name: N***
Gender
Relationship status
Location
```

:: squints ::

I'll grant you, this is much more problematic for some than me. But for me, this is, roughly, analogous to my actual LinkedIn, Github, or Hacker News profile, which link to my resume (which has my phone number), combined with a squint at my age and a guess.

There's a lot worse that could be leaked.



