Hacker News

I'd estimate that Google uses ~1,000 TB of fast storage, Bing 500 TB, and Yandex 100 TB, so the most basic useful search engine would use at least... 10 TB?



"The Google Search index contains hundreds of billions of web pages and is well over 100,000,000 gigabytes in size."

https://www.google.com/intl/en_uk/search/howsearchworks/craw...
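For scale, the quoted figure works out to roughly 100 PB, about 100x the parent's ~1,000 TB estimate. A quick unit conversion (figures from the quote above, using decimal units):

```python
# "well over 100,000,000 gigabytes" -> convert to TB and PB
index_gb = 100_000_000
index_tb = index_gb / 1_000   # 100,000 TB
index_pb = index_tb / 1_000   # 100 PB

print(f"{index_pb:.0f} PB")   # prints "100 PB"
```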


Actually I doubt that this is a true statement rather than something meant to discourage others. Check out these queries:

https://www.google.com/search?q=1 (12B results)
https://www.google.com/search?q=an (9B results)
https://www.google.com/search?q=the (6B results)

If we estimate that about half of all English pages contain the article 'the' or 'an', that gives about 15B English pages. If half of all pages contain '1', the total number of pages is about 24B. If half of all pages are in English, the total is about 30B. So even the maximum is less than the "hundreds of billions". Similar numbers are at https://www.worldwidewebsize.com/
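The back-of-envelope arithmetic above can be sketched like this (the result counts are the approximate figures quoted from the queries; the 50% prevalence rates are assumptions, not measurements):

```python
# Fermi estimate of Google's indexed page count from query result counts.
results_the = 6e9   # pages matching "the"
results_an  = 9e9   # pages matching "an"
results_1   = 12e9  # pages matching "1"

# Assumption: ~half of all English pages contain "the" or "an",
# so doubling the average count approximates the English total.
english_pages = (results_the + results_an) / 2 * 2   # ~15e9

# Assumption: ~half of all pages (any language) contain "1".
total_via_digit = results_1 * 2                      # ~24e9

# Assumption: ~half of all pages are in English.
total_via_english = english_pages * 2                # ~30e9

upper_bound = max(total_via_digit, total_via_english)
print(f"~{upper_bound / 1e9:.0f}B pages")  # ~30B, far below "hundreds of billions"
```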


What is fast storage? Is that, for right now, the fastest SSDs available?


HDD is definitely not enough because of low IOPS; Google likely keeps the index in RAM. I think NVMe should be good enough, though I don't know for sure.
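A rough comparison of why random-read rate matters here. All the numbers below are assumed ballpark figures (typical published orders of magnitude, not measurements), and the reads-per-query count is a hypothetical:

```python
# Order-of-magnitude random-read rates per device (assumed typical figures).
HDD_IOPS  = 150          # 7200 rpm spinning disk
NVME_IOPS = 500_000      # datacenter NVMe SSD
RAM_OPS   = 10_000_000   # random DRAM accesses, very rough

reads_per_query = 100    # hypothetical postings-list lookups per search query

for name, iops in [("HDD", HDD_IOPS), ("NVMe", NVME_IOPS), ("RAM", RAM_OPS)]:
    qps = iops / reads_per_query
    print(f"{name}: ~{qps:,.0f} queries/s per device")
```

Under these assumptions a single HDD serves only a couple of queries per second, which is why the index has to live on SSD or in RAM.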


Ah. I thought that fast storage was a more specific type of storage that might be more expensive. I mean, 1,000 TB is expensive, but it's feasible to reach that scale with the right funding.





