Fuck the Cloud (2009) (textfiles.com)
325 points by colinprince on Dec 21, 2015 | 219 comments



While Jason Scott raised interesting points six years ago, the principles of data management remain the same as in the days of dedicated servers and on-premises systems. If you have only one copy of data and that machine goes down, then the data goes with it. In my experience, in a business context the 'cloud' discussion is about the business's desire to expand capacity without the capital expenditure needed to buy equipment and the server-admin salaries needed to run it. By the numbers, it is just better to rent than to buy at the infrastructure level.

The cloud is a murky, ambiguity-laden concept, though. Both Netflix and my 92-year-old grandmother on Facebook 'use the cloud', but the former is much more sophisticated in its network and data management practices. My grandmother just wants to see fun pictures of her family and great-grandkids.


That's what the author is getting at: "The Cloud" is sold as this awesome thing that will never die or break, and worse yet, it's sold to people who often don't know any better.

No one realizes that the Cloud is running on the same crap we've always had and is vulnerable to the same issues as everything else. MAYBE the company is better at data management, MAYBE the employees take pride in their job and do it properly, but that's all MAYBE MAYBE MAYBE, and could just as well be "no" and you're entrusting your data to people who really have nothing to lose if it goes into the garbage tomorrow.


Eh... I think that the maybes you are using are a little bit misleading. When a company sells space in their "cloud", like maybe Microsoft or Amazon, there is a business guarantee that goes along with it. If Amazon were to randomly lose a big chunk of Netflix data, AWS's business would tank immediately. AWS is a giant system that uses its scale and number of customers to efficiently provide a more stable system at a lower cost than all of those individual customers could achieve by building and maintaining their own IT.

I think it is sort of like a delivery system. If USPS or FedEx or UPS started losing a massive number of packages (I know that they do lose some) then they would get abandoned; likewise, "Cloud" companies have an incentive to maintain a baseline level of quality. The alternative is that every business would have to create its own shipping service. I think it makes sense to assume that in most cases, unless the business is already massive enough to warrant it, it is cheaper and more reliable to use the aggregate, dedicated ones for hire. The "Cloud" will be cheaper than individual implementations, and it won't be nearly as susceptible to individual implementation errors, because the identical system will have been proven by many other customers (otherwise it would be abandoned).


That must be why Amazon's SLA is defined as follows[1]:

If Amazon loses more than 3 datacenters for more than 45 minutes in a month, you get 10% of what you pay as a voucher for future EC2 usage (it only counts as a loss if there is a total loss of external connectivity, or a total loss of hard disk/EBS access, for all of your instances in an entire availability zone). If they lose it for more than 7 hours, you get 30%.
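
To make that schedule concrete, here is a minimal sketch of the credit computation as described in this comment (the 45-minute/7-hour thresholds and the 10%/30% vouchers are taken from the summary above; check the linked SLA for the authoritative terms):

    # Sketch of the EC2 credit schedule as summarized above; thresholds
    # and percentages come from this comment, not from the SLA text.
    def ec2_service_credit(monthly_bill, qualifying_downtime_minutes):
        if qualifying_downtime_minutes > 7 * 60:   # more than 7 hours
            return monthly_bill * 0.30
        if qualifying_downtime_minutes > 45:       # more than 45 minutes
            return monthly_bill * 0.10
        return 0.0                                 # anything less: nothing

    # A full day of qualifying downtime on a $10,000/month bill earns a
    # $3,000 voucher for future usage -- and only if you ask for it.
    print(ec2_service_credit(10_000, 24 * 60))     # 3000.0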

So no, Amazon, or at least their legal department, does not trust their own competency. Or at least, they're not willing to risk any revenue on that, but they're willing to give you a small future discount to encourage you to restart using the service. Oh and you only get that if you explicitly ask for it.

If they lose your data on EBS/S3/Dynamo/..., you get nothing. So having any data exclusively on any Amazon service should be cause for getting fired, and this of course also means that using Dynamo for storing anything non-trivial is a big no-no from a disaster recovery standpoint.

So I have to say, I would suggest you do not trust Amazon with either your data or with keeping your site online. Yes, historically their performance has been better than this, but ...

This reads worse than the SLAs on internet connectivity from places like Level 3 and Cogent (pay 10% less if they fuck up completely for more than 2 days).

[1] https://aws.amazon.com/ec2/sla/


You completely tunneled in on the wrong portions of what I was saying. For one thing, Amazon's "real" liability extends far beyond their SLA. Yes, their immediate financial compensation is small. But there are two things wrong with your conclusion here. First, that is not indicative of how much they "trust" the service. They are always going to take the most conservative amount they can get away with, and they are "getting away" with it just fine, so why up it? Second, if a company on AWS was severely impacted by a genuine Amazon screw-up, the compensation SLA is the least of Amazon's concerns. It would be like if UPS lost 20% of Amazon's deliveries for one day. They wouldn't be nearly as concerned with the explicit liability of compensating Amazon for those deliveries, however much they guarantee for them contractually; they would be far more concerned with everyone immediately switching to another shipping company because they could no longer trust UPS. That is the motivation.

On top of that, you've completely ignored the actual key points. For one thing, "Cloud" companies make their business by providing a stable service. You have the "guarantee" based on thousands of other businesses using the exact same infrastructure without serious service failures. That is a huge amount of statistical reliability. Compared to hiring your own IT department and cobbling together your own system, that is actually a really good indicator. Second, the cost difference is potentially massive. Again, it is for similar reasons that shipping via UPS is a much better deal than shipping via your own private distribution network. You might still have to pay some people to handle your own inventory from its source (like you'd have to have some people work on your system in the cloud) but you'd be taking advantage of a much larger, more efficient system instead of having to build and maintain your own.


I think the original point I made stands. Hosting on Amazon's platform essentially means I risk my revenue on Amazon's uptime. Actually using their infrastructure, like S3, means I don't just risk my revenue but actually get locked in. Amazon is not willing to do the same, according to their total crap SLA. That tells me a lot. And to top it all off, Amazon is famous for "eating" their customers' businesses: use their infrastructure to see how one of their customers does business, then take it over.

So if it's all the same, I'd rather have a decent SLA. Furthermore this sounds a lot like Amazon's not in fact giving me anything.

Your point is that they'll do the right thing because otherwise their customers would leave. These are the same customers who, as you said in the previous paragraph, get "the most conservative amount they can get away with, and they are "getting away" with it just fine so why up it?".

Sounds like they really care about customers, doesn't it?

> You have the "guarantee" based on thousands of other business using the exact same infrastructure without serious service failures.

I can get that guarantee at 1000 datacenters and colo providers, at least, some of which have a decent SLA. But even among cloud IaaS providers, both Azure and Google provide Amazon's guarantee as well as better SLAs.


Again, you are tunneling...

I used Amazon as an example. If you actually read what I'm saying, about how Cloud businesses in general depend on meeting their guarantees and not screwing over businesses, how their superior quality is because of scale and specialization, how they are reliable because one failure would doom them and they haven't failed yet, you could see that this has nothing to do with Amazon at all.

You keep arguing that Amazon is a bad provider. So what? I was never interested in that at all. I'm not comparing them to Azure or Google or the supposed "1000 datacenters and colo providers" you seem to know of. I don't care who is better or worse, I was talking about using cloud services in general.

Pay attention to the topic, pay attention to what my arguments were. Amazon's SLA is utterly irrelevant to anything; what are you even trying to convince me of? None of what you've said is remotely relevant to my point. It's like arguing about whether Ford or Toyota makes better hybrids in a discussion about whether electric cars are a good idea, I just don't care.


It's funny how you put forward just the opposite of the common-sense argument and present it as an axiom. The more customers a company has, the worse it treats them. Called Comcast recently? The less choice customers have, the worse they're treated. How's your electric company? But when you can easily switch... surely that's better, right? Hmm, companies with lots of customers who can easily switch. Have you called Bank of America recently? And frankly, they're one of the better ones.

Amazon has superior quality? They have at best average quality as a VPS provider, unless you accept the products that cause lock-in, at which point you're at their mercy and they have even less reason to treat you well. Amazon doesn't match, say, DigitalOcean (especially not in the transparency-of-billing department. WTF). There are other reasons to pick Amazon, of course, but quality is not one of them. Price? Not one of them. Service? Not one of them. Stability? Not one of them. Geographical reach? At the moment Amazon does better (not that it matters unless you're in Asia).

One failure would doom them? Just from memory I know of two big Amazon cloud failures that you could not protect against with availability zones. And the failures confined to a single datacenter they don't even publish.

The fact that they refuse to publish single cluster failures is probably another aspect of that superior quality you mentioned.

Also, you can get fucked on an ongoing basis just by getting scheduled on a bad machine. I guess that's part of their superior quality (a lot of VPS providers have this problem, of course; others are better at it).


Your statements about UPS and Netflix are absolutely true, and if you're Netflix or Amazon, that's a great comfort. It's less of a comfort to non-gorillas. Yes, Amazon has to be reliable or Netflix will find someone else, but being reliable and accountable to Netflix doesn't fully translate to being reliable and accountable to my puny business. We'll reap a lot of benefits, since it translates to a great extent, but not as completely as you seem to be implying.


I somewhat disagree here. There is heavy competition, and Amazon might be ahead of the pack, but for most puny businesses there are alternatives. One serious screwup and all of those puny businesses would abandon Amazon for something else, because the other major players would publicize it everywhere.

The Netflix/Amazon relationship reminds me of the "you owe the bank $100, the bank is your problem; you owe the bank $1bil, you are the bank's problem" sentiment. Netflix is probably such a big business that they are dependent on each other.

On the other hand, Amazon seriously screwing a small business would be like a bank failing a normal customer's withdrawal from their deposit. The second that information went public, the bank would essentially be dead.


> If Amazon were to randomly lose a big chunk of Netflix data, AWS's business would tank immediately.

http://www.theregister.co.uk/2015/09/20/aws_database_outage/

AWS is the new Microsoft / IBM, nobody ever got fired for picking AWS.


There is a difference between service outages, which do happen and are unavoidable no matter who builds your system, and actually losing data permanently. That is why Amazon guarantees only 99.95% uptime.

There is a difference, in a discussion about trusting the Cloud with your data and services, between it going down briefly on occasion (somewhat acceptable, within very narrow limits) and actually losing data or long-term traffic because of a service failure. Seriously breaching the SLA causes compensation as well as a big loss of reputation and business; going down for a couple of hours once a year is hardly the type of instability that would terrify most online businesses, nor is it something that individual companies are able to avoid themselves.


(Tedious disclaimer: not speaking for anybody else, my opinion only, etc. I'm an SRE at Google.)

The key piece that's missing here is the idea that risk is something you have to compare, and can combine in interesting ways, then trade off against costs.

There are a bunch of ways in which you can do compute, storage, and networking. You get to pick zero or more of these ways. One of them is "buy a bunch of iron and make a pile of it in your bedroom". Major risk factors here are your house burning down, you getting evicted, or there being a power cut. Another is "rent those services from an infrastructure provider". Risk factors here are much harder for you to visualise, but include things like "governments ban that company from operating in your country".

You can look at the risk of any of these options, and quantify it with an SLO, like "we intend for this compute resource to be available 99.99% of the time in a given quarter". You can then have an SLA that defines what will happen if that objective is not met, and measure how often this is complied with over time. There are lots of ways to analyse this information, but let's suppose that you can reduce it to a single number measuring how safe the resource is for your use case.

If you only look at a single option, and say "this has a safety of X", then the only thing you can get out of this effort is anxiety. This only becomes interesting when you start looking at differences between alternatives, like "the safety of servers in my bedroom is X, but the safety of buying resources on GCE is Y, so I can get this much of an improvement by spending that amount of money", or "by doing both of these things I improve my safety to Z, and I am willing to pay the additional cost of doing so". Or perhaps your position would be "this option is less safe but much cheaper and I'm willing to accept the extra risk".
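
As a toy illustration of that combining step, here is a sketch that assumes the two options fail independently (in reality failures are often correlated, so treat the combined number as an optimistic bound):

    # Toy model: availability of two independently-failing options.
    def combined_availability(x, y):
        # The combined setup is unavailable only when both options
        # are unavailable at the same time.
        return 1 - (1 - x) * (1 - y)

    bedroom = 0.99      # X: the pile of iron in your bedroom
    cloud   = 0.9999    # Y: a rented cloud resource
    print(combined_availability(bedroom, cloud))  # Z = 0.999999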

The problem I have with the "fuck the cloud" article is that it doesn't do any of this. All it says is "the safety of this option is only X, you should experience anxiety". Is X higher or lower than that pile of iron in your bedroom? You still don't know.

(Realistically, unless you have the ability to build a system in your bedroom that has continental diversity for storage, N+2 of everything for hardware failure, etc, your bedroom is likely to be far less safe than the major cloud services - unless you live in a country which regularly bans American companies from doing business with you, which a sixth of the world's population does.)


There should exist an option that is between those spectrum endpoints. One that runs a lot more free (inspectable, trustworthy) software than present cloud services. One that leaves some control with the owner of the data.


There does - in fact, there exists every single percentage of difference between servers-in-bedroom and full-Azure/AWS. We run full AWS. The company one floor up runs their own OpenStack private cloud on colocated servers in a Level 3 facility. Want to rent and not own? You can have that, too. SoftLayer is somewhere in between.

There exists just about every price point and combination of services you can imagine. It's awesome, too, because I can grow a business and can enter into the market with a whole "rack" of servers for almost no cost at all.


Jason's follow up post to this: http://ascii.textfiles.com/archives/4352


tl;dr Renting remote computational power is cool (AWS), but renting remote storage is not.

I have a complete history of email archives since I started using gmail (ie cloud), but have lost years of archives from the years I managed email myself.

It turns out that companies focused on data storage and retrieval do a better job than me. And that's fine. I pay someone to do my taxes and I don't build my own furniture. Specialization is a good thing.


The tax accountant analogy goes pretty well with this:

The accountant is specialized in doing your taxes (aka serving up your data/files), but you still need to keep a copy of your tax records, receipts, etc.


And so, ideally, I want to be able to tell my accountant to store my archival tax records in my safe-deposit box, not in his office. Compute infrastructure is a commodity—it doesn't matter who's paying for it—but all the services you depend on should rely on storage infrastructure you have an SLA with.

I don't care about the "distributed computation" promises of Diaspora or Sandstorm.io; I think they're wrongheaded. Anyone can do compute. But I really do hope that one day my Facebook account can be canonically "stored on" a database instance I'm paying for, that Facebook's app servers will reach out and connect to and treat as the canonical source for "me", treating their own records as a secondary cache. This kind of setup would make all sorts of things simpler and clearer and more secure; there would be a definite boundary where "my data" stops (the DB I own) and "Facebook's data" starts (the DBs they own).

And, to be clear, I'm not talking about everybody running their own infrastructure, or even everybody knowing what an IaaS provider is. Ideally, PaaS providers could get into the "consumer instances" game the way Dropbox is in the "consumer files" game. My "Facebook account database" above could be transparently launched into my own cute little private cloud by my PaaS provider when Facebook requests it through some OAuth-like pairing API. I wouldn't need to think of myself, as a user, as "owning cloud database instances." From my perspective, I'd get an abstract "Facebook account" (which is actually an app instance and attached DBs) sitting in my Heroku-for-consumers account. The important bit is that I'd be paying for the resources that "account" object consumes, that I'd have an SLA on those resources, and that the PaaS company would have every incentive to make it easy for other third-party services to interact with my "Facebook database" in a way Facebook themselves aren't. I, as a user, have no need to "manage" a cloud of my own; I just need to be considered to own it.


If I'm a Facebook engineer, I would never agree to this because there is no possible way to optimize performance in this scenario.

What happens to Facebook when your data provider goes down, or just gets slow? What if they mess up permissions or change their API?

Maybe you're thinking "that's fine, if my provider isn't reliable, my Facebook account becomes unavailable and it's up to me to choose a better provider." But what about all the people who are sharing your feed (or whatever it's called these days, I don't really use Facebook)? Do they query your stuff, and then timeout when it doesn't respond in time? Now other people's stuff is slow to load.

Just seems like an engineering nightmare, to me.


Think of the Datomic architecture[1]: some nodes are "storage", while other nodes are "transactors." The transactor nodes pull "chunks" of rows/objects/documents from storage nodes, compute relational indexes locally, answer queries from those computed indexes, and finally persist computed index "chunks" back to the storage nodes.

Now, note that the "canonical" storage that gets read from doesn't have to be the same storage that the indexes get written to. The first can be owned by the user, while the second can be owned by Facebook.

Presuming an architecture like this, the latency and availability of the user's "Facebook account" database is relatively immaterial. While writes would have to be synchronous (so, like you said, Facebook would have to give the user a "sorry, your account is unavailable" error), reads could be asynchronous. Think of an online RSS feed reader service: the "primary sources" are the third-party sites with their RSS feeds. Sometimes those sites go down. When they do, the reader-service can't retrieve the feed–so that feed just goes stale.

Things like Facebook's graph, meanwhile, are fundamentally indexes. The base-level "documents" in the graph are relationship assertions—a copy of "B accepted A's friend request" stored in B's database. The graph is a computed value built on a pile of those. When Facebook can't reach someone's database, things like these relationship assertions just go stale.

The crucial idea, here, is that for Facebook to do its job, it probably has to cache a majority of the stuff in the user's database in one form or another—just like an RSS reader caches RSS feeds. But this is purely a cache, in a fundamental sense. Users who don't "check in" with Facebook could be cache-evicted from its database. Other users would still have relationships with them and be able to post on their wall and such (they'd be putting those documents in their own outbox); Facebook would just no longer bother computing anything that's personal to the user, like their news feed. There would be every incentive to set up the architecture such that user data that wasn't needed would be "garbage collected" off of Facebook's servers, because it could always get put back on, the moment that user's account-instance woke up again and said hello.
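A minimal sketch of that read path, with hypothetical names (the user's database is the canonical source; Facebook's copy is purely a cache that goes stale, rather than erroring, when the source is unreachable):

    import time

    cache = {}  # user_id -> (fetched_at, data): the service's side

    def read_user_data(user_id, fetch_from_user_db, max_age_s=3600):
        # Read-through cache: refresh from the user's canonical DB when
        # possible, fall back to stale data when the DB is unreachable.
        entry = cache.get(user_id)
        if entry and time.time() - entry[0] < max_age_s:
            return entry[1]                      # fresh enough
        try:
            data = fetch_from_user_db(user_id)   # may fail or be slow
            cache[user_id] = (time.time(), data)
            return data
        except ConnectionError:
            if entry:
                return entry[1]  # stale, like an RSS feed that went quiet
            raise                # never cached: account truly unavailable
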

This would also mean that Facebook wouldn't need to store any of their user-data in anything resembling a relational normal form. Every table would be a "view" table. The canonical database, owned by the user, could be relational and full of nice constraints and triggers (and the user could even add these themselves); but since Facebook can just query out of that to get any data it's missing, it wouldn't need anything like a "users" table. (Fascinatingly, if Facebook was built as a microservice architecture, this means that each microservice would probably separately query the canonical data from the database in order to generate its own indexes; the Search service would know one "face" of you while the Photos service would know quite another. These could even—in theory—be separately ACLed within your own DB instance, giving the user true, actual control over what Facebook can do with their data, component by component.)

[1] http://docs.datomic.com/architecture.html


It sounds like what you want is close to the goals of Peergos (https://github.com/ianopolous/Peergos). Disclaimer: I am the team lead. We use IPFS for the storage (and P2P networking and DHT) and are storage-provider agnostic, where the provider can't read your data. You can add as many IPFS instances that back up (pin) your files as you like, on whatever storage provider you like. You could grant a company like Facebook fine-grained read or write access to the sections of your data that are relevant to them.


(OT) if you still have your archives you can get Gmail to ingest it (over POP), for retroactive 'cloudiness'. My Gmail archive goes back many years before there was a Gmail.


I still have the archives for years before gmail, but probably 1/3 of those are in a proprietary format long since not supported that ran on a proprietary OS that ran on proprietary hardware. So that sucks because those old emails are almost certainly more interesting than the current pile of madness.


This should be higher.


I think this gets to the heart of "what people value" and Jason is looking at it from a perspective of a collector. I'm also a collector so I feel much like him.

I wonder if there's a relationship between collecting and being an introvert. I have no evidence but feel like there is. I think most people just don't value "things" the way our types do. What most people value is their social life. They post pictures to Facebook not to store the pictures but to get a reaction from their friends about the picture. That's what they value. The social interaction, not the thing.


I really don't like to marginalize others by what type of personality they are. This is yet another way of putting people into buckets so that you can more easily go about your day without thinking about it much. Before Facebook, people would keep large collections of photo books so that when family and friends came over they could easily share them. It doesn't mean they valued those pictures any less because they were extroverted vs. introverted. People still keep hard copies of the pictures they cherish the most, like a wedding album. People value all sorts of things, but maybe they see cloud storage as perfectly secure, for the moment. Maybe personality plays into that, but I think it's more about your approximate knowledge of the underlying technology that keeps you up at night.


> I really don't like to marginalize others by what type of personality they are.

Me either, I hope that's not what it sounded like.

> Before facebook people would keep large collections of photo books so that when family and friends came over they could easily share them. It doesn't mean they valued those pictures any less because they were extroverted vs. introverted. People still keep hard copies of the pictures they cherish the most, like a wedding album.

Yeah, absolutely, and I think people do still keep the photos they care most about today. I just think we overestimate how many of those photos exist. I think the overwhelming use of photo-taking is for social conversation.


He acknowledges and supports that. He's not saying "don't use facebook" but "facebook is fun but temporary, like a party. Don't leave stuff there."


I understand why people are upset about the word. It's misused.

But I remember what it was like before "the cloud"... You had to provision machines one by one, often via email or phone. It took hours or days. Billing was usually done by the day. It sucked.

Now you can just send a POST request to a machine and in a few seconds you have access to a new instance. You can send 1000 requests and get access to 1000 instances. And then you can send some more requests and only get charged for a few minutes of time. When this transition happened, we called it "the cloud" because it's a big undifferentiated mass of computers. We could've called it "the soup" but we didn't.
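
Under the hood that is one signed HTTP request to the provider's API; in practice you let an SDK build it. A minimal sketch with boto3 (assuming AWS credentials are configured; the AMI ID is a placeholder):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # One call, and seconds later you have a running machine.
    resp = ec2.run_instances(
        ImageId="ami-xxxxxxxx",    # placeholder image ID
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,                # ask for 1000 to get 1000 instances
    )
    print(resp["Instances"][0]["InstanceId"])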

That's what it has always meant to me. I don't understand why people want to eradicate the word just because it's misused. People misuse the word "internet" all the time, but I don't think we should strike it from our vocabulary.


> That's what it has always meant to me.

And that's my biggest problem with the word, and why I go to some lengths to avoid using it when talking to clients, despite a number of my employer's (and their partners') products including 'cloud' in their actual name.

No two people actually agree on what the c-word actually means. For you it's elastic compute. For others it's cheap storage, or geographical diversity, or managed RAID, or an ersatz CDN, or their hosted email or wiki or some other application, or an accounting convenience (dipping into the opex, rather than the capex, bucket), or, of course, some combination of those and others.

The biggest problems I've dealt with in IT over the years stem from people having even slightly different ideas of what words mean -- so a word that involves wildly different understandings is not a recipe for tranquillity.


Agreed. When I read the title, I was thinking "Oh, so you'd prefer to pay for dedicated servers? What a pain!".

Though, I don't think the author is upset about the word. He's just warning against services that fall within his definition of "cloud" - which is pretty fair. Losing your stuff is no good, and there's always cruddy services out there.


Yeah, I guess you're right. Weirdly he's not talking about the cloud at all, he's just talking about storing things on someone else's servers. So I guess he's one of those people misusing the word cloud.


When everything was "moving to the cloud" a few years ago (around when Jason wrote this), I started to have similar feelings. It all felt like something marketers were over-hyping. "Your computer in the cloud" (ever had a shell account that was your main system? This isn't new), "your games in the cloud!" (ever played Nethack using that shell account?).

I guess the only 'new' thing that I saw was the scaling capabilities based on capacity, but we've had time-sharing (albeit, slightly different) for many years.


My mom was a mainframe programmer and says the cloud is just like what they had 50 years ago with timesharing, only with a different name. I just smile and nod.

Is such a statement much different than saying PCs are just tiny mainframes with a screen attached?


You aren't timesharing your tiny mainframe, so it is quite different, yes. The cloud is precisely what your mom had on a mainframe 50 years ago, including virtualization, on-the-fly scaling of resources (maybe a little more difficult because of the larger equipment), and even renting time on computers you didn't have in your offices, since not every institution could afford their own computer.


Both statements are correct. Remember that it was a personal computing 'revolution'. It would be a great step backwards to cede power to cloud service providers.

Computing at the whim of others aka timesharing.

Smile and nod but your mother has seen a few cycles of the industry.


Providing an employee with equipment (such as a laptop personal computer) to do their job is (legally) different than leasing equipment to a contractor, or transfer-pricing between departments in a corporation.

Remember that a timeshare was very expensive and that those costs were accounted for carefully, not only for the purposes of internal billing but also for cost-benefit analysis.


The thing that annoys me is the idea that I should pay for a service that I've never needed so that I can store everything on the cloud. Actually, I should pay for three or four services. It's really not a good deal for me.


The points in this article also apply really nicely to websites and other services, many of which seem to make the mistake of outsourcing everything to other people's platforms. Oh sure, you may be using 'the cloud' to host your webmail, or your forums, or your chat, or anything else... but what if that goes missing? You're in an even worse situation than the individuals using these services to host their personal files. Remember what happened to IPBFree? They were shut down for some weird, unexplained reason (rumour has it that it was a raid over illegal activity), and literally everything on their servers was wiped out. Now imagine if the same sort of thing happened to one of these services, like Dropbox or the likes... Millions of people would lose most of their files overnight.

Use such services for backups, sure. But don't rely on them too much, you don't know what might be going on behind the scenes at the other end of the line. You don't know their financials (usually), whether they're under investigation for something, whether an intelligence agency is spying on their servers, whether their security is up to par in every possible sense...

And if they're offering it for free... well, they don't have much invested in keeping you as a customer when things get tough. Became unpopular recently, for saying something controversial or 'stupid' on social media? Made enemies in the political world? Then a lot of companies will be quite happy to shut down your account to avoid bad PR. By using these services, you provide a nice target for the social media mob the minute you do something that a lot of people don't like...


> like Dropbox or the likes... Millions of people would lose most of their files overnight.

That's false. Files in Dropbox are also stored locally on all your Dropbox-enabled computers. They would simply stop syncing, and you'd plug in another syncing service. If Dropbox disappears, your Dropbox folder just becomes a regular folder.


I think it is an accurate assessment that many (potentially millions of) people would lose their files in such an event. Many people use cloud storage (e.g. Dropbox, Google Drive) without a native client. Even on mobile, it is common to delete photos from the device after they have been synced/uploaded.


Unless they fuck up sync again and all your locally stored Dropbox folders get wiped clean. Happened once. I'm pretty sure Dropbox is more than cautious about it, but it's still a possibility with them, and more so with everyone else.


This article is a lot of fun with the Cloud To Butt Chrome plugin installed.


>So please, take my advice, as I go into other concentrated endeavors. Fuck the Butt. Fuck it right in the ear.


That is by far my favorite plugin. Just don't forget to turn it off before presenting in a meeting...


… or before you make that quick-just-this-once edit to your site's templates via the CMS's wysiwyg editor. Turns out buttfront.net references don't resolve where you think they do.



I sometimes forget about it when using the c-word in a HN comment and later editing it.

And other times I don't forget, I just leave it and continue with a smile :).


Why would you be presenting Chrome in a meeting?


Because you've built your slides with Reveal.js (http://lab.hakim.se/reveal-js/#/), for example.


Um, lots of reasons? Demoing a product, wiki or Google docs technical presentation, code review in Bitbucket.


This seems unnecessarily angry. I suppose that this person wants to keep his data forever, but many people just don't care that much about their social media photos and the like.

I'd love it if Facebook and Twitter had a rolling deletion period option - everything more than six months old is shredded forever, as far as the service is concerned. While people can obviously store shared photos and this wouldn't actually destroy them, I'd like that new contacts wouldn't be able to go back and look through someone's entire history. It's like a more social and longer-lived Snapchat.


> I suppose that this person wants to keep his data forever, but many people just don't care that much about their social media photos and the like.

"This person" is Jason Scott, who works as the Internet Archive and also heads up Archive Team (the loose band of internet folks who race to archive sites about to go dark).

He's a digital historian; his job is to save everything he can.

https://twitter.com/textfiles

https://archive.org/about/bios.php

http://archiveteam.org/index.php?title=Main_Page


This article doesn't really stand on its own; it comes across as a rambling diatribe by someone who is out of touch. If you have to know who the author of an article is to be persuaded by the argument given then it isn't very good. There are much better articles expressing this point of view.


>This article doesn't really stand on its own; it comes across as a rambling diatribe by someone who is out of touch.

Remember that next time you're red with rage because a cloud provider lost your data or a service you depended on is now bought/closed/gone.


My point was more on the quality of the article. The article's point is a valid one, if you can find it.

To constructively disagree, though: I keep backups. I have a Synology NAS device that I really like. But for your average person, I have to wonder - are their digital photos really safer on their laptop than they are on Facebook? Facebook is a fairly stable company. Laptops are lost, stolen, and damaged all the time. Files are accidentally deleted. People get viruses. People forget the password to their full drive encryption. When Facebook does bite the dust, it's hard to imagine that data just disappearing - it's incredibly valuable, and in the worst case, someone would buy it just to sell it back to people. Not ideal, but it isn't lost, and you had free storage for several years anyway.


Think about the future too, though. Facebook strips EXIF data and may have resized/recompressed your images. 20 years from now, when we can do more super cool things with that data, the data won't exist in your FB photos.

Tangentially related, this is why I shoot RAW. Not because it might give me better pictures today, but because it WILL give me better pictures in 5-10-20 years. You can take a RAW that was shot 5 years ago and pull detail today that was impossible to pull when it was shot, and that ability will only improve.


Ideally, you'd want to keep RAW + JPEG. I'd be worried about reading some of the more obscure camera RAW formats in the far future. JPEG seems like a good backup (at the cost of the data loss you mentioned).


Yep, sometimes I convert to DNG for that reason, but that seems like pushing the problem out...


Safer on a laptop vs. forgetting the password for Facebook.

By making a technically inept strawman, it's reduced to an argument of which is more idiot proof.

The problem with idiot proof is that there's always a better idiot out there.

The argument here by the 'anti cloud' side is 'why not both?'. To argue for one side or the other exclusively is pointless.


Respectfully disagree, although I appreciate the reply.


This seems to be more of a philosophical stance than an argument based on outcomes. It's also a call to think before knee-jerking to "the cloud". And I do see that the philosophical argument has merit. But it lacks a business argument [for service-consuming businesses], which will win out in most cases.

In other words, in real world usage, the cloud is more dependable and flexible than internal IT infrastructure providers, for most businesses.

That said, there is a good philosophical argument against the cloud.


> This seems to be more of a philosophical stance than an argument based on outcomes.

I agree, but it's fundamentally simple: always be in control of your own data, because no one else is going to be looking out for it.


> I'd love it if Facebook and Twitter had a rolling deletion period option - everything more than six months old is shredded forever, as far as the service is concerned.

I really like this idea. The biggest benefit is that it stops people expecting these services to act like an archive. With the current system it's easy to think "Oh I can just look this photo up again on Facebook if I want to". Instead we should be treating these services as publishing platforms while we maintain separate archival versions of our data.


An approach of "delete all personal data over X days old" would actually solve a lot of those companies' data protection issues. It would also make them less likely to be hacked, since they would be less tempting targets. Why try to hack some celebrity's chat app if you know all nudie pics are deleted after X days?
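
Mechanically such a policy is a one-line retention job; a sketch against a hypothetical schema (the hard part is the organizational commitment, plus purging backups and derived data, not the query):

    import sqlite3

    RETENTION_DAYS = 180  # "everything more than six months old"

    conn = sqlite3.connect("service.db")
    # Hypothetical posts table with a created_at timestamp column.
    conn.execute(
        "DELETE FROM posts WHERE created_at < datetime('now', ?)",
        ("-%d days" % RETENTION_DAYS,),
    )
    conn.commit()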


Or, alternatively, roll them into an inaccessible private space after an amount of time determined by the user.


Don't we keep our entire life's savings in companies (aka banks) that we don't run, don't control, don't buy, don't administrate, and don't really understand?


They are much more tightly regulated (legally), and insured, often by governments.


That's an excellent analogy I hadn't thought of. SLAs between companies are perhaps a proxy in the data world, but there's nothing for individual users.

When will data become so important/politicized that there are regulations about data retention?


There are some regulations already - mostly on the side of keeping some of the data for tax/legal purposes. For decades.

(Makes me wonder how all those fly-by-night cloud startups are handling that.)


The reason they are is because of their intrinsic propensity to destabilization, vis-a-vis Minsky and the Financial Instability Hypothesis.

Of course, how you structure your regulatory framework can have adverse consequences, as well (i.e. implicit guarantees by GSEs on mortgage-backed securities). It is further arguable whether or not fractional reserve exacerbates these effects.


I was just talking to my pals at Digital Equipment Corporation and Sun Microsystems about how stable the technology industry is.

Then, I read an article in Google Reader discussing how we don't even have to worry about arbitrary and capricious market behavior from our cloud partners, since we all have long term contracts and are protected against upward price swings from our cloud partners or material changes to the services they deliver.

</s>


I see the sarc tag, but if you had a long-term contract with Reader it'd probably have been a different story.


Maybe. When you subscribe to Google Apps, you get access to applications that aren't necessarily part of your SLA.

The point isn't that these services are bad. Just that you need to be strategic about how you use them.


Our life's savings are not unique, and can be replenished from a different source. Our childhood pictures and personal correspondence are unique, and generally cannot be.

Besides, banks are subject to rather a lot of external control, through regulation (oh noes!). Here in NL the government actually guarantees your savings (but not investments) should a bank go belly-up. I believe the US did something similar, although I didn't bother to follow the specifics about who bailed out who for whom.


> Our life's savings are not unique, and can be replenished from a different source.

In absolute terms, they can, but no one I know of has a solid backup plan for life savings. Life savings come from saving over a lifetime. You can't really replenish them without replenishing someone's life and ability to save.


You gave me an interesting startup idea:

How to back up your savings!(.io)

Now to make it work...


I believe the word you're looking for is "insurance" :P


By the way, as far as I know all the banks in the European Union guarantee savings up to 50,000 euros, some even more. And if you have more money, you can split it between multiple banks.


NL 100K euros.

http://www.dnb.nl/over-dnb/de-consument-en-dnb/de-consument-...

Subject to some fine print so it can take some work to get your accounts set up in such a way that you would qualify.

Bank mergers are a risk to be aware of here.


In the US, deposits are guaranteed up to $250K per ownership type, per customer, per bank. Ownership types include individual ownership, some retirement accounts, joint accounts, trust accounts, etc.


I think most people have more of their savings in stuff than in banks (clothes, PC, car, house, etc.). At scale, cash is a proxy for wealth, not actual wealth.

PS: A home loan might seem like the bank owns your house, but they can't say no when you sell it.


> PS: A home loan might seem like the bank owns your house, but they can't say no when you sell it.

They can, unless you satisfy the loan by paying it off as part of the process, at which point they no longer have an ownership-like interest.

In the unusual cases where you try to sell a house without doing that, the bank absolutely can -- and often will -- say no.


If you have the cash to pay them off, then they don't get to say no even if it's a short sale.

Alternatively, you can generally walk away and sell it to them for the value of the loan.


> If you have the cash to pay them off, then they don't get to say no even if it's a short sale.

If you actually pay them off, then they don't get to say no, because paying them off is, essentially, buying out their interest in the property, under the terms of an existing contract. That doesn't negate the fact that they have legally-enforceable rights in the property until and unless you do that.

> Alternatively, you can generally walk away and sell it to them for the value of the loan.

Only if the mortgage is governed by the law of a jurisdiction where pursuit of mortgage deficiency isn't allowed (either in general or for mortgages in the specific conditions yours has.)


The bank would have the power to stop you from selling your house in some circumstances (such as a short sale).


How about 401(k)s and Social Security?


Social Security is not wealth, and as for 401(k)s, average savings are not that high.

http://blog.personalcapital.com/wp-content/uploads/2014/03/4... Note that's only for people with open 401(k) accounts. Also it's pretax, so subtract ~25-35+% from those numbers.

For comparison the average house is worth ~180k.


I don't. Credit unions exist.


>Don’t blow anything into the Cloud that you don’t have a personal copy of.

I don't understand this logic. Amazon's S3 offers service-level agreements with failure rates that at one point implied the statistical likelihood of losing an object to be once in "thousands of years". When dealing with any sort of stable storage, this is simply something I cannot offer. I couldn't produce a setup locally, with the resources I have, that makes guarantees on the decade level, let alone millennia.
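
For what it's worth, the arithmetic behind that claim is straightforward. S3's long-advertised durability design target (not an SLA) is eleven nines per object per year; a back-of-envelope sketch, treating object losses as independent:

    # Back-of-envelope, using S3's advertised durability design target
    # (not an SLA): 99.999999999% per object per year.
    annual_loss_prob = 1 - 0.99999999999   # ~1e-11 per object per year

    objects = 10_000_000
    expected_losses_per_year = objects * annual_loss_prob   # ~1e-4
    print(1 / expected_losses_per_year)   # ~10,000 years per lost object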

With that said, I keep personal copies, but the authoritative copy is what's in the cloud, because it's a hell of a lot more stable.

TL;DR I hear this argument all the time. The cloud isn't perfect, but it's a hell of a lot closer than anything I could achieve. "Not invented here" syndrome won't save your data.


Until your control panel gets hacked and you lose all your data. See: codespaces.com (assuming it wasn't an inside job or a dumb mistake, but even then the same rules apply). So no, DON'T BLOW ANYTHING INTO THE CLOUD THAT YOU DON'T HAVE A COPY OF.

It has nothing to do with 'not invented here', it has everything to do with your inability to outsource your responsibilities.

The degree to which people rely on others to take care of their stuff is a huge blind-spot. Jason's advice is spot on in this respect, no matter what the up-time guarantees of the cloud solution you are using (and no matter what the redundancies), if you store all your data in the cloud without an off-line copy your company is 3 mouse clicks away from being history.


The 3-2-1 rule is a rule for a reason.

3 copies

2 formats

1 copy off-site

The cloud is a great place for that 1 off-site copy.


Assuming the rest of your data isn't in the cloud, yes. Otherwise reverse the situation. But you got it perfectly.


No argument at all about the last sentence, but you should have skipped the first four lines as wildly premature optimization.

The vast majority of the world would be much better served by (to appropriate your jargon) a 2-1-1 rule, because it's a straightforward treatment for a straightforward problem that is easily implemented. Yes, "format skew" and double-failure of backup solutions does indeed happen, but at a much lower incidence than "oh crap I deleted it!".


I think the 'three copies' rule refers to cyclic backups, not necessarily three copies of the one piece of data.

This protects against data corruption that is not detected immediately.

Another common way of doing this (and one that I prefer, sketched below) is one where you rotate out a backup medium with ever larger intervals. So one gets set aside per week, then one gets set aside per month, and so on. That gives you a series of snapshots in time that will allow you to pinpoint with some accuracy when an event happened. Longer ago you'll have less accuracy, but this can help a lot in trying to triangulate who or what messed up. Just being able to answer the question of whether 'x' happened before 'y' was hired or after can help narrow down the number of suspects in case of a breach or other nastiness. It also guards against back-ups that, for whatever reason, refuse to be reloaded (you should check for that by loading your back-up immediately after you make it; even so, the medium might fail the next time you try a read).
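
A sketch of that widening-interval rotation (grandfather-father-son style; the exact bucket widths are arbitrary): keep every snapshot from the last week, one per week for the last month or so, and one per month beyond that.

    from datetime import date

    def snapshots_to_keep(backup_dates, today):
        # Newest backup wins within each bucket because we iterate
        # oldest-first and overwrite.
        keep = {}
        for d in sorted(backup_dates):
            age = (today - d).days
            if age <= 7:
                bucket = ("day", d)                     # all of last week
            elif age <= 35:
                bucket = ("week", d.isocalendar()[:2])  # one per ISO week
            else:
                bucket = ("month", (d.year, d.month))   # one per month
            keep[bucket] = d
        return set(keep.values())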

Better still is a streaming log of everything, but only very few companies can afford that sort of solution for all their data. Such logs can also be hard to restore from (by replaying), so there too a snapshot system can help.


Oh sure, there's lots to say about the design and effective use of a backup regime. I agree with all that stuff.

But if you're going to condense it to a "rule" that will help people not well-versed in the field, that rule can only be "MAKE BACKUPS!", because at least 90% of the data loss scenarios in the real world happen because simple backups weren't made.

Don't make it more complicated than it is, because someone will stop to do it "right" and then lose data because they didn't just make a copy on a USB stick.


Modern 3-2-1:

3 backup services

2 formats

1 local copy

I guess that could get expensive depending on the services.


Yes, it could be. The trade-off is usually measured by the cost to the company of re-creating or losing the data entirely. So for some stuff such a system is feasible, for others it isn't and then you'll have to compromise. Log data for instance does not need to be stored like that unless there are legal rules saying that you have to keep it no matter what.

For each business this is a delicate affair and you should spend some time on figuring out what you really absolutely have to have and what you could lose without losing the company or getting in trouble with the law.


What's the relative risk that my specific control panel will get hacked and I, specifically, will lose all my data, vs. the risk that data on my local store will be compromised by a virus on my computer encrypting my whole hard drive and refusing to let go until I send XYZ bitcoins to a specific address?

There's risks and then there's risks. As always, you can spend some time and money to mitigate some risk, and decide some risk is low enough that you don't care to mitigate it. I wouldn't just assume one risk vector is higher than another without some concrete numbers.


There's risks and there's bankruptcy. If you feel like gambling then be my guest. What's the risk of your house catching fire? Are you insured? What are the risks of getting burgled? Your car stolen, a car accident? Are you insured?

What are the risks of having a bad health issue happen to you? Are you insured?

I'll bet that you don't know the answers to any of those questions, and yet, you are probably insured against all of them.

As for your question in more detail: it happens often enough that for me if you operate your business 'in the cloud' and you don't have a back-up of your data outside of the cloud to guard against catastrophic data loss due to either malice or accident that I'll happily fail you. Think 'sysadmin was depressive, wiped out company' (or maybe you let him/her go and they took revenge or any one of a number of other scenarios that would instantly terminate your existence online if you had not guarded against it).

Risk management is consequences * incidence versus cost to mitigate. If the cost to mitigate is negligible, the risk is measurable, and the consequences are terminal, then it is a no-brainer to protect yourself against this. At least one of my customers wished they had set up a backup facility for their critical data (too bad, end of story for them) and there is one published case that we all know about. That's two that you can count on, and most likely other trouble shooters have similar stories.
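
That trade-off is just an expected-value comparison; a toy sketch with invented numbers:

    # Risk = consequences * incidence, versus cost to mitigate.
    # All numbers are invented for illustration.
    incidence_per_year = 0.01     # chance of catastrophic cloud data loss
    consequences = 2_000_000      # cost of losing the company outright
    mitigation_per_year = 1_200   # an off-line backup copy: disks + time

    expected_annual_loss = incidence_per_year * consequences  # 20,000
    if mitigation_per_year < expected_annual_loss:
        print("no-brainer: keep the off-line copy")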

I see catastrophic cloud data loss as having a much higher chance of happening than many other items on the list of stuff that I verify before greenlighting an investment.


I see what you mean; I read "DON'T BLOW ANYTHING INTO THE CLOUD THAT YOU DON'T HAVE A COPY OF" and thought of only my personal data (because if it's user data, it's originating from the cloud and I'm not blowing it into the cloud; if anything, I'm pulling it out of the cloud).

Though if we unbox that a bit... How do we balance the risk of loss due to your cloud being owned vs. loss due to the risk of your users' PII being violated if someone attacks your in-house (ostensibly in-house reliability and security-managed) copies? Better make sure you're spending the money on security best-practices on all your copies, not just the "live" ones.

Side-note: What are your thoughts on two separate cloud providers as providing sufficient insurance? Say, having your data live in Google Cloud and at rest in Amazon S3?


That was Jason's use case that he had in mind, but for plenty of people the situation is the exact opposite but the principle remains.

Having two cloud providers would work, assuming they don't share any critical infrastructure and assuming that it is not the same set of employees that have access to both systems. It also helps if you have a totally separate set of credentials for the back-up system and if that back-up system can only be unlocked by very senior people (preferably execs) after a catastrophe of suitable magnitude hits.

Whether you store your primary in the cloud or your back-up in the cloud the story is the same: have a copy somewhere else.

Finally: the only real back-up is one that is off-line. So from that (ok, ultra-paranoid) viewpoint it would be best if you actually went to an off-line medium to store your data in such a way that nobody can wipe it all out without physically destroying all copies.

All this of course after suitably weighing the importance of the data.


If your data is truly valuable to you, then yes, you host it (encrypted, I hope) in a cloud service to protect you from onsite issues; you host it live locally, to work with it — and you have a hard drive in a safe deposit box somewhere, as well, in case something goes wrong with the cloud copy.


Read this and then consider if you ever want to write something like it. https://www.facebook.com/orainwikihosting/


That's very painful to read. Codespaces went pretty much the same way, here is what they wrote:

"Code Spaces will not be able to operate beyond this point, the cost of resolving this issue to date and the expected cost of refunding customers who have been left without the service they paid for will put Code Spaces in an irreversible position both financially and in terms of ongoing credibility. As such at this point in time we have no alternative but to cease trading and concentrate on supporting our affected customers in exporting any remaining data they have left with us."

They promised a follow up once they got to the bottom of what happened but they never did, I'm still curious what the whole story was.


Once again: I only entertain statistical arguments. This anecdotal nonsense is for the birds. Any given service can fail at any time. This is known, Khaleesi. You can poop out case after case of some shit wiki platform losing your bird-watching posts and it won't make a lick of difference. You need to show me that it's better, with the magic of statistics. Should be easy peasy: show me that you have lost less data (proportionally) than Amazon over your entire career.

I can make this even easier. Its almost certainly going to boil down to this question. Have you ever lost even a single byte of data? Cause if you have, you aren't even in the same ballpark.


"Have you ever..." is the wrong question when it comes to risk management; the right questions are "Is is possible...", "How likely...", "How can you avoid..." and "What's the cost to avoid...".

Have you ever been killed in a car accident? No? Then I guess you don't need a seatbelt, right? Has your house been burnt down by a lightning strike? No? Then no lightning rods and proper grounding for you, right?

You don't wait until a disaster occurs before you put a disaster recovery plan in place.


This is a misrepresentation. The "Have you ever" was to demonstrate a point. There are armies of in-house IT folks who believe they are better at securing data than the major storage providers. This attitude is pretty pervasive and patently nonsense. My point was to get them thinking. If you, in your entire career, have lost even a single byte of data, you are statistically missing the mark (in terms of data loss) by orders of magnitude. S3 is stupidly resilient. So much so that no amount of RAID configs, backups, and redundancy on the part of the joker in the corner cubicle will ever even come close.

So, getting back to risk management: we're about to make a QB selection for the big game. Now that we understand that the implementations of folks like jaque here are like a 12-year-old pee-wee standout vs. the cloud provider's seasoned NFL quarterback, which one do you pick to safeguard your business?


> Until your control panel gets hacked and you lose all your data

This is just alarmism. If you really wanted to demonstrate your point you would show me some data. The data would demonstrate that over the millions and millions of users on a number of cloud platforms that their rates of data loss are significantly higher than your "home spun" storage. Then you would take out the outliers and show an honest distribution on the most stable vendors(since I'd expect the majority of data loss over all vendors to have some amount of locality within a specific vendor). The result of this will be the rate of data loss on the most reliable cloud platforms. You can then compare this against your own success rates.

If you have ever (even once) lost even a single byte of data (for any reason), I do believe you will find yourself hopelessly outmatched.


> demonstrate that ... on a number of cloud platforms ... that their rates of data loss are significantly higher than your "home spun" storage

You seem to be missing the point.

People are not saying "don't use cloud storage at all"

They are saying "don't use cloud storage exclusively"

You might trust, based on their claims and documentation, that your cloud providers will never be hacked, that your individual account with them will never be hacked, that they will never suffer a catastrophic failure, that they will never go out of business, that they will never be fully or intermittently down at the moment you need to retrieve some data, and that you won't cock up at some point and cause data loss in your account. I prefer to have my own copy (or copies) as well as the ones that are "in the cloud". For truly essential data, at least one of those copies is both offline and offsite.


Unless you've personally memorized all the data in your head, you're trusting someone or something. Even if you've printed it all out on acid-free paper and stored it in a bank vault, you're trusting the bank, you're trusting your paper supplier, you're trusting the courier who transports it back and forth. If you've stored it on "your" servers you're still trusting your hard drive manufacturer, your RAID firmware, your OS code, your hosting company, your sysadmins.

For most values of "I have x hours and y dollars to safely store z gigabytes", a pure-cloud solution (possibly involving multiple independent cloud providers) has a lower chance of failure than one involving local storage.


>For truly essential data at least one of those copies is both offline and offsite.

But the fact remains that wherever you choose to put it "offline and offsite", the odds of it being lost are orders of magnitude greater than with its persistent and redundant storage on a reputable cloud provider. Even if you put it on the most stable storage you can find and lock it in an underground safe, you can't guarantee its integrity a handful of years from now, let alone centuries.

To put it in other words: you are advocating storing your money in a mattress because you don't "trust the banks".


You're entirely free to play fast and loose with your customers' data.

It is interesting that even today there are still people arguing against having back-ups because of 'the cloud'. You'd think that we ITers would learn from our mistakes, but for some the lesson has to come from personal experience. Good luck if and when it happens to you; please re-read your comments here at that time. There isn't an IT person I know who has not been saved by a back-up at some point in their career; the fact that you have been lucky so far is not a reason to think you are exempt.

And no, nobody is advocating storing your money in a mattress because you 'don't trust the banks'; the point is that you can keep a copy of your critical data for very little money, to safeguard against an eventuality that seems to hit the IT world with alarming regularity, even for those who use 'the cloud'. Fuck-ups, hacks, disgruntled employees and failures all happen, every day.


Maybe ask the people of Greece if they "trust the banks". Shit happens; a local copy of your stuff does no harm at all, in the same way that some cash on the hip is cool in case your card stops working.


Do you keep your money in a shoe box under your bed or stuffed in your mattress? I'd hope not, but then that begs the question: "where do you keep it?" I'd assume not in a bank, because any bank can fail at any moment and they are only insured by an institution as flaky as the United States Treasury Department. Governments collapse all the time; just ask the Soviet Union. Right?

The failure probabilities are relative, and everyone here is equivocating, as if the likelihood of failure were evenly dispersed. You indicate that storing customer data in the cloud is "playing fast and loose", but I'd argue the opposite: not having a cloud backup is what is "fast and loose".

Imagine you are an independent bank. You need to move your customers' deposits. Do you load sacks of cash into your corporate minivan or hire an armored car service? Well, let's look closer. With the latter you are giving up control, right? You have no control over the quality measures an armored car service might take. Yet somehow not contracting them seems like foolishness. The reason is obvious: you know in this case that transporting money isn't your expertise. It's not something you can focus adequate time and resources on perfecting. You also can't spread the risk of failure across a lot of customers, absorb that failure, and make your service better for the next go. The exact same logic applies to long-term storage of data. If that isn't your only function, it's extremely hard to get it right.


I'd assume not in a bank, because any bank can fail at any moment and they are only insured by an institution as flaky as the United States Treasury Department. Governments collapse all the time; just ask the Soviet Union. Right?

If your bank fails, you will be made whole with money from the FDIC. Money is fungible. Any money will do.

Data is not similarly fungible. There's no IT FDIC to replace your lost data with other data that's identical.


Oh, you can't trust the FDIC; they are government-insured. And governments collapse all the time. I gave an example. Like I said, there is only one safe place for your money: in your own private bank that you build and maintain yourself.


When is the last time the FDIC failed versus the last time an online storage vendor folded?


It's true that the FDIC has lost fewer deposits than S3 has lost objects, but as usual there is a caveat. There are 2,000,000,000,000 objects in S3. In order to make a comparison, the FDIC would have to insure roughly 300 deposits for every man, woman, and child on the planet and maintain a failure rate of 0.000000001%, or it would have to operate in its current state continuously for many millennia without interruption. I don't get a sense that the contrarians here appreciate the numbers involved.
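
For what it's worth, the arithmetic is easy to check. A quick back-of-the-envelope in Python, taking the object count and the quoted durability figure above as given (the comment's numbers, not verified ones):

    # Figures as stated above, not authoritative.
    objects = 2e12                  # ~2 trillion objects in S3
    population = 7e9                # ~7 billion people
    loss_fraction = 1e-11           # 0.000000001% expressed as a fraction

    print(objects / population)     # ~286 insured deposits per person
    print(objects * loss_fraction)  # ~20 expected object losses per year at that rate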


You're evading.

When was the last time the FDIC failed versus the last time an online storage vendor folded?


Anybody who remembers Nirvanix is going to get the willies just reading this thread; those who were caught by it will see red.

Personally I think the risk that AWS fails is remote, but if I were to store a bunch of data with them I'd most definitely make sure that I would not be dependent on them. It would be a convenience at best, but never a dependency.


I keep my money spread out across multiple accounts with multiple banks up to the insured limit. Because yes, banks can fail.

> You indicate that storing customer data in the cloud is "playing fast and loose" but I'd argue the opposite. Not having a cloud backup is what is "fast and loose"

Basic reading comprehension failure, that is not what GP is arguing.

He's arguing that if your live data lives in the cloud your backup copy should not be in the cloud and vice versa.


>I keep my money spread out across multiple accounts with multiple banks up to the insured limit.

Insured by whom? Not governments. Governments collapse all the time. The only safe place to keep it is in your own custom bank at home that you built yourself.


And that money is worthless if the issuing government collapses. There is no ultimate security.


> To put it in other words you are advocating storing your money in a mattress because you don't "trust the banks"

Money and data are not directly comparable here, as one "bit" of money is the same as any other in most respects, but I do always carry a minimum amount of cash in case my cards stop working or are lost/stolen, I have a pot of "emergency cash" at home for similar reasons, and I recommend others do the same. Most of my money is in the bank, so I'm not protecting against massive institutional failure here, but I am protecting myself against a variety of possible temporary inconveniences and possible mistakes on my part.

I'd rather have two backups, of which one is more likely to fail than the other, than just one; or three rather than two. Mixing types of backups means not all of them are subject to exactly the same failure modes (my local backups, online or off, are unaffected by connectivity issues, for instance). For data that is important, a little paranoia is healthy IMO.


> To put it in other words you are advocating storing your money in a mattress because you don't "trust the banks"

In this case, money can't be compared to digital data. You can make multiple copies of digital data on your own, but not of your money tucked away in a mattress; that would be illegal.


It doesn't share the exact same properties, but you can make the comparison with any given copy. So yes, the analogy holds.


It is not alarmism at all. All it takes is a single pissed-off employee or a single fat-fingered command and an entire business can be wiped out. In the database world the saying was: there are people who are crazy about backups, and there are people who have not needed their backups... yet. For some reason everyone ends up having to learn the hard way.

I'm also not advocating against the cloud, just pointing out that a single copy in S3 is not a backup solution.


In all fairness, you need more than just off-cloud backups to guard against a single pissed off employee (make sure you're locking S3 and your local backups with different credentials, and the credentials to read the backups can't be gleaned from the credentials that write them) or a single fat-fingered command (kingdomofloathing.com once blew away N days of user progress by accidentally running a backup script in the "restore" direction against prod data).


Of course! Offline backups are only one of the many things that need to be done. S3 versioning can help with fat-finger errors, credential management helps with rogue employees, bucket policies can copy files off to private buckets/Glacier, etc...

There is no single perfect solution, so like security you layer your disaster recovery.
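
To make the versioning and lifecycle points concrete, here is a minimal boto3 sketch; the bucket name is hypothetical and the parameters are illustrative only:

    import boto3

    s3 = boto3.client("s3")
    bucket = "example-backup-bucket"  # hypothetical name

    # Versioning: a fat-fingered delete or overwrite leaves the
    # previous version recoverable instead of gone.
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Age noncurrent versions out to Glacier after 30 days to keep
    # the cost of keeping all that history down.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={
            "Rules": [{
                "ID": "old-versions-to-glacier",
                "Status": "Enabled",
                "Filter": {},
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
                ],
            }]
        },
    )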


> make sure you're locking S3 and your local backups with different credentials, and the credentials to read the backups can't be gleaned from the credentials that write them

This is more or less an exact copy of how I would advise companies to set this up.
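
For the curious, a minimal sketch of what that separation can look like in IAM terms; the bucket and user names are hypothetical, and a real setup would add explicit deny statements, MFA and ideally a separate account:

    import boto3, json

    # Write-only credentials for the backup job: it can add objects
    # but not read or delete them (pair with versioning so that an
    # overwrite is not destructive either).
    writer_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::example-backups/*",
        }],
    }

    # Restores happen under a different user that can read and list
    # but not write or delete, so no single credential can both
    # corrupt the live data and destroy the backups.
    reader_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-backups",
                "arn:aws:s3:::example-backups/*",
            ],
        }],
    }

    iam = boto3.client("iam")
    iam.put_user_policy(UserName="backup-writer",
                        PolicyName="write-only-backups",
                        PolicyDocument=json.dumps(writer_policy))
    iam.put_user_policy(UserName="backup-reader",
                        PolicyName="read-only-backups",
                        PolicyDocument=json.dumps(reader_policy))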


S3 seems great today, and is undoubtedly more technically reliable than any home solution, but history is littered with companies that closed down on very short notice leaving their users' data deleted or inaccessible. Not to mention that billing disputes or legal action could also affect your ability to get data from them.

(Jason Scott is speaking from experience here, as one of the people called in to do emergency archive response when these cloud businesses shut down.)

Edit: jacquesm points out the sudden death of codespaces.com, which I'd forgotten about: http://www.infoworld.com/article/2608076/data-center/murder-...

A single computer security problem (which happen on a daily basis, although not usually at this severity) could enable your cloud to be deleted.

The "counterparty risk" is small but ineradicable. The finance industry re-learnt this with Bear Stearns recently.


> history is littered with companies that closed down on very short notice leaving their users' data deleted or inaccessible

That's absurdly unlikely to happen with S3.


How likely was it that one plane would fly into the World Trade Center, let alone two?

I'm not going to go too deeply into speculative scenarios, but all kinds of business-interrupting calamities are possible with very low probability. We just had the story on the frontpage about the Juniper backdoor; do you think S3 isn't being targeted by multiple state intelligence agencies?

Basically I agree with https://news.ycombinator.com/item?id=10772321 - by all means keep the working copy online, but have a local offline backup.


Pull out to the more general example: unexpected shit happens. A bad code push deletes data, a disgruntled employee decides to leave a "present" after they leave, etc. All things that have happened before. Having a backup away from production makes it much easier to recover from unplanned outages.


"That's absurdly unlikely to happen with S3."

How absurd was it that money market funds would fall below $1 during the recent unpleasantness? It was so unthinkable that major parts of our entire global economy were predicated upon it never happening.

It happened.

Money markets are even more "serious business" than Amazon, or even all of tech, and had even more big brains attesting to their inability to fail. They failed.


Right. If that happens, I likely have MUCH bigger problems to worry about, e.g., finding the nearest VAULT-TEC vault.


Yeah, but once again this problem is statistical in nature. Admittedly the cloud isn't perfect, but the question remains: does the "counterparty" risk outweigh the chances that my home-spun solution would fail? Since the likelihood of failure on my part is orders of magnitude higher, the counterparty risk would have to be huge, like really enormous; e.g., Amazon would constantly have to be on the brink of imminent collapse. This is not the case.

If Amazon went out of business, there is no chance at this point that the loss would be total and catastrophic. It's essentially a bank at this point. Its shutdown would be orderly.


The idea here, and you seem to miss this point entirely, is that for your company to fail, both systems would have to fail simultaneously. That is why you have back-ups. Any one system can fail, but for two systems (both the original and the back-up) to fail catastrophically at the same time is statistically very unlikely. And with a back-up here I am just talking about a secondary system that pulls the data in on a regular basis: Tarsnap, or a simple script that stores the data in a set of directories. Anything is better than nothing if you lose all your data. Rebuilding will still take time, but at least there is something to rebuild with.

For any one system to fail is perfectly possible and in fact should be expected to happen at some point and that's why you design against that.

You also make sure that it is not the same people that have access to both systems so that if your sysadmin walks onto the floor with a bad hairday your back-ups will still be there. And this is also why you test your back-ups to make sure they actually work.
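
The statistics back this up. A toy calculation, with made-up failure probabilities purely for illustration:

    # Hypothetical annual odds, for illustration only. If the two
    # systems fail independently (no shared credentials or admins,
    # which is exactly why you separate them), the odds of losing
    # both multiply.
    p_primary = 1e-3  # chance the primary (cloud) copy is lost
    p_backup = 1e-2   # chance the backup is lost; much worse is fine

    print(p_primary * p_backup)  # 1e-05, far better than either alone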


And the point you keep missing is that I'm not saying don't have a personal copy. I am saying that if you have multiple copies, the safest one is undoubtedly the one in the cloud, and by no small margin.


Bank shutdowns are not orderly because banks are important; they're orderly because banks are heavily regulated. If Amazon has an orderly shutdown, it would be because of the selflessness of its owners and managers, which is a pretty fickle thing to bet on.


I certainly think this level of regulation is on the horizon, given the number of businesses that use S3 as an authoritative repository. If Amazon needed to shut down S3 even today, I can see the government asserting a heavy hand in it.


You need to think strategically about what you put on the cloud, because it's easy to overdo it. The decision inputs for an individual, startup and enterprise are all different.

The cost of getting stuff out of Amazon is very dear. While the statistical likelihood that Amazon will lose your stuff is low, the likelihood of Amazon or a competitor doing something to cause you to rethink your use of AWS (or force you to move) is much higher. Wanting to reduce your AWS costs in the long term is a near certainty.

As vendors move away from perpetually licensed software this gets more important. What happens when a vendor (say Microsoft for strawman purposes) decides that some function that is important to you must interface with Azure, and Azure only? What if Amazon decides upstream network transfers are no longer free? Those sorts of changes can break your business, and vendors like Amazon/Microsoft/Google are fully enabled to change those terms.


"TL;DR I hear this argument all the time. The cloud isn't perfect but its a hell of a lot closer to anything I could achieve. "not invented here" syndrome won't save your data."

I think there is an easy rebuttal that should be considered...

First, the "nines" rating of any service or resiliency is just gibberish. Go find the statistical likelihood of money market funds "breaking the buck" or of CDS blowing up - both in 2007/2008. Those had a lot of nines too and a lot of very smart , well qualified people attesting to those nines (in venues even more serious than IT).

A highly complex system becomes incomprehensible, even to the people that built it. Those nines mean nothing.

Second, you absolutely can build something more stable and predictable than Amazon precisely because you're the one that built it - which means that it is more comprehensible and fails more predictably and gracefully.

I don't care who does the calculation and how many nines they come up with - if you load FreeBSD on two bare metal servers and put them in two different datacenters and run them with any kind of conservative and cautious sysadminning you'll have a better solution. Yes, it will be more expensive.[1][2]

The standard closure to a comment like this is to refer to Taleb's Black Swan and Antifragile books ... which you certainly should read ... but even more important is "Normal Accidents" by Charles Perrow[3], which I hope will convince you to stop looking for complex things that never fail, and instead look for simple things that fail gracefully.

[1] ... but we have a HN-Readers discount - just ask!

[2] You know who we are.

[3] https://en.wikipedia.org/wiki/Normal_Accidents


> I don't care who does the calculation and how many nines they come up with - if you load FreeBSD on two bare metal servers and put them in two different datacenters and run them with any kind of conservative and cautious sysadminning you'll have a better solution. Yes, it will be more expensive.[1][2]

Nonsense. It's very easy to have that kind of setup fail - remove one from the load balancer, take it down for maintenance, whoops we removed the wrong one from the load balancer. I've been in similar-sized companies using dedicated servers or AWS, so I've seen both sides. It's like how people feel safer when driving themselves than when being driven by a professional, even though a professional is overwhelmingly likely to be a better driver - everyone thinks "oh, there's no way I'd make that kind of mistake".

It is very easy to overengineer "high availability" systems - I certainly think things like STONITH and dedicated control planes are more trouble than they're worth, and running a system that you don't understand is a recipe for failure. But I'll take FreeBSD on a basic AWS setup - four EC2 nodes (multiple AZs), and an ELB - over bare metal any day. Physical server maintenance is not where I have a competitive advantage, and would mean more complexity to understand, not less.


"Nonsense. It's very easy to have that kind of setup fail - remove one from the load balancer"

I'm sorry - you're already missing the point.

There is no load balancer. There's no firewall. There are no services running except for sshd.

I guess I should qualify what I mean by "better", though. What I mean is, "I know exactly how this will fail and it won't be interesting or surprising. Or take any thought or time to fix."

It will be comprehensible. "FreeBSD on a basic AWS setup - four EC2 nodes (multiple AZs), and an ELB" is not. You have no idea (nor do I, nor do probably most folks at Amazon) the different and fascinating ways that will fail.

(disclosure: we use EC2 instances for backup DNS. We are not anti cloud or anti amazon)


If there's no load balancer then is the second machine just a warm spare? Do you get paged to get up and reconfigure things when the first one fails in the middle of the night?


>Second, you absolutely can build something more stable and predictable than Amazon precisely because you're the one that built it - which means that it is more comprehensible and fails more predictably and gracefully.

This is from you, and this is from the Wikipedia article on NIH (not invented here) syndrome ...

>Not invented here (NIH) is the philosophical principle of not using third party solutions to a problem because of their external origins. False pride often drives an enterprise to use less-than-perfect invention in order to save face by ignoring, boycotting, or otherwise refusing to use or incorporate obviously superior solutions by others.

I am not saying don't have a copy; I am saying that if you have several copies, the safest one is in the cloud.


What is your recourse if the service level is not met? A monthly credit? Your data is probably worth more to you than the value of its hosting. Too many people rest on the laurels of an SLA.


I argue frequently that the cloud causes cognitive dissonance. People want the ease of use and reliability of centralized outsourced services, but other people want data privacy guarantee and control of those services. If anything is a threat to one or the other, it's the fact we fail to understand the various requirements from distinctly different use cases.

These polarized "requirements" are usually rationalized away by each group. As a data privacy advocate, I believe that I can provide a reliable storage solution to my company without using a centralized cloud service, which then guarantees my privacy because "I'm in control". As an advocate of centralized cloud services, such as Amazon, I present that their team is better at security and reliability than any other team on the planet and that I can encrypt something and trust that my key management is secure. Both of these arguments have fallacies and assumptions.

The solution is to challenge ourselves to build better solutions. At what point in time did technology advancement ever slow down? At what point will we ever stop and say "Y'all, this here compute system is good enough and should be centralized/decentralized!"? Never, I say.


I think you missed the point. S3 isn't free. You pay for that service:

    and if you’re a person they are giving it to you without
    you signing anything accompanied by cash or payment that
    says “and I mean it“.


Are you talking to me? I never said it was free. No storage solution is free.


He's saying that the article does not apply for S3 and other paid services.


Yeah I guess I can see that. But then again I would never give important data to a 3rd party without some sort of contractual arrangement.


How will a contract save your ass if your data is lost? You can't outsource your responsibility.


Unless you're personally memorizing all the data then you're outsourcing the responsibility somewhere - either you're paying for a service that you hope will be compliant with its specification, or you're paying for hardware that you hope will be compliant with its specification and employees that you hope will be as skilled as they claim to be. There's no way to eliminate the risk entirely, all you can do is take the safest option available. These days that's often the cloud.


> These days that's often the cloud.

For plenty of use cases I actually agree with that conclusion. With the caveat that I would add: accompanied by a suitable non-cloud back-up of critical data, code and configuration data held on a medium that is not accessible by the same people that administer the cloud stuff.


I think for most levels of safety you would target, a pure cloud solution is going to be cheaper. E.g. if you decide you need a backup that's independent of your primary cloud provider and any staff with access to that, you can probably accomplish that more cheaply with a second cloud provider than by self-hosting.


Yes, a second cloud provider is a valid option, if you take a number of precautions (see elsewhere in this thread).
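
For what that can look like in practice, a rough boto3 sketch that mirrors a bucket from one S3-compatible provider to another; the endpoint, bucket names and credentials are all hypothetical, and a real job would copy incrementally instead of everything on each run:

    import boto3

    # Two providers, two independent sets of credentials. Many
    # non-AWS providers speak the S3 API, which keeps this simple.
    primary = boto3.client("s3")  # AWS, credentials from environment
    secondary = boto3.client(
        "s3",
        endpoint_url="https://objects.example-provider.com",
        aws_access_key_id="SECONDARY_KEY_ID",
        aws_secret_access_key="SECONDARY_SECRET",
    )

    src, dst = "example-live-bucket", "example-mirror-bucket"

    # Naive full mirror: stream every object across.
    for page in primary.get_paginator("list_objects_v2").paginate(Bucket=src):
        for obj in page.get("Contents", []):
            body = primary.get_object(Bucket=src, Key=obj["Key"])["Body"]
            secondary.upload_fileobj(body, dst, obj["Key"])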


What is it with you? Will all your home-brew nonsense save your ass when your data is lost? I am basing all of my arguments on "what is statistically most likely to fail" and "given limited resources, where would you store your authoritative copy?" The answers to these are "whatever you come up with will have a higher likelihood of failure" and "the authoritative copy goes into the cloud". You must be a terrible gambler. "Will playing the odds save your ass if you lose your money on a good bet?" No. But making good bets will increase your return regardless of the outcome of any given bet.


> "whatever you come up with will have a higher likelihood of failure"

This is where you are simply wrong.

> "the authoritative copy goes into the cloud"

This we agree on.

So, where you are wrong is this: you are only considering technical failure modes, but there are many others besides. Any comparison between 'the cloud' and 'your get-out-of-jail-free-card backup' would have to take into account all modes of failure, not just the technical ones. On top of that, you'd have to consider the likelihood of the backup failing to restore at the same time that the primary system goes down for whatever reason (including technical ones).

That's why I call it a blind spot: even after being told in 20 different ways that the uptime of Amazon does not come into play here, you are still clinging to it. It is simply not relevant, because that's not the scenario a back-up will most likely protect against. It will also protect against that scenario, on top of the ones where someone incapacitated by drugs, anger, depression or any one of another set of circumstances decides to take it out on your company. Or maybe someone makes an honest mistake (I've been called out twice in my career to restore data for companies that had been wiped out because of simple mistakes that should never have happened).

So back-ups are not a luxury, they are a necessity and the degree to which Amazon can outperform your back-up in terms of availability is not a factor in the whole discussion.


Well, unless Amazon blocks your access to your account. Have you accounted for that in your failure rate calculations?

Ask Bernie Sanders about the cloud and the implications of losing access to it at the whim of the cloud provider.


We've been doing it for a long time. It's called some reliable servers in multiple locations with copies of the same data, plus some offline backups (esp. tape). There are many non-technical Fortune 500 companies whose data has lasted longer than Amazon's existence. The cheapest version of this involves two high-capacity boxes at two locations, each with maybe a colo agreement. Lets you keep replacing stuff as servers and drives fail. I've even done it with embedded boxes (VIA Artigos) in the past where data volume wasn't high.


Do you control Amazon? No? Then you're not the captain - you're just a sailor and someone else is in control of your data.

Now go put your stuff up on S3 but please keep a backup of the things you upload there.


I do keep a backup of things, but the backup is on S3 because that is the most stable storage option available to me. Everything else is the risky copy.

>Do you control Amazon? No? Then you're not the captain

This is textbook NIH syndrome. Rather than looking at the relative probabilities, you look at something from an ideological perspective. Your argument isn't "it's safer with me", it's "I want to be in control of it". This attitude is generally harmful. Think about your money. Are you in control of the bank? No, right? But you still keep your money there instead of in a shoe box under your bed. Why? Because the bank is better at protecting your money than you are. They have big heavy metal doors and men with guns to move it from place to place. Amazon is like a bank for your data.


> This is textbook NIH syndrome.

I'm not saying you shouldn't use S3. You should just make sure that it's not the only service hosting the only copy of your data.

>Rather than looking at the relative probabilities you look at something from an ideological perspective.

No, I'm looking here solely at the risk of putting all your eggs into a basket you don't control.


This is the perfect point of the article - use the cloud (hey, it's "free") but KEEP A COPY. Otherwise you will get boned one day!


Might as well just change "use the cloud" to "use a computer".


What if you make a mistake and manually delete something in the cloud? What if a program has a bug in it and something gets deleted? If your data is precious, you need multiple copies of it, no matter where they live.


The problem with the "once in "thousands of years"." argument is that you never know when that once happens and it may well be... tomorrow and never more in the next couple of thousands of years.

That is what happened (once again) to John Meriwether and Long-Term Capital Management... A "once every 10,000 years" event which took place... today.


> Amazon's S3 offers service level agreements with failure rates that at one point implied the statistical likelihood of losing an object to be once in "thousands of years".

Were things like that around in 2009 when this article was written?


> I don't understand this logic.

Curious if Upton Sinclair would have anything to do with it.

Let's put it this way: I know enough not to store data in the cloud(s) exclusively.


Well, it's seemingly trivially easy.

Don't move to the cloud; copy to the cloud.

Also applies to other storage media.


AWS customer since 2007. Just pulled the plug on our last EC2 instance this week, having migrated our stuff to a provider offering root servers for the cost of an m3.medium. Our requirements are simple and we have no need for high-load/high-end layers. We used to have 50-100 VMs depending on the time of day, now fewer than 20, with the rest of the workloads migrated to Docker containers.

P.S. Can't delete Glacier Vaults for now as AWS enforces a cooling period.


> P.S. Can't delete Glacier Vaults for now as AWS enforces a cooling period.

That's a good thing. And in the case of Glacier, an excellent pun.


Which provider did you go to?


Well, that's one way to break the ice :)

I have a bit more of a nuanced view on this than Jason, but I totally understand where he's coming from, and when the whole cloud gravy train started rolling our perspectives overlapped much more than they do today (and quite probably Jason's perspective has changed since then as well, as perspectives do with the passing of time).

There are use-cases where the cloud is absolutely and utterly the wrong way to go about it. When you're running a bank, a government institution (even a lower one) or something else that is mission critical, and where total control of the data and maintaining end-user privacy is paramount, then the cloud is probably not the right solution.

There are also use-cases where the cloud is the right solution in principle but the wrong solution in practice because of cost. Above a certain scale bandwidth and storage costs of cloud operators will always command a premium over those you get from dedicated hosting providers.

As for 'not owning the machines', plenty of companies lease their servers, so technically they don't own them anyway.

The big problem with 'the cloud' as I see it is that companies tend to rely utterly on it and do not have a 'what if the cloud fails' line in their disaster recovery plans. Lose the cloud data and the company goes up in a puff of water vapor, which is what clouds are made of after all.

So if your use case does match cloud solutions well, then whatever else you do, make sure you have at least a copy of your critical data, code and configuration information outside of the cloud provider. And while you're at it, make sure that this is done in such a way that there is a separation of duties between those who can administer the cloud portion and those who can access those just-in-case-the-shit-hits-the-fan backups.

Just so you don't end up like codespaces did.

Finally, the cloud is not so much an end-station as it is a step on a much wider scale, from absolute control with certain administrative duties on one end to much less control but great convenience on the other. Where on that scale you pick your solution, given the constraints of your comfort level, your application and your fiduciary duties, is likely different for every company (and likely for every person).

Customers of companies would do well to research their service providers when it comes to how they are architected, just in case something goes drastically wrong so they don't end up holding the bag.


I'm definitely going to start renting my own server somewhere in Europe starting from January. I absolutely agree with everything he said and I really want to claim my own data again (run my own email server and things).


Do you have a write-up on running your own email server?


This was the article that convinced me to try: http://www.27months.com/2013/10/its-always-sunny-in-iceland-...

I read both that one and the one posted here like an hour before this one got posted on HN and they have convinced me to give running my own server a try.

In the meantime, I have just downloaded a backup of all of my data from Twitter and Facebook (Facebook's archive was like 15x as big as Twitter's archive even though I use Twitter way more) that I am going to save on my server, I have switched to POP instead of IMAP on my current email service, and I am testing out ownCloud in a Docker container.


If you use Snapchat a lot, you may notice how often you get updates from people or see their public story change. Do you ever stop to think about the old snaps, or miss them? No, because you have a constant stream of new ones. You can always make more memories.

Nothing about the cloud is that different from what we had before. With shared hosting providers, you and 50 other users would fill up your disk quota on one or two hard drives in some dinky 1U server running Apache and ProFTPD. If the drives died, along with them went your data. Which is why you kept a copy on your own computer. Back then, nobody expected anyone to keep their data for them, so they just kept their own backups. The same was true for managed services and colo, with the exception that you had to do more of the work yourself.

Because the industry has gotten better about preventing data loss, we get complacent and stop saving our stuff as much. But why piss and moan over more reliable, more massive services for cheap or free? Because it isn't perfect, or innovative, or more transparent?

The status quo of the industry is to reinvent the wheel, so it's hard to get mad at people for re-packaging the same solution in a different container. The obsession of holding onto all your old stuff just makes this look even more unnecessary.


Salutations, ass-end of the Tech Elite.

As someone who has generated a pretty hefty sandbag of verbiage over my decades online, it's always amusing to see what the Grand Eye of internet arbitration decides is an incredibly important and pertinent subject to discuss in my back catalog. Whether it's my work in guiding volunteers for in-browser emulation (http://archive.org/details/softwarelibrary), my delightful coterie of 1980s BBS textfiles (http://www.textfiles.com) or perhaps my documentaries on BBS culture (http://www.bbsdocumentary.com) Text Adventures (http://www.getlamp.com) or the DEFCON Hacker Conference (https://www.youtube.com/watch?v=rVwaIe6CiHw) ... or, as it is today, one of my many long-form written-down thoughts on all manner of this silly medium many of us have chosen to live our lives.

Oh yes, also that my cat is on twitter and has a million followers. (http://www.twitter.com/sockington) - Lots of people are loaded with knapsacks of opinion about that one as well.

I have found that Hacker News (which is, to be clear, an unexpectedly lively extension of Y Combinator) is composed of several diverse groups, all with variant approaches to a linked subject. A linked subject which, as some have pointed out, I wrote 6 years ago, deep in the mists of time.

One group is literally in it for the Money, the gain, the ROI, the endless quest for the "Unicorn", and all their commentary is pungent with the bias and filter of either finding the precious gold coin at the bottom of the shitpile, or is rife with attempts to promote or play up subjects and links of great interest to their financial agenda. Be assured that I could not care less about the current status of the beating of your heart.

Another group seems to be happy to drill down as deep as they can into the mathematics, algorithms, and code of a situation, thinking that if they napkin-blart out enough "facts", they will win some sort of day. I find these people tend to be unhappy about flowery language or effusive phrasing, simply because they've left-brain-dominated themselves into deep pits of nut-sorting and bolt-counting. They use "TL;DR" a lot, as well as, I assume, Adderall. Their heartbeat status is of greater interest to me, if only because I think they are coming from a good place, even if that place smells of Cheetos and sweat.

And, of course, there are Opinion Tourists, my favorite, who might as well be equated with a loud and cantankerous pit of waving hands, waiting for the newly linked (if not newly written) event/opinion/image for them to raise in a mighty roar with a hastily cooked "hot take" on the item. Some of them even optimize the process to not even click on the provided link before the horn honking ensues.

So, "Fuck the Cloud" was written in the deep miasma of when everyone used the term "Cloud" interchangeably with "Magic"; that it was an approach and glory that would lead the experience of computing to a new shangrila. Like any old-timers rife with memories of how we got into that world (and of the echoes of cloud-dom going back 50 years), I decided to write out some of my own thoughts, especially on this attempt to dumb down the populace and separate them from not just responsibility, but control and agency with their data. I have been entirely correct in the general theme - there is a divide within the technical community, of people with admin access and the ability to control any aspect of their work, and then a very large, almost overwhelming set of users who are, essentially, meat stock. And in the same way that meat stock has no particular seat at the table when negotiations of an agricultural nature are conducted, so in the same way are the "users" left out in the cold as a whole range of abilities and ersatz "rights" are stripped away, under the guise of "ease of use" and "leave it to us".

All of this was written without the revelations of the deep, intense surveillance apparatus that is now in place, ensuring that any of this data you control or thought was within your own private space is actually destined to meet you again in an investigation, a courtroom, a warrantless intrusion or a physical SWAT attack. That wasn't even the point.

The point was that user data, treated as something to abuse, monetize, and ultimately discard as a whim, was a complete betrayal of the early promises and experimentation of the Internet. To counteract this trend, I co-founded Archive Team (http://www.archiveteam.org) and our delightful success in many areas would warrant a completely different essay itself - and it has, along with myriad speeches and presentations in the years hence.

I'm sure it might be delightful entertainment for Hacker News to find this or that out on the net and go off, endlessly, in the loop of "This Needs Me" and "Fuck You For Thinking That", but ultimately, these are ridiculous showboat-dances of "what if" and "why not", and I've discovered in the years hence that truly, actions and achievements speak louder, ever so louder, than words.

Enjoy your day.

And fuck "The Cloud".


Thanks for this.


I could have done without your insulting tone, no matter how proud of your opinions you are.


You are an adult. If you miss the message for the messenger you've got no one to blame but yourself.


So is textfiles an adult. If he/she is a jerk, he/she has nobody else to blame.


I'm not sure how this is relevant to what I said. While it is certainly the case that one can catch more flies with honey, I think we'd all rather be people who digest the message instead of disregarding it because we don't like how it was delivered to us. This isn't about textfiles' delivery, but that it's useless bordering on childish to take a response from an author and respond simply with "I don't like your tone."


> I think we'd all rather be people who digest the message instead of disregarding it because we don't like how it was delivered to us.

That would be good, yes. We should all try to be like that. However, HN has guidelines as well, and I'm pretty sure that textfiles' post violated some of them. And the way HN maintains its status as the kind of place where we care about the message is partly by discouraging statements that needlessly distract with an offensive tone.


Please correct me if I'm wrong, but the guidelines only seem to state:

> Be civil. Don't say things you wouldn't say in a face-to-face conversation. Avoid gratuitous negativity.

I guess this is the point where we'd be arguing about taste, but it seems to me that while textfiles' comment was clearly negative, it was well constructed and well thought out. Not to mention it was in the _exact_ same tone as the article he'd written years ago that the conversation was about. In this case we should be very careful not to immediately jump on people for being upset or negative. It's a powerful tool that, in this case, is being used wisely. I think a bruised sensibility here and there is a worthwhile risk in the name of maintaining a network like HN that accepts dissent and spirited disagreement.

As a slight tangent: in reading the guidelines I found that, more than an attempt to tone-police, they are an attempt to lower the noise:signal ratio. The reason I feel this discussion is important is that it's the reply, not textfiles' response, that adds noise to the discussion, even if you feel targeted by the author's response.


There is no alternative approach presented here. I can only assume that the author has never had to scale a piece of software to serve hundreds of thousands of concurrent users.


The implied alternative is "something you run, control, buy, administrate, and understand." Don't overthink this. You need it to scale to one (1) concurrent user.

Go buy an external hard drive. Start saving important things locally, and also automate backups to the external drive. You now have two copies of everything you can't afford to lose.
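
If you want to automate that, a minimal sketch (both paths hypothetical):

    import shutil
    from datetime import date
    from pathlib import Path

    # The stuff you can't afford to lose, and an external drive
    # mounted locally (hypothetical paths).
    source = Path.home() / "important"
    drive = Path("/mnt/external-drive")

    # Keep dated snapshots rather than one overwritten copy, so a
    # bad run can't silently destroy the only backup.
    target = drive / ("backup-" + date.today().isoformat())
    shutil.copytree(source, target)
    print("copied", source, "->", target)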


I like the guy's writing style and use of English.


In many ways the cloud is one step forward, two steps back. I look forward to the day when I can use my phone as my "cloud".


Getting a "resource limit reached" error.


http://mxtoolbox.com/SuperTool.aspx?action=ptr%3a204.109.60....

I bet that wouldn't happen if he'd host his blog on a scaling cloud provider with a proven track record. I could think of a few that might be good candidates... ;)


This is an old article and is almost entirely op-ed. No idea how this got to the top of page on HN. :/


Maybe because opinions are also valid intellectually?

Mere data driven conclusions get nowhere without a point of view and an end goal to accompany them.


These statements are true, but may not be reflective of why this story gets top billing on Hacker News. See techdragon's extrapolation comment that's a sibling to this one. ;)

Place smells more and more like Slashdot every month.


Because we all still harbour resentment toward frustrating situations such as the limitations of "the cloud"?


It's rather interesting that HN mods allow posts with this kind of language, but the moment you are even mildly critical of a comment or commenter, you get a warning about language. I guess the thinking is we look the other way as long as it's not hosted on the ycombinator domain.


I think it's more that he (the article) is saying fuck to an idea, not a flesh-and-blood person (as when responding to a comment).

Also, it is possible to have civil debate around a profanity riddled article. That's what this site is trying to achieve.


What the fuck are you talking about?


In enterprises, you either trust the cloud, or your local data center people.

The local data center people will threaten and lord over you with their hardware powers unless you have the cloud alternative.

And you're not entirely wrong about the cloud.

So don't trust either.

A modern business should have at least two external cloud providers and a local option.


OldManYellsAtCloud.jpg


[flagged]


> you are a piece of shit

Personal attacks are not allowed here, regardless of how wrong you think someone is. This and other comments you've posted break the HN guidelines egregiously. We ban accounts that do this, so please stop doing this.

We detached this subthread from https://news.ycombinator.com/item?id=10772561 and marked it off-topic.


First off, I'm not your 'bro'. And what do you mean, a DOX? You yourself wrote that on this very site. Or are you saying that you consider yourself anonymous from here on in and therefore get to spout whatever nonsense you feel like and invite people to do dumb stuff 'because you said so', and we should ignore this in the context of your other postings?

I'm responding to you the way I do because you run (or apparently, ran) an AWS hosted service.

I'm a piece of shit because you disagree with me, but it just so happens that you've been all over this thread with a continuous stream of (hopefully) purposeful misunderstandings and/or downright trolling. If you want to do that from an anonymous account go ahead, but don't do it from the same one that you use for 'Who is hiring' posts.

Just like, say, if cperciva writes about online storage, I interpret that in the context of his Tarsnap business.

People here are pretty open about the projects they're involved with and so are you, if you don't want that link to be made then I suggest you don't post about your company on HN or you create a separate account to share your 'pearls of wisdom' like you do in this thread.

Oh, and your anecdote of how many years you were up and running holds zero water by your own standards.

Attacking people and their professional reputations can get your own professional reputation called into question, get used to it.


Bro was a pejorative, friend. To be clear, I never attacked you. Not once. I called your arguments into question, but that is what we are doing here. I believe your reasoning to be specious.

>I'm a piece of shit because you disagree with me, but it just so happens that you've been all over this thread with a continuous stream of purposeful mis-understandings and/or downright trolling.

This is just not true. I called you that because you tried to do just that to me. A quick re-read of all my comments and I'm sure you'll see I never made any personal reference to you. Now, if you consider me telling you that you are incorrect an attack, then I'm sorry; this is something you'll need to get used to. In this case, because you are wrong.

I've offered you ample opportunity to proffer a statistical argument, but all you can give me is "if I didn't build it, it's not safe". Sorry, friend, I just don't trust you. I've been doing this a long time and I don't trust myself.

I never told someone to do something "cause I said so", but then again you never address the meat of my position: "Cloud providers have historically had much lower failure rates than you; I trust them more than I do you when building my understanding of the risks." It's trivial to undermine my position: show you have lower failure rates than a prime-time storage provider (bonus points if it's Amazon).

Let's just get a few non sequiturs out of the way that you can't seem to get your head wrapped around. I never said don't have a backup. I said the backup in the cloud is the most durable one.

I never said Amazon couldn't fail. I just said the chances they fail vs. you are orders of magnitude lower.

I never said you were bad at your job. I said you seem to have a shaky grasp of statistics and possibly a fair amount of NIH syndrome.

You are correct that my anecdote doesn't prove my point, but it provides anecdotal evidence to corroborate my statistical position.

I didn't say "don't be open or transparent" I said pulling a persons personal details into an internet argument crosses a very clear line.

Get over yourself and remember not everyone who disagrees with you is trying to undermine your career. But you step over a line when you get personal and start pulling personal details into an internet conversation.


> Bro was a pejorative friend. To be clear I never attacked you. Not once.

I find this especially hilarious. "I just insulted you! I never insulted you!"

(I preemptively decline to get in an extended semantic argument about whether insulting someone is an "attack".)


You're stuck in the groove of statistics. You will need to look further - and maybe gain a bit more experience in IT - to understand that PEOPLE and not the reliability of hard-drives or the uptime of cloud services are what make back-ups a vital necessity of any IT business.


Stuck in the groove of statistics? When talking about risk management? Where is my Jackie Chan rage face image? http://mylolface.com/assets/faces/misc-jackie-chan.jpg

Gain more experience in IT? I mean, I've been at it professionally for 18 years, soooo I guess it depends on what you consider "a lot of experience".

I get what you are saying: "a disgruntled Amazon employee could just delete the world's data", right? Except that's not the case. It's all stored in triplicate (at a minimum) across a vast number of independently available data centers. The durability guarantees are hard to fathom. Honestly, I bet they'd have a hard time deleting data permanently even if they wanted to.

But once again, I'm not advocating that you only store your data in one place. I'm just saying, dollars to donuts: you stored at least one copy of your data on S3, catastrophe has struck, and only one copy of your data is left. Where do you think it is?

I know where I am placing my bets.


> Gain more experience in IT? I mean i've been at it professionally for 18 years soooo I guess it depends on what you consider "a lot of experience"

Experience is not measured in years but in what you learned in those years.

> I get what you are saying: "a disgruntled Amazon employee could just delete the world's data", right?

No, that's not what I was saying. It is absolutely incredible, but you again manage to misinterpret what I wrote. Really, how hard can this be? Let me try once again:

An employee of a customer of Amazon could wipe the company data.

So if you ('x') found a company ('y') that employs an employee ('z'), then 'z' can, if given the right credentials, do a ton of harm to your company. Note that the Amazon employees are not even in this equation (they probably should be, but they are a lesser threat assuming Amazon is set up properly).

> Except that's not the case. It's all stored in triplicate (at a minimum) across a vast number of independently available data centers.

You really should read up on codespaces.

> The durability guarantees are hard to fathom.

Yes, except when your data is gone. Then the durability guarantees matter not one single bit. Amazon cannot protect against you or one of your employees wiping the data purposefully.

Directly from the Amazon docs:

"When an object is deleted from Amazon S3, removal of the mapping from the public name to the object starts immediately, and is generally processed across the distributed system within several seconds. Once the mapping is removed, there is no external access to the deleted object. That storage area is then made available only for write operations and the data is overwritten by newly stored data."

> Honestly, I bet they'd have a hard time deleting data permanently even if they wanted to.

Apparently, you're wrong about that. And it is quite logical from Amazon's perspective that you are wrong about that; otherwise, how would their billing ever function? If you delete something there may be a very short period during which Amazon might be able to recover it if you asked nicely, but I wouldn't count on that; depending on how important the data is, you're playing Russian roulette here. And if you're so sure the data can be recovered, how come you directly contradict Amazon's documentation on that very subject?

Glacier is another matter by the way, at least there you'll be writing some code to delete a vault.

> But once again, I'm not advocating that you only store your data in one place. I'm just saying, dollars to donuts: you stored at least one copy of your data on S3, catastrophe has struck, and only one copy of your data is left. Where do you think it is?

Yes, well, several instances that prove you wrong exist. Your next hire might prove you personally wrong.

Good to see you at least have a copy elsewhere, and nice to see you consider at least a possibility besides the technical ones.

> I know where I am placing my bets.

You're welcome to place your own bets any way you want. But the company you work for and the companies you found had better have a 'oh shit we lost all our Amazon data' recovery plan in the vault and it had better be one that when tested holds water. Otherwise you too may one day start looking to outsource your troubles.

Mind you, I don't actually have a problem with people that act in this way, they are more than happy to pay me my exorbitant fees when it comes to saving their hospital or company or whatever institution it is this week that manages to intersect paths with something they considered absolutely impossible right up until the moment that it happened.

But you at least will never ever be able to say you weren't warned.


[flagged]


We detached this subthread from https://news.ycombinator.com/item?id=10772408 and marked it off-topic.


Thanks.


> And I want to be clear your moded and nonsensical jabbering is not why you are a piece of shit.

Ah, ok, that's what caused it. Well, that's fine, but there is probably an aggregate of several hundred years' worth of experience in this thread that seems to be in violent agreement that you don't really understand the matter under consideration.

> An attempt to call attention to my potential customers in a forum because you disagree with me is why you are a piece of shit.

If you present yourself as a representative or founder of a company here then by extension you represent the views of that company unless otherwise noted. If you don't want that then you are free to make another account which is not in any way associated with your professional HN profile, it wasn't me that made the choice to join the two, you did (as do many other people here). But most of us are careful to speak in such a way that we do not bring our partners or employers trouble by drawing attention to either our lack of knowledge or ability or by making statements that would - when viewed in the light of our professional engagements - show our employers in a bad light.

So, whether or not you agree with me is beside the point. If you feel like totally misinterpreting Jason's writings and on top of that you call me 'alarmist', you are actively attacking our professional reputations, and what goes around comes around.


You are misunderstanding me (I know it seems to be a common pattern for you); whether I stand behind what I say is a totally separate concept from your motivations and how they reflect on you as a person. The quality of my technical reputation is not in question here, and my employer is welcome to read this thread.

I didn't call you an "alarmist" I said worrying about whether amazon will stay in business, "get hacked", or if S3 is going to disappear without warning is "alarmism" They were responsible for 39% of all commercial internet transactions last year and store 2,000,000,000,000 objects. If they go under we are are all in a heap of trouble. A NAS device in some remote part of the netherlands isn't going to save us.


> You are misunderstanding me (I know it seems to be a common pattern for you)

Then maybe you should write a bit more clearly. The opening line of this whole thread is you quoting a single line out of context, stating you disagree with it, then providing evidence that you probably should agree with it and that in fact you act contrary to your stated position. To me that makes no sense at all.

> whether I stand behind what I say is a totally separate concept from your motivations and how they reflect on you as a person.

That's another thing I can't make sense of. Probably my fault.

> The quality of my technical reputation is not in question here, and my employer is welcome to read this thread.

Excellent.

> I didn't call you an "alarmist" I said worrying about whether amazon will stay in business, "get hacked", or if S3 is going to disappear without warning is "alarmism"

I never suggested Amazon would go out of business, you made that up all by your lonesome. What I wrote is that your control panel could get hacked which is an entirely different thing.

I never suggested S3 would disappear.

I also never suggested that Amazon (the company) would get hacked.

Even so, there is a remote possibility that all of the above (which you just came up with) would become true at some point in the future. But I purposefully did not allude to any of those because the chances of those happening are remote enough that for me they don't count as reasons to have a back-up.

> They were responsible for 39% of all commercial internet transactions last year

Who cares.

> and store 2,000,000,000,000 objects.

Doesn't enter into the equation at all.

> If they go under, we are all in a heap of trouble.

Well, you probably will be.

> A NAS device in some remote part of the Netherlands isn't going to save us.

NL is small enough that we don't really have remote parts. Besides, none of those situations are the ones that I wrote about in my original comment. You really have a hard time in the understanding department, first with the original posting, subsequently with my comment on yours and further on with several other people in this thread.

When you feel everybody is acting weird or seems to be unable to understand what you are saying, consider that the problem is at least partially on your own end.


"We've put all our eggs in this one basket, which cannot possibly go wrong"


Oh, right, this is the guy that wrote the "facebook is the worst thing ever" screed.



