I was always wary of Heroku because it's exactly what @hunvreus said: a black box. Very appealing when you're strapped for resources in the beginning, but increasingly frustrating as you scale. Case in point: http://rapgenius.com/James-somers-herokus-ugly-secret-annota...
That being said, it seems like the model itself is not flawed, just the execution. I've met many developers, especially in the Node community, who fundamentally believe they as software engineers should not have to spend any time learning devops.
My position is that a good software engineer today should be familiar with the full stack from frontend (SPA, javascript, html, css) to backend (sql, basic db admin, db design) and devops. Obviously, you can't become an expert in everything, but the knowledge level required to become "proficient" in these areas seems pretty achievable if you're well-versed in software engineering fundamentals.
FWIW, it took me about 4 weeks to get a "startup-ready", fully open source 3-tier stack running in EC2 with full automation using Chef (which is a whole other topic) plus continuous deployment using Jenkins. Ultimately, we'll pay more per month for EC2, and I had to absorb the upfront time hit, but now that we're set up, it's extremely easy to tweak things, and we have complete transparency.
I'd like to quote Adrian Cockcroft: "the undifferentiated heavy lifting". However, the heavy lifting is the daily operations work, not the knowledge needed to scale your app, handle HA/DR, set up CI/CD, etc.
Compared with the black-box model of Heroku, I vote for infrastructure-as-code plus desired state, i.e. you order, we cook.
Disclosure: I am the founder of visualops.io, a white-box devops automation service for AWS.
@pswenson, I used Chef to automate setting up the server with Jenkins. This included automated security updates, awscli tools, nodejs, and even hubot so that we can trigger builds from HipChat (we still need to write a custom hubot listener to handle that, though).
I also took advantage of AWS IAM roles so that the server is pre-authenticated for certain S3 buckets.
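For anyone unfamiliar with IAM roles: once an instance is launched with an instance profile attached, the AWS CLI picks up temporary credentials from the instance metadata service automatically, so no access keys need to live on the box. A rough sketch (bucket name and policy are hypothetical examples, not our actual setup):

```
# No "aws configure" or access keys needed; credentials come from the
# instance metadata service via the attached IAM role.
aws s3 cp s3://mycorp-build-artifacts/app.tar.gz /tmp/app.tar.gz

# The role's policy scopes what the instance can touch, e.g.:
#   "Action":   ["s3:GetObject"],
#   "Resource": "arn:aws:s3:::mycorp-build-artifacts/*"
```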
PM me at josh dot padnick at gmail /period/ com with some background on what you're trying to do and I'll see what I can share. For many reasons, I can't make our Chef code repo public, but maybe I can share some code samples.
As far as online examples, yeah, the learning curve does kind of suck. Also, if you're starting from scratch, SaltStack may very well be a superior technology (I don't know enough about it to say). I think the key with Chef is to learn how to read the docs and to use the application-cookbook pattern (where you never touch the cookbooks you download online and instead customize them by "wrapping" them with your own custom-defined cookbooks).
It's also helpful to use very thinly designed roles, and define special cookbooks as your actual roles. This way you can version-control them.
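To illustrate the wrapper pattern for anyone who hasn't seen it (all names here are hypothetical): a thin wrapper cookbook declares a dependency on the community cookbook, overrides attributes, and then delegates to the upstream recipe, so the downloaded cookbook is never edited directly.

```ruby
# mycorp-nginx/metadata.rb -- a hypothetical wrapper cookbook
name    'mycorp-nginx'
version '0.1.0'
depends 'nginx'   # the community cookbook, left untouched

# mycorp-nginx/recipes/default.rb
# Override attributes first, then hand off to the upstream recipe.
node.default['nginx']['worker_processes'] = 4
include_recipe 'nginx::default'
```

Run-lists (or thin roles) then reference `mycorp-nginx` instead of `nginx`, which is what lets you version-control all of your customizations.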
I basically learned by reading the same material in multiple places, especially in books on Safari Books Online, and IRC was also a huge help b/c the community was VERY helpful.
I think Chef is a classic case of where the technology is somewhat inelegant and perhaps bloated, but a mature community is there and it's battle-tested so once you suffer the pain of ramp-up you get a huge benefit.
If our company was hosted on Heroku instead of AWS, we would have to shut down. Want a 50gb redis? $4k/mo. MongoHQ's largest heroku instance is 150gb at $2700 / month? We'll need at least 50. Want 300 workers? That's $20k / mo for less quality servers.
Heroku is great for working on your own project or if your data scale is small. Otherwise, the cost is just too high.
The amount of money you have to grow your business is the revenue your app generates minus the cost overhead of running it. We're assuming that the larger your internet service is, the more money you make.
A service that processes tweets from the firehose in real time, say. If you decide to process everything on day 1, that's a case where the overhead of running your app could greatly outweigh the revenue it generates.
I mean, I guess? I would question that business and processing model heavily, though. Most sustainable business models show growing from some kind of smaller MVP to a larger full-featured application.
Significant overhead costs to go from nothing to steady-state instantaneously would be enough to scare me.
It's an interesting question, as it was exactly a concern we had when we started to design the product we're currently working on (visualops.io). We realised that many companies struggle to justify hiring one (or more) ops people: they feel they're not big enough yet, although they're already too big to actually keep going without.
Usually, at that point, from our experience, we see two types of choices: 1- as you're suggesting, hire an actual ops person (but switching from a platform like Heroku to something like AWS where everything should be configured manually, and constantly updated, may require more than one person in some cases...) 2- keep on using platforms like Heroku (and get ripped off).
We have been ourselves in this situation when we have launched the first version of our product, and are now happy users of our own solution ;)
I don't think, with the way technology is now, that you need a full-time ops person until you're paying $50k+/month in server hosting on AWS reserved instances — and that's even without utilizing all the advanced services AWS offers (so simply using the basics like EC2 / S3 / EBS / etc.).
Well, it all depends on your application and the level of automation you're looking for. If your growth requires you to scale your infrastructure often, and you want to spare your developers from dealing with operations, then you may need someone.
Our spending is peanuts compared to what others spend on AWS. A few hundred thousand dollars a year is a pittance -- there is actual pain in purchasing bare metal, maintaining / colocating, etc. AWS offers just the right amount of flexibility and pricing with reserved instances, and they keep driving prices down. The only unfortunate thing is their largest instances aren't beefy enough for some use cases.
Keep in mind that heroku runs in US-East - if something isn't working for you on their platform, you can just run it yourself and point your app at it. We run our own database on EC2 and it works fine.
> When our customers email us during our vacation pissed off that we are not around and we have nothing to show to them to prove that this is Heroku's fault and not ours
Here's a tip. You are a black box. Your customers don't care if it's your fault, Heroku's fault, the power grid's fault or a meteorite's fault. They just care that their service is up. If anything from you all the way down to the silicon breaks, it's your problem. Not Intel's, not Realtek's, not Western Digital's, not Comcast's.
And that's what he's saying; Heroku's problems are his problems, but Heroku doesn't provide him a lever to solve their problems. So he's moving to somewhere else, where he has more visibility over the problems and can better communicate with his customers what the timetable is for repair.
The whole rationale of outsourced hosting is to have someone else take care of things, so it's a little comical to see the complaints when the pendulum swings the other way. What do people expect?
Heroku is a great way to get started fast without having to worry about anything related to operations. That being said, you're effectively outsourcing your entire ops/DevOps to a third party. You don't set the rules, you don't get to argue about the best scaling strategy, and overall your infrastructure is pretty much a black-box.
I've seen a fair amount of people who loved Heroku for ramping up apps only to feel trapped once they started scaling their service. PaaS definitely has its use case, but I wouldn't recommend it for anything beyond prototyping.
In fact (shameless plug), this is why I've been working on devo.ps: to lower the barrier to entry for managing your own servers (on AWS, Rackspace, Digital Ocean, Linode...) using tools familiar to developers (Git + YAML).
@bcardarella happy to add you to our beta testers; we'd love to get feedback from a heavy Heroku user. Email is in my profile.
If there are other Heroku users affected, drop me a line and I'll see if we can help you get set up on your own infrastructure with devo.ps.
They already are the same price/performance [roughly, unless you are buying DO's $5 nodes]. They also have stuff like load balancers built out and are reasonably reliable [although only at the datacenter level, they don't have multi-DC load balancers]:
Just a counterpoint as someone who switched from Linode to DO (and indeed have used mostly their $5 nodes) and currently uses a mix of DO and Azure: You need to be comfortable with command-line server administration to use Linode. Yes, they've got tutorials, and they're pretty good. You can be walked through installing __________ on whatever specific flavor of OS you happen to be running. But the moment something goes wrong or I had a slightly-above-novice-level question, the response was that Linode did not offer that level of support and it was a self-managed server (or whatever the specific language was).
Not complaining, that was exactly what I was paying for, but it's certainly something to take into consideration if you're not comfortable with command-line server admin.
DO is pretty much the same as Linode in that respect. You have tutorials and command line comfort as the basic requirement for using the product.
Both the parent and the OP suggested switching to DO, which is the point of comparison.
I've never had better or more comprehensive support with DO than with Linode. Then again, I've had a Linode account for like 6 years and I think I've opened as many tickets. ;)
Be warned that Linode's status page is also borderline useless. Their website and web-based management interface (and API, according to customers on their IRC channel) were throwing 500 Internal Server Errors for >2 hours the other night, and their status page never had any mention of it.
There are a few options out there. You can treat DO as a compute-only EC2 if you'd like. I work on a project called Rubber [1] that can provision servers and templates for you. DO is one of the supported providers. We've taken care of a lot of the annoying stuff for you, like mapping security group rules back to iptables, setting up DB replication, off-site backups to S3, and so on.
You'd likely be able to do the same with chef, ansible, puppet or whatever else. I don't have deep experience with those options, so I can't really speak to them.
I haven't used Cloud66, but I've seen them come up a few times. I think the biggest difference is Rubber has templates for common pieces of software and deployment scenarios. Since it's based on the role-centric system Capistrano provides, you can do things like set up several "app" machines and then a "web" machine, which runs HAProxy and automatically routes requests over those "app" instances.
But, I think we fill that niche between hand-holding and trusting devops/sysadmins to do what they think is best. We try to handle the monotonous tasks for you (configuring DNS, setting up backups, configuring load balancers, etc.) out of the box, but give you a great deal of flexibility to influence that. Almost everything is bash and ruby with a healthy set of utility functions.
I admit, it occupies kind of a weird middleground. If you use Heroku because you're not well-versed in sysadmin tasks, there will be something of an uphill battle (but a great learning opportunity). If you want to outsource everything, it's not a great fit. But, if you want a lot of flexibility, have unique app constraints, want the ability to switch between cloud providers and not deal with everything from scratch, it's a pretty good fit. I've personally used it with EC2, DO, vSphere, Vagrant, and leased hardware. It's nice being able to treat them basically uniformly.
Check out visualops.io. We are a language-independent devops automation service.
Similar value to Heroku: autoscaling, auto-healing, push-to-deploy, but no black box. Instances run in your own account. Even if we go down, your app will not be impacted.
Nope. It's got a Ruby inclination, but it's something we're divorcing ourselves from. Deploying Sinatra and other non-Rails, Ruby apps is pretty straightforward. Deploying non-Ruby apps is certainly doable but currently requires a bit of creativity (I've deployed Java and Scala apps with it).
It also spins up other server types like Selenium. (I do a lot with Selenium, and even have an accepted pull request in Rubber's Selenium implementation.)
I've been trying out Dokku (https://github.com/progrium/dokku) as a Heroku alternative. DigitalOcean has a nice VM template for Dokku; I think Linode has something like it too (or instructions for setting up Dokku).
So far, I like it; Dokku is pretty slick. Deployments are more or less the same as Heroku - "git push dokku master", in the case of my Rails app. But that said, assuming Heroku's security is super tight, I have a bit of work to do to secure my droplet.
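For reference, the whole Dokku deploy flow really is just a git remote (the hostname and app name below are made-up examples):

```
# One-time: point a git remote at the Dokku host (the remote user is "dokku";
# the path after the colon becomes the app name)
git remote add dokku dokku@my-droplet.example.com:myapp

# Every deploy thereafter is a push, same as Heroku
git push dokku master
```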
I'll probably still have to scale myself (or just switch to Linode and use their load balancer), and with Dokku I've got a PostgreSQL DB in my Dokku container.
Droplets disappearing or crashing... not sure, but I sure hope DigitalOcean's backups are good :)
Currently - nothing that needs scaling for my personal projects. However, as a consultant, it's very possible that a large client may need to scale (dealing with this right now on Azure), so I like to be prepared with options. That said, I know better than to put Dokku in production for a client - at least not until it significantly matures :)
Not the OP, but if you're looking to drop Heroku (because of cost, or reliability?), it's not that hard to host your own on a VPS or dedicated servers.
Setting it up doesn't look too hard, and you're not going to need it until you reach significant scale - so probably 90% of the people reading this without a dedicated ops team aren't ever going to need it. Startups often seem to over-engineer servers at an early stage or assume they'll need a massive cluster when a single server with multiple processes could serve perfectly well till you actually hit some sort of scaling problem (i.e. millions of users a month). As an example, the HN website ran for a long time on one server (not sure what they use now, but it's not heroku :). Very large VPS instances often cost less than scaling with heroku.
Also, other providers like Linode offer load balancers without setup for a low cost.
What do you do when one of your droplets disappears or crashes?
ansible-playbook myserverplaybook.yml
to set up another one in a few minutes. But you'll find this just isn't necessary unless the server config changes - uptime will be measured in months on most providers (not sure about DO in particular, haven't hosted anything serious there).
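For the curious, a minimal playbook of that kind might look like this (hosts, packages, and file paths are placeholders — a sketch, not a production config):

```yaml
# myserverplaybook.yml -- rebuild a web droplet from scratch (sketch)
- hosts: webservers
  become: yes
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present
        update_cache: yes

    - name: Deploy nginx config
      copy:
        src: files/nginx.conf
        dest: /etc/nginx/nginx.conf
      notify: restart nginx

  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted
```

Because the tasks are idempotent, the same playbook serves both for disaster recovery and for routine config changes.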
Where do you host your database?
On another DO droplet, with remote backups, or even on the same one if you have a sizeable VPS and low to moderate traffic.
I haven't deployed with them yet, but have been seriously considering making the switch to Cloud66 (http://www.cloud66.com/).
It looks like the best of both worlds -- higher level than Dokku, but choices when it comes to the underlying infrastructure provider (Linode, DigitalOcean, AWS, or some combination). Small overhead charge for the management which seems very reasonable, otherwise you're just paying for the (virtual) metal from the IaaS guys.
We've been using Cloud66 for a couple of months now. You may find yourself tweaking the manifest files, though. It's not immutable architecture, but rather, they deploy over and over using Capistrano. So over time you'll build up some artifacts/cruft, especially if you're doing anything that generates a lot of log files. Not a problem, but you'll want to configure it to use instances with enough storage, which in a default config is often only 8GB.
I've been really happy with Cloud66. I'm discovering new features every day that save me from having to build nginx/postgres/passenger from scratch. Try it out, I think you'll find it provides the balance you seek.
Deployment and maintenance should not happen in the middle of the night. We've known this for years.
You design your systems to degrade gracefully, hopefully transparently. Screwing up your engineers' sleep schedules is a terrible idea, so you deploy during the day when people are awake, clear-headed, and on the clock.
I'll take a 1PM outage over a 1AM outage any day. If that's bad for business, business should pay for the engineering time to build a more resilient system.
This is not how businesses are run. No sane person running a business will take a 1PM outage over 1AM. The whole point of heroku is that you are relying on them to do a lot of the infrastructure systems because you are low on manpower. The resilient system will be built over time and is being worked upon. But to put the blame on heroku's customer for the outage is nonsensical.
People do night shifts in so many industries. There's nothing wrong with that, it's just part of the job. My building security guard does a night shift and what he does is way more important and needs more attention than an engineer. Unless you think we should not have security guards in the night.
Very few professionals do night shifts. Equating a security guard with IT work is insane. The intellectual demands are nowhere near equivalent.
For professional examples of "night shifts", we have ER doctors. They do work 12-24 hour shifts -- but they're allotted a room they can sleep in between patients, and they often have 2-7 days between their shifts.
You going to pay infrastructure staff what you pay ER doctors for as much on-task work? Are those people going to actually know what to do when you put them on-task?
What happens when the IT staff needs to get a developer on-hand to resolve an issue? Wake them up?
(for cases like Heroku) When infrastructure staff fuck up, people don't die. There's a massive difference between "make sure a few scripts I wrote earlier work right" and "patch up the guy who can see his guts cause some drunk driver ran a red light at 3 AM"
>What happens when the IT staff needs to get a developer on-hand to resolve an issue? Wake them up?
Yes. Have you never worked at a place with an on-call dev team?
Plenty of sane people prefer a 1PM outage over a 1AM. Everybody is in the office. Everybody is wide awake.
Of course people do night shifts, but at the same time, we've learned that you're better off fixing things during the day shift. The night shift is an emergency crew.
And the idea that a building security guard needs to pay more attention than an engineer is beyond ludicrous.
I think an important point to note is continuous deployment: twice a day seems logical and minimal to me, even if you're deploying code from yesterday (a one-day gap).
That's like, three logical fallacies tied together.
You've made an appeal to authority with your lead-off sentence of "that's not how businesses are run", a no-true-Scotsman argument with "no sane person running a business", and then tied it up with a reductio ad absurdum analogy about security guards.
There are plenty of shops that I've personally worked with, and many more that I've read about here on HN that do daytime maintenance / make potentially breaking changes to production. I'd be willing to bet Heroku does a mix of day and night time scheduled maintenances, with scheduling determined by a reasoned analysis of the risks and potential for an outage.
If you think a 1 AM maintenance window helps anything, I'd bet you've never seen the sun come up as you staggered out of a datacenter 10 hours after your "midnight maintenance" went south, and no one with an optical light meter to help you trace down a dodgy fiber run was awake at your transit provider, so you had to wait until the first shops opened to buy your own...
...ok, maybe I'm a little bitter. What I'm getting at is, a 1 PM outage - although more customer facing than a 1 AM outage - will almost definitely be resolved faster, because resources outside of the engineers performing the maintenance are in abundance at 1 PM and not 1 AM.
Bonus points: your engineers will not be zombies for the next three days, and that's important both for post-outage work and to make sure everyone stays sharp for the potential "bounce outage" (e.g. your first outage was the core router dying, the second will hit the same week when your new replacement core router's slightly upgraded IOS version causes a BGP flap under certain conditions that don't arise until a good 48 hours after you mail out your customer outage report....)
That might sound trivial, but at scale - if you have tons of engineers and are doing tons of maintenance all the time - it's actually really important, at least IMHO.
(Quick, take a guess - how many Heroku daytime maintenances do you think went off without a hitch that you've never heard of?)
P.S. what time zone are you in? I assume your time zone's 1 AM is the important one, so someone else on the other side of the country is going to either have a late evening or early morning maintenance window..... ;)
I have never lived in a place with security guards at night. I think a big part of freedom and privacy is dead when you have to have security guards. Off topic, sorry.
No one with half a backbone should be willing to do scheduled work all day and then during any portion of the night.
You need to build a resilient system, and that's gonna cost you. It's a lot less fun than adding features, which is why business people tend to put it off and instead manipulate the tech people into working crazy hours.
The 24/7 web culture is dying. Adjust your business plan accordingly.
That sucks. Whilst we're able to do most work during the 9-5, there's the odd time I'll come in at 5 or the IT manager works a Saturday because we have to take down key infrastructure to work on it.
If your working environment is good, it's pretty easy to just say "I'll take tomorrow off and come in to do it on Saturday" without feeling like you're getting taken advantage of.
As a seasoned engineer, I could not agree more with the justifications you provide.
As a business owner whose customers are actively engaged with my staff, who need servers and resources to be available during the day when, you know, my CUSTOMERS are awake, your recommendation for outage windows would be completely ignored.
You schedule outages around the people who pay the bills, not the people who don't.
The plan is to never have an outage. Instead, schedule maintenance that should, at worst, result in some backed-up queues or non-functional admin functions.
Building a system for graceful degradation costs time and money.
Number of banks or hospitals alrs has worked at: 0
Hope for the best, and plan for the worst: Build resilient systems when possible, but why risk (or guarantee) outages during the day? OS upgrades, router swapouts, and so on can NOT take place during the day. That would be foolish in most industries.
Also realize that "just make systems resilient" is not an easy thing when multi-million dollar transactions are occurring on 30 year old code. If everything were greenfield, it'd be different. But it's not.
Thing is, a lot of places are doing that. The downside is the geo-distribution gives us Indian contractors with strange accents, inability to communicate and assert, and overall a huge communications issue by default.
Australia needs to step up its subsidies to IT outsourcing firms.
For those saying that the parent comment is crazy, or not applicable to "real businesses" - nearly every production push at Google is done during the normal workday hours of the respective development team (which means most of them happen between 9am-5pm PST).
The benefit of having the engineering team around to troubleshoot smaller-scale issues before they turn into large-scale outages vastly outweighs the small benefit of potentially moving the extremely rare massive outage to a less busy time of day.
Yes, Amazon has a rule to deploy during core working hours, when engineers are around and available should things go wrong. Also, Amazon is one of a number of organisations that don't deploy releases on Friday.
Amazon does not deploy changes during times that they feel can be customer impacting. For North America this means that deployments will be during the night.
IF you can afford the service interruption, which is true for lots of companies and their internal IT systems, you always follow this policy. All concerned including the users are awake and on their normal sleep and work cycle, if you mess something up you find out immediately from those users, etc. etc.
8AM EDT is 5AM PDT, which is where web tech can be found in the United States.
I'm not sure if you're just being sarcastic, but talent in North America isn't limited to the Pacific Timezone, and such a claim borders on hilarious delusion. I don't say hilarious emotionally or pejoratively, but rather it is actually ha ha funny that anyone would actually believe that.
However yes, there are endless loads of top-skill talent that will happily do "maintenance" in the middle of the night. I've done it on occasion, and slept in the next day. Big deal. Half the time I simply took a timeout from some online gaming I'd been doing.
To do something off hours implies a self-esteem problem? That is absurd. People do stuff off hours in return for something, which might be compensatory "time off", additional pay or bonuses, etc.
It's all about reducing the risk. Accidents happen no matter how well you've engineered your maintenance and deployment practices.
Do everything you can to reduce the risk, including building a resilient system, AND not taking a major swath of it offline during a high traffic period.
If you are a company who is using Heroku, you likely do not have the resources (knowledge or money) to set up a system which degrades gracefully when your entire hosting platform goes down. That's a hard, and expensive, contingency to plan for.
And even if they did build their service in a way which could tolerate a complete hosting platform failure - degrading a core service during your customer's business hours is enough to make heads roll. It's just stupid.
No, that depends on what your traffic curve looks like. For me, I have to do my maint at 12AM PST, because that is when we are in a trough with respect to traffic.
12:30am on EST is 3:30pm AEST (Australian Eastern Standard Time) and 7:30am in London.
That way you have skilled people awake at all hours should a major issue develop. These people can fix the problems while US-based clients are sleeping, and non-US clients aren't stuck wasting a day on less severe faults because the engineers only work 9 to 5 PST.
Heroku does have its problems, but if I'm being honest, I don't have the time to spend using anything else right now. Perhaps when I hit a larger scale/stage I'll reconsider that, but at this stage there really isn't anything better.
We're working on switching from Heroku to EC2. To my knowledge there isn't a lot of documentation out there on this process - we're just looking for code snippets to figure out how to build AMIs, deploy, etc. It's a slow and error-prone process. Anyone know of good resources to make this migration?
Take a look at Elastic Beanstalk too. We use eb with docker containers and it has been pretty decent. Handles auto scaling, load balancing, fault recovery, deployment itself, and Docker provides great flexibility.
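For single-container Docker on Beanstalk, the glue is a small Dockerrun.aws.json at the root of the app bundle (this is version 1 of the format; the port is just an example — Beanstalk builds the image from the Dockerfile in the bundle):

```json
{
  "AWSEBDockerrunVersion": "1",
  "Ports": [
    { "ContainerPort": "3000" }
  ]
}
```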
I have never seen any documentation on moving from Heroku > EC2. They are very different services and there are many options to replicate the heroku api.
Not used it yet but www.cloud66.com always interested me.
We are currently working on a product based on AWS that could fit your needs, giving you as much flexibility as you can get with a manual EC2 configuration, but with the simplicity of a service like Heroku and the reliability of a service like OpsWorks: visualops.io
Don't hesitate to contact us if you have any remarks or questions. Also, we recently released a "store" containing templates of common applications to help people coming from PaaS platforms: store.visualops.io
For our last engineering job req, we received about 10 sysadmin applicants, almost all of whom started with an "I know I'm not an exact fit but..." sort of line. Maybe it was just an oddity.
I think there's a reason, but it's not outright unemployment - yet.
On the enterprise side what I'm seeing are traditional sysadmin positions being eaten up by aggressive VARs and software vendors. They all have XaaS plays now and they're targeting their existing customers first.
The pitch is literally...
"Hey, I can save you $350K/yr. Just fire your ________ team and use our PaaS. We'll perform the migration for free."
Assuming the organization doesn't want to pursue their own internal or lower-level external service it leaves the sysadmins to either somehow make the leap to administering a provider system or diversify into other roles where they don't exactly fit.
I've found them to be fairly terrible. They work great until you need to do more than use pip to install stuff (I'm using python). So things as simple as matplotlib get hairy. You have to use a custom buildpack someone contributed (which is based on an old version).
Now say I want to use python with R. It is a huge pain. You have to use heroku-buildpack-multi [1].
The problem is that environment variables and installed code from the first buildpack are invisible to the second. So you end up having to hack individual buildpacks, which is just gross. It worked in the end, but I wasted days on it.
A concept like docker, where you fire up a container, install stuff with apt-get, and then save the results is greatly superior.
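That workflow looks roughly like this as a Dockerfile (the package list is hypothetical — Python with matplotlib and R were the pain points above):

```dockerfile
FROM ubuntu:14.04

# apt-get just works, unlike stacking buildpacks on top of each other
RUN apt-get update && apt-get install -y \
    python python-pip python-matplotlib \
    r-base

COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt

CMD ["python", "app.py"]
```

Everything installed in those RUN steps is baked into the image, so there's no equivalent of one buildpack's environment being invisible to the next.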
I found a combination of buildpacks + docker really useful.
We use the slugbuilder from the Flynn project [1], which gives us a finished version of the app as a .tgz, which is then extracted into a Docker container that already has all the other necessary stuff set up (logging architecture, etc.) needed in different parts of the application (webserver, background workers...).
Customization. Ever try to add a command-line tool to a buildpack? I don't understand why we can't just use apt, since the EC2 instances that Heroku uses are just Ubuntu AMIs.
I am not a Heroku customer, and maybe there is more substance to his complaints, but I didn't see anything in the post that would cause him to want to leave his hosting provider with guns blazing like this. Reduce their prices because their service providers have reduced their rates? Seriously, which company that you know does that? Do we even know what arrangement Heroku has with Amazon? Isn't it possible that, as a large customer, their rates are already discounted? On the subject of downtime, he didn't actually say that his application was down as a result of the maintenance, in this instance or any other.
What are the alternatives to Heroku? I'm looking for something that makes it easy to deploy and launch services. http://deis.io looks promising but it isn't production ready yet.
I literally just discovered Dokku (https://github.com/progrium/dokku) yesterday - a "Docker powered mini-Heroku in around 100 lines of Bash". You can roll your own PaaS infrastructure on your own VMs.
It took a few hours, but I was able to get my Heroku-based Rails app up and running on a Dokku droplet (to be fair, the most time-consuming part was getting the SSH keys right and figuring out that I needed a domain name - e.g., couldn't just set it up with a droplet's IP address - at least not with the DigitalOcean tutorial's instructions: https://www.digitalocean.com/community/tutorials/how-to-use-...).
Serious, non-sarcastic question: so now you have your own mini Heroku on your own VM somewhere. Is the probability of downtime now smaller than it was on Heroku? It seems to me it would be higher...
Yes, but now you can at least plan/schedule downtime according to your particular needs, and unscheduled downtime is at least a little more transparent to you. Sounds like that's what the original article wanted, in any case.
That's a legitimate benefit, although I was suggesting it's offset by the greater chance of something going wrong when you are managing your own servers, especially if you're not an expert sysadmin. I would assume Heroku instances are secured better than a box you're locking down yourself.
If you have a Rails App, you should consider Ninefold.com. You deploy direct from Github (or your Git Repo) and can have a VPS in the same security group if you need a utility box.
Depends on your use case. Advantage is around performance, and that you don't have to focus on the devops part of deployments that Capistrano requires. You get full access to servers & free SSL for rails apps. Strong engineering support.
Deis has quite a few successful deployments in the wild. Most of the work being done today is around automated testing, platform hardening, HA/failure modes, etc. A stable release is not far away.
Deis is for software teams who need to operate their own servers (for a variety of reasons). If you're a "regular developer with little real world Ops experience", I personally recommend sticking with Heroku or equivalent hosted services despite the occasional outages.
Seriously, don't host your own PaaS. It simply adds more complexity to the game. When things go wrong, you will have no clue how to bring the PaaS back up in the first place.
Learning how to secure your servers and scale your app is not overhead; it is a competency that sets you apart from those who don't have it.
I work at a Brazilian mobile ad network running on Amazon AWS, and our traffic is really intense. There are 13 developers in total on the team and none of them are DevOps specialists, which makes me conclude that more often than not, you don't need a person specialized in server maintenance and infrastructure.
Like some people said before, Heroku is great for prototyping but just not a good deal in the long run. Too expensive, way too risky/unreliable for serious stuff.
Heroku's defining characteristic has always been that it radically reduces the sysadmin surface area of my app stack to something I can reasonably handle. So, while things like Dokku, etc. are really interesting they don't seem to be actually expanding the number of app-stack layers that I'd need to know to properly secure and maintain a server. Am I missing something?
I'm still in the startup phase, and the time saving of Heroku has been fantastic. I'm ever mindful of these types of stories and the scaling cost though.
I've not seen it mentioned so far: RedHat has OpenShift (https://www.openshift.com/) out now, v3 will see Docker support too. With only a quick play, it seems to be a mix of Heroku and dedicated host.
On sysadmins: talking to a guy who managed AWS nodes for a popular iOS game you've probably played, he made the comment that it was great to have a new team member onboard now, dedicated to looking after all the VMs. I figure he's rediscovered sysadmins, just using different kit.
The risk I see with managing your own Dokku, etc is handling issues/maintaining uptime if it's not your only job/skill. When another HeartBleed comes along, can you react quickly and competently enough?
I'm a devops engineer and I never really saw the draw of Heroku or EY after you scale out to a certain size. EC2 is extremely easy to work with and there's tons of documentation out there. Once you set up your CI & CD, write your cookbooks and Capistrano scripts (or whatever), it's nearly hands-off.
Heroku is the dreamweaver of devops... but you know developers often don't have the time to spend 2 months+ learning how to do it right (hence your job title :-D). And for my small app I'll wager that Heroku has infinitely better uptime than I would manage!
I'm in your boat, but you have to realize that heroku is not targeted for anyone who is comfortable doing top to bottom provisioning of their environment. I also think their value proposition is a "grow with us" type situation: get you in the door for cheap and then as you grow you just keep bumping resources up.
In short, Heroku thinks that firewalls encourage people to be lazy about security, and leaves people crippled if the firewall is exposed. Due to this philosophical difference, they would rather force everyone to rely on SSL for all secure communication, and leave every machine accessible to the internet, including (for example) your database server.
Easy. Just use deis [1]. Or CoreOS [2], if you're more hardcore. In any case, with the advent of Docker and 12-factor -- not to mention openstack [3] -- Heroku is a commodity or fast approaching it, and will soon face pricing pressure.
The idea that maintenance should not be done during the day is a bit antiquated. Scheduled maintenance should take place during the company's workday, when they have as many people as possible available, feeling awake, well-rested, and not distracted by the things they might prefer to do with their evenings and weekends off work.
Of course, scheduled downtime should be done in off-hours, but modern services like Heroku don't have scheduled downtime.
I agree with this argument, but there's plenty of work hours in the day, and 2 PM Eastern is just about the worst one to pick. We try to start scheduled maintenance around 8 AM - that way we have fully refreshed devs ready to address any problems, and if something does go wrong, it's a relatively low-traffic time for our application.
The reality is that there is no good time for a large-scale platform to go down. I'd speculate that 8am PDT would be worse for Heroku, as their business seems largely split between the US and Europe, so downtime then would upset everyone... but that's just speculation.
You can't build a company of developers selling products to developers while fleeing the working hours of developers.
If Heroku simply re-sold AWS, people wouldn't use them. The argument that AWS's falling prices should be passed on to customers is logical only if you ignore that Heroku is using the savings to build and increase the performance of their platform.
It's really the complete opposite case, I think. For the value (time is money!) they offer, it's amazing they haven't increased their prices as they scale.
Bit surprised no one is mentioning Cloud Foundry in the comments.
It's pretty much an open-source Heroku; run it against AWS and it's almost exactly Heroku, and with all the tooling around it, it's quite easy to set up.
Not sure, but is something like Divshot a viable replacement for Heroku - they both seem to be doing the same thing: developer webapp hosting, but there's probably a difference somewhere.
I do not believe Brian has a strong understanding of what it actually takes to do ops work.
1) Downtime.
This is a huge fallacy: if you think doing this yourself means no downtime, think again. Even with a top-notch ops team you are still going to have outages in the middle of the business day. This team of ops guys you hire, guess what? They are not going to be deploying at 1am every day; that is not sustainable. They are going to deploy in the middle of the day, and guess what, they will make a mistake. And guess who notices outages first most of the time? Customers.
Heroku's uptime is typically in the 99.9% range. I don't think you can do the same with one ops guy; maybe you will get lucky, but it is more likely that you won't.
5) Buildpacks
Buildpacks are an AMAZING abstraction. Just think about how you would solve the following without buildpacks:
5.1) Install libgeos on all the hosts running a specific application?
5.2) Create and scale new worker instances on demand?
5.3) How do you set up logging for all these servers?
5.4) Health checks for these services (tcp, http)
5.5) Alerting and automatic failover
* pssh? That will work fine until you spin up a second server and forget all the incantations needed to get your software up and running.
* Puppet? I am sorry, but Puppet, Chef, and their ilk are horrible: who knows what state any machine is in at any point in time? Humans will inevitably forget to ensure absent, or make an assumption about state in Puppet land. Never mind the slow feedback loop. Oh, you want to change all the servers running X to now run Y instead? Good luck with that.
* Mesos? Mesos is the best open source scheduler out there, Marathon makes it easy to deploy to and there are scripts to run easily on ec2. However if you want to install native dependencies (libgeos etc) you will still need to do one of the above things.
* Docker? Docker is an amazing way to package your app up and ship it to run on Mesos, however how do you package docker apps and ship them? Guess what? Buildpacks.
Oh hey you can also write your own language, write a buildpack for it, and deploy it to production without ever having to get an ops person anywhere to setup a server for you. I am sorry but that is nothing short of amazing.
2 and 4)
I agree 100%, it sucks to feel powerless and have no insight as to when things will be resolved. However, if you work in any decent-size org, ops things will go down, and you will also have to wait to see any resolution. I agree it feels better to see people working busily in your office, because you can see the sense of urgency.
3) Pricing:
In regards to pricing: yes, Heroku is very expensive. However, for small and medium teams it is much cheaper than hiring an ops person. You also get an economy of scale when you use Heroku. Think of all the things Heroku takes care of for you: automatic failover on hardware/software (segfaults)/VM issues; disks filling up on some host (it is embarrassing how many times I have seen something as simple as this cause issues); agility (you need to switch MRI out for JRuby or Rubinius? No problem, just update your Ruby version in your Gemfile). The Heroku platform is superior in almost every way for running small to medium size applications than any home-grown alternative.
I am very familiar with the alternatives out there, I have contributed to dokku's buildpack repo https://github.com/progrium/buildstep, and have patched docker.
There is no good behind-the-firewall / on-my-own-VM / on-my-own-hardware alternative to Heroku at the moment. People are making progress, but you won't really get anything as full-featured as Heroku.
> Puppet? I am sorry but Puppet, chef and their ilk are horrible, who knows what state any machine is ever in at any point in time.
I always know what's on my servers all the time; I build immutable servers and throw away the old. And I'm not particularly knowledgeable or skilled at this; I just use a little good sense. I mean, your problems with devops may actually exist, but this is silly.
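One concrete way to get those immutable servers is to bake a fresh machine image per release and replace instances behind the load balancer rather than mutating running boxes. Here is a hedged sketch of a Packer template that would do the baking; the region, AMI ID, instance type, and package list are placeholders, not a tested config:

```json
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "source_ami": "ami-00000000",
    "instance_type": "m3.medium",
    "ssh_username": "ubuntu",
    "ami_name": "myapp-{{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "inline": [
      "sudo apt-get update",
      "sudo apt-get install -y nginx"
    ]
  }]
}
```

Every deploy produces a new AMI; old instances are terminated instead of patched, so the running fleet always corresponds to a known image.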
Sure, but the feedback loop is still slow, and adding support for a new app is much more time-consuming. I would much rather make an API call and a git push and get back to delivering value.
You may rather have that, but then it breaks or it doesn't scale and you're up the creek. Which is why people pay people like me.
At Localytics, adding a new Play 2.x (our current standard) app into our new deployment infrastructure takes one command and about 120 seconds. The only part for which a developer needs devops input is the first production deploy, which requires a sign-off and some (automated, but privileged) IAM permissioning. Adding a new app on a framework we haven't incorporated yet (Rails is probably next) would take a few days, but needs only be done once for all developers at the company. And we know it works with our entire ecosystem.
You might have had bad experiences in the past, but you're extrapolating from them to the state of the world. It's not what you think.
In fairness to Brian, I think you are flipping to the opposite extreme which isn't really valid either.
Especially here:
> Puppet? I am sorry but Puppet, chef and their ilk are horrible, who knows what state any machine is ever in at any point in time. humans will inevitably forget to ensure absent, or make an assumption about state in puppet land.
You setup the new cluster, switch the load balancer, tear down the old cluster.
Why can't a company the size of heroku do maintenance at 1am?
I've worked for 2 previous startups doing this. US -> Australia -> UK makes a pretty good global coverage strategy. Once you can afford 3 people this can become a very effective way of ensuring 24 hour coverage.
Between all the countries in Australasia and Europe, 24-hour coverage is not that difficult. Not to mention rates for a lot of those countries are cheaper than FTEs in the US, and if you bring them on as contractors, you still get to avoid having a business presence in those countries (for tax and what have you).
> Once you can afford 3 people this can become a very effective way of ensuring 24 hour coverage.
How can you run 24/7 with 3 people? In Australia that would mean you would have to be above the allowed 38-hour working week, as it would be a 56-hour working week; overtime and possibly penalty rates would apply. Even if it is on call, a lot of companies in Australia pay extra for that.
> Not to mention rates for a lot of those countries are cheaper then FTE's in the US
I do not know who you are employing in Australia but I would suggest that there is only about 5000 - 10000 USD p/a price difference for mid-level guys.
> How can you run 24/7 with 3 people? In Australia that would mean you would have to be above the allowed 38-hour working week, as it would be a 56-hour working week; overtime and possibly penalty rates would apply. Even if it is on call, a lot of companies in Australia pay extra for that.
You wouldn't do 24x7 with 3 people, but you could do 24 hours x ~5.5 days with the rest on call. You could cover all business hours within the week. Secondly, you don't bring engineers on as employees (that requires all sorts of legal and tax liability); you hire them as self-employed contractors. This works out great for them (they can claim all sorts of things as tax write-offs for their "business") and you get to avoid dealing with the bureaucracy.
> I do not know who you are employing in Australia but I would suggest that there is only about 5000 - 10000 USD p/a price difference for mid-level guys.
Avoid Sydney & Melbourne.
A lot of people will take the opportunity to do this kind of work for a decent wage ($65k in Brisbane, $55k in regional centres). I know people who do this work, move to Thailand and live in luxury on similar dollars. Brisbane is one of the Red Hat support centres so there are plenty of engineers moving on from lowish salaries offered.
Not to mention once US interest rates start going up the AUD is set to start dropping again.
> You wouldn't do 24x7 with 3 people, but you could do 24 hours x ~5.5 days with the rest on call.
Thanks, I hope I didn't sound too harsh as I just wanted to understand how you were doing this.
> You hire them as self employed contractors.
If you are an Australian company you may want to be careful with this. A contractor is not necessarily a contractor in Australia, there are some strange guidelines as to when a contractor may actually become an employee.
> A lot of people will take the opportunity to do this kind of work for a decent wage ($65k in Brisbane, $55k in regional centres).
I worked in Brisbane for 12 years and that sounds about right. Although the need for Linux admins seems to be increasing, so there are some interesting salaries being offered.
> I know people who do this work, move to Thailand and live in luxury on similar dollars.
One can only dream, although I now live and travel around Asia .... I am just missing the $$$ :)
> I just wanted to understand how you were doing this.
Well, actually I am not doing it anymore. It turns out that when the GFC hit, the startup I was working for went under. So did a lot of places, and I had trouble finding work. Now that I have more family commitments, I am a working peon for a small digital marketing company... but there are days I wish I was back doing that kind of work again.
> If you are an Australian company you may want to be careful with this. A contractor is not necessarily a contractor in Australia, there are some strange guidelines as to when a contractor may actually become an employee.
Yeah, this really only works if the contractor is working for a non-Australian company. Then they are liable for their own super/tax/etc. It's a bit of a hassle, but a good tax accountant will get it sorted and reduce your liability at the same time. The contractor gets to write off their home (where they do the work), their tech, and more. The government hasn't really cracked down on it, because it's an export business and you're helping the economy.
> I worked in Brisbane for 12 years and that sounds about right. Although the need for Linux admins seems to be increasing, so there are some interesting salaries being offered.
Unfortunately not as much as I would like... it seems that between a lot of "cloud first" outsourcing and the fact that Linux/Unix roles tend to be in HQ (i.e. in Sydney or Melbourne for Aussie companies), the Linux roles tend to be limited.
On the other hand, PHP/Web/.Net developers with Linux knowledge are in demand. (A .Net developer with Linux server experience is a strange combination, but just check seek.com.au and you will find them.)
Brisbane is, and it appears always will be, an MS town. 80% of systems roles are Windows/Cisco/VMware focused. Smaller guys tend to be more Linux/Mac OS based, but then you end up with hybrid roles where you're also filling 15 other roles.
I don't think you are in any position to decide what I do or do not have a strong understanding of. Let's break down your thoughts:
1.) Downtime - You seemed to gloss over my primary point. The downtime today could have been completely avoided if Heroku did not opt to do a 2pm EST scheduled maintenance. This was a very large risk they took and it blew up in everyone's face.
As far as resolving downtime. I stand by my original statement that a PaaS like Heroku cannot recover as fast on average as another solution. They have too many customers that they have to get up and running.
5) Buildpacks - I've been fielding this one on Twitter all day and am considering writing another blog post. I didn't really give much weight to this one and that opens it up to attack pretty easily. Here is my point:
Buildpacks in general do not suck. Heroku's implementation of buildpacks sucks. Customizing Heroku buildpacks is a nightmare. Maintaining said fork is nearly impossible. There is no versioning of the buildpacks. There is no way to know if/when slugs are updated and my fork is no longer usable or needs to merge in upstream changes. Getting anything into the official buildpacks is a political nightmare. Adding CLI tools not in the heavily stripped-down Ubuntu AMI is a gigantic pain in the ass.
re: puppet - I'm not sure why you're hammering me for Puppet as I never mentioned it. You seemed to assume that this is what we will use.
3) Pricing - I dismiss most of your argument unless we're talking about Heroku's database hosting. Their DB hosting is awesome and top-notch. Their app hosting has become less than stellar. I don't even know what I'm paying for anymore. To me, managed devops means I don't ever have to deal with PagerDuty, or I can just hand customers over to someone after we launch their product. That's not Heroku, not by a long shot. At best we're getting security fixes and system updates applied. At worst we are paying a premium on resold EC2 instances for a deploy shell script that has been freely available for a while now.
I don't know about you but I don't really spend that much time swapping the underlying language. At least not without some heavy consideration, what we go with is usually thought out from the start of the project. To me, I'm not interested in the least in being able to swap out MRI for JRuby with a change of a line. In nearly 10 years of professional Rails development I have maybe had to do this twice. Upgrading Ruby is a different story, but that is also not something that should be done lightly.
Today's post was a culmination of years of frustration with the platform. I've had it, we're done with it. Moving on. Good riddance.
1. I don't think your primary point was that valid, in that people are going to have to deploy during the middle of the day sometimes, and mistakes happen sometimes. We have all deployed an app with a bug in it at one point or another.
For Heroku it is tougher; they are the app that runs other apps, so bugs have a much larger impact. I would hope they have post-mortems and take action to ensure this doesn't happen again, i.e. why wasn't this caught in a staging environment?
I agree that heroku totally fucked up, I would feel terrible if I was the engineer that fucked this up (as they should). However I think you are throwing the baby out with the bathwater.
I have seen outages last 1 hour + at big startups in the valley, it happens. I am not saying that it is right, I am saying that it is very difficult to do right, and very few companies are able to do it. It generally takes quite a few people to do it and far too often I have seen it come at the cost of being able to move quickly. Not excusing heroku for making these mistakes, I think it blows pretty hard. I just think there is no good alternative.
I believe that you are seriously underestimating the amount of effort required to deliver 99.9% reliability. I also believe you are underestimating how much heroku does for you on a git push if you think it is a bash script that has already been written. If that were true we would all have Heroku behind the firewall.
5. Heroku's implementation of buildpacks on their stack sucks? They have the only real implementation of buildpacks. I agree forking/maintaining the fork sucks; I think there should be an easier way to install apt-get dependencies (an ENV var).
Alternatively you can use: https://github.com/ddollar/heroku-buildpack-multi which supports versioning and chaining. You just make a buildpack for your custom bits, and install it before your ruby buildpack
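Concretely, buildpack-multi looks for a `.buildpacks` file in the repo root, one buildpack URL per line, run in order; pinning a git ref after `#` is how you get repeatable versions. The URLs below are illustrative, and the first one is hypothetical:

```text
https://github.com/example/heroku-buildpack-mytools.git#v1.0
https://github.com/heroku/heroku-buildpack-ruby.git#v100
```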
I agree I don't often switch between ruby interpreters, however I do frequently upgrade MRI ruby versions, and the fact that it as easy as changing one line has saved me a lot of time. IMO this is much easier than any alternative I am aware of.
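The one-line change being referred to is Bundler's `ruby` directive in the Gemfile, which Heroku's Ruby buildpack reads to select the interpreter. The version numbers here are only examples:

```ruby
source "https://rubygems.org"

ruby "2.1.2"   # bump this single line and push to upgrade MRI
# or switch engines entirely:
# ruby "1.9.3", engine: "jruby", engine_version: "1.7.16"
```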
I have written a few buildpacks and I have found buildpack multi to be pretty composable.
I agree Dockerfiles are a decent alternative, however the part missing is a scheduling/discovery layer. Mesos docker integration is coming along via the folks at mesosphere but you still need to wire up the bits and pieces yourself.
3. I agree it is expensive, I think it is cheaper than a full time ops person to a pretty decent scale, and I think you need a lot more than 1 ops person to get close in terms of SLA, and featureset.
Honestly I think you just traded one set of problems for another. Who is going to get paged when both the vm's hosting a service die? Who is going to get paged when you run out of disk space? Co-location of apps? Hosting multiple ruby versions on all your hosts?
I don't fully appreciate your model, but IMO it sounds like you have a client-expectations problem. You should set the deliverable as a deployable app with the target as Heroku; if they want to pay more for a VM, they can pay more, but they should understand the tradeoffs. If they want their own custom-maintained servers, they can pay for you to maintain them, but it ain't free. It will cost you time and, hopefully, their money.
I don't swap language often if ever, however I do like to use more than one language depending on the problem I am solving.
Heroku support options have always worked to shelter their employees from customers and as has been pointed out, despite the steep decline in costs, their prices have not changed. It's a shame because they were a trendsetter, but they are quickly being surpassed by players like DO, Linode and Poppup.
The company that $parent is the founder of. Which, if you google for, you discover that their page is titled "Blog Logo" and contains no content. Check back in a week or so, I guess.
A frivolous addition with no facts to check. Admitted. Give it twelve weeks. The point is that Heroku can do better. No offense meant to Salesforce or its employees, just honest feedback based on real experiences.
Even if there is a capex drop, Heroku's focus is simplicity. 5 or 10 cents an hour is part of the value prop; it may be hard to translate a 55.4 to 49.8 cent/hour drop for an AWS instance into their pricing model and not lose that simplicity.