I once ran up a $15,000 bill on Azure completely by accident when trying to get one of their video processing services to work. Once I figured out the service wasn't going to do what I wanted at a price I could afford, I tried to detach it and shut it down and thought I had succeeded. I didn't.
The offending process that was costing me money didn't appear in the Azure console, and I had no idea it was running, how to access it to stop it, or even what was going on. When it turned up on my bill I nearly had a heart attack.
Thankfully they let me out of it once I pointed out I was getting billed for something I couldn't see. I appreciated that greatly but I've never gone back to Azure and the experience scarred me so much I don't think I ever will. This was about 4 years ago so more than likely they have sorted it out.
They keep reminding me I have $50 of test/dev credit on Azure through my Visual Studio subscription but it flat out frightens me to even try to use it.
AWS isn't perfect but at least you have half a chance of working out what is costing you money through the billing console.
> AWS isn't perfect but at least you have half a chance of working out what is costing you money through the billing console.
The first few times I used AWS for tutorials, something similar happened to me. I thought I shut everything down, but kept getting billed and wasn't able to find it without contacting them. It was just a few dollars, but I've been wary about any services where you can't cap the billing.
Cloud platforms generally don't let users cap the billing, because those overages are good income for them. I prefer using services like DigitalOcean or Linode where you can be sure that your new site crashes for 15 minutes instead of bankrupting you.
Oh, me too. That horrible "mythical beasts" tutorial. I thought I removed it, and then each month for the next two months I'd get nailed for $50 or so.
By definition, I don't understand AWS, so figuring out how to turn it off was nearly impossible. AWS "support" didn't exist. Stack Overflow AWS geeks were in high dudgeon that I'd ask the question "how do I disable this?" and would kill my question. Finally some kind soul did give me the trick to finding the last service to disable.
BTW the tutorial is absolutely useless. Just a thousand different incantations to repeat. No real understanding communicated. Felt more like an ad for myriad services.
There are consultancies whose entire expertise and premise are advising on AWS billing to companies who already know AWS, so the idea that a newbie should be aware of whatever magical incantation to limit their spend is ludicrous.
That’s not entirely fair. While I’m not going to defend the complexity of billing in enterprise cloud services, it’s also not really that hard to set a billing alert in CloudWatch and track spend in Cost Explorer. Sure, it requires a little bit of AWS knowledge, but you shouldn’t really be using enterprise services like AWS (over more accessible services like DigitalOcean) if you’re not willing to spend the time learning it; and at an individual level it’s very manageable.
The reason those consultancy firms exist is because billing scales terribly. Once you’re a business using AWS you’ll likely have a multitude of projects running across a multitude of departments, which need to be billed to a multitude of different customers and internal cost centres. This all needs to be processed by an internal 3rd-party financial system managed by non-technical people who won’t even know what AWS stands for, let alone what it does and how it works. In those situations the problem of billing becomes exponentially more difficult than a one-person hobby project.
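(For concreteness, here's a minimal boto3 sketch of such a billing alert. The alarm name, threshold, and SNS topic ARN are placeholders; billing metrics only exist in us-east-1 and must first be enabled in the account's billing preferences.)

```python
import boto3

# Billing metrics live only in us-east-1.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="monthly-spend-over-20-usd",  # placeholder name
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,            # 6h; billing data is only published a few times a day
    EvaluationPeriods=1,
    Threshold=20.0,          # alert once the month-to-date estimate passes $20
    ComparisonOperator="GreaterThanThreshold",
    # Placeholder SNS topic that e-mails you.
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],
)
```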
> it’s also not really that hard to set a billing alert in CloudWatch and track spend in Cost Explorer
Billing alerts aren't good enough.
Consider a scenario where you're consulting on someone else's small- or medium-sized project and your bug costs the client a huge amount of money in the middle of the night. Now who pays? Say goodbye to your paycheck or reputation, even though it should have been preventable.
Another scenario: you launch a startup, and a bug empties the bank account and kills the company. If the solution is to just not use things like AWS and GCP (including Firebase, which has no billing cap) when you're getting started, why are they advertised that way?
> Consider a scenario where you're consulting on someone else's small- or medium-sized project and your bug costs the client a huge amount of money in the middle of the night
You can also set alarms that warn you of projected usage.
> Now who pays? Say goodbye to your paycheck or reputation, even though it should have been preventable.
If it’s legitimate usage then I’m not really sure what you’re advocating; are you implying a service being suspended in the middle of the night because a hard spend limit has been hit is somehow better for your reputation?
Or maybe you’re suggesting that it’s not legitimate costs, in which case you’ve set up AWS wrong to begin with, and thus your reputation probably deserves to be questioned.
> Another scenario: you launch a startup, and a bug empties the bank account and kills the company. If the solution is to just not use things like AWS and GCP (including Firebase, which has no billing cap) when you're getting started, why are they advertised that way?
No cloud service operates that way. In that situation you’ll almost always get charges refunded, even in instances of gross negligence (which would be the case here, since for the bank account to be emptied you’d have to have not watched your spend for more than a month, and no business should operate that way).
I do get the points you’re trying to make but I’ve been working with the cloud for some time and have seen plenty of horror stories, all of which were due to gross negligence and most of which were still refunded by AWS as a gesture of good will. They’d much rather have your repeat business than burn their users with bills that cannot be paid.
I think the issue with billing is the same issue you get with OOMs - which services should go down first if we’re out of money? In practice “OOM” (Out Of Money) is even worse than OOM because with Out Of Memory you are just above the available memory threshold, but with Out Of Money you are literally at 0 capacity which means every single service needs to be killed.
I’ve built solutions for customers who would prefer an unplanned outage over an unplanned $250k expenditure. Many customers think that I’m a dinosaur for saying it, but if there’s a 2-5 year expected lifetime for something, it’s almost always more cost efficient to use traditional colo or VPS.
Also, operationally it’s possible to have something more than all or nothing. Big companies usually have “Tier-0” services that must be up at all costs. AWS is no stranger to complexity - this type of function doesn’t exist because it would cost them money. They probably make 9-figure money from obviously idle services.
If you want your spend to be capped then there’s nothing stopping you from setting a CloudWatch alarm at a budgeted threshold that then scales down your infra.
AWS is like Lego. You’re supposed to build on it to create the behaviour you want
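(One way to wire that up, sketched with boto3: the billing alarm publishes to an SNS topic, which invokes a Lambda like the one below. The env=sandbox tag is an invented convention, and note that stopping instances only caps compute spend - volumes, IPs, and storage keep billing.)

```python
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    """Invoked via SNS when the billing alarm fires; stops tagged instances."""
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:env", "Values": ["sandbox"]},  # invented tag convention
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]
    ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
    if ids:
        # Stop (not terminate): data survives, but EBS/IP charges continue.
        ec2.stop_instances(InstanceIds=ids)
```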
Hmm, I don’t think you understand what I am saying:
If you are out of money, all of your services need to be destroyed immediately, including all of your database hard disk drives, because every single piece of infrastructure yields some regular cost.
It means that it is NOT going to be “a simple unplanned outage”; it’s going to be more akin to formatting your hard drive with all of your family photos on it.
Pretty certain AWS just finds it easier to sometimes write off costs rather than implement something so radical that customers may be even less happy afterwards.
> > and at an individual level it’s very manageable
> And yet, we have horror stories of students and even experts being hit by surprise AWS bills.
They obviously didn’t bother to manage it. But that doesn’t mean it’s not manageable. It takes all of 5 minutes to set up a budget alarm on CloudWatch. It was one of the first things I looked into when I set up my own AWS account years ago, specifically because I didn’t know what I was doing back then and thus didn’t want any surprises. If I managed it back then, I find it hard to believe others cannot too.
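(The AWS Budgets API does the same job with less ceremony than a raw CloudWatch alarm - a sketch, with the budget name, amount, and e-mail address as placeholders:)

```python
import boto3

account_id = boto3.client("sts").get_caller_identity()["Account"]

boto3.client("budgets").create_budget(
    AccountId=account_id,
    Budget={
        "BudgetName": "personal-sandbox",  # placeholder
        "BudgetLimit": {"Amount": "20", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[{
        "Notification": {
            "NotificationType": "ACTUAL",
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": 80.0,             # percent of the budget
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "me@example.com"}],
    }],
)
```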
It's been 5 years since I was responsible for anything AWS, but back then at least support was surprisingly good even when spending less than $100 a month. Emails would get answered and I could often get someone on the phone within 24 hours if I needed.
I think it depends heavily on what area you want support for. EC2 support is generally very good; billing support is reasonable, if a bit slow. Support for a lot of their managed services or media services is beyond useless. All the support engineer does is take your complaint and say they will check with the internal service team. Your ticket will just end up in a "waiting for Amazon" state for weeks or months.
> BTW the tutorial is absolutely useless. Just a thousand different incantations to repeat. No real understanding communicated.
I share this sentiment after going through a lot of tech tutorials and onboardings. And it's not limited just to AWS. I find myself forgetting most of the information I was supposed to learn. I'm trying to be extra mindful and offer additional explanations when I'm writing procedural guides myself, but I still have a lot to improve.
I understand what you feel completely. I actually wasn't even able to complete the tutorial, since the commands given would trigger a permission error one way or another. I got charged $50 as well, since I wasn't aware that there were still services running. Such a terrible thing to front as a "Getting Started" guide.
Disclosure: I'm the Co-Founder and CEO of Vantage - and used to work at both AWS and DigitalOcean - but this is a large reason we offer a fairly generous free tier on https://vantage.sh/ to help folks figure out where their costs are coming from and take action.
It can be really frustrating and annoying when you can't hunt down where costs come from.
I'm nitpicking on unimportant details here but this seems like more of a general UI/UX problem than a free tier problem.
AWS and Azure both have pretty straightforward free tiers, but people end up accidentally racking up large bills anyway. A UI to see a list of running services, sorted by cost either doesn't exist or is not prominent enough.
It's slow, you can't open links in new tabs, and one wrong click means you have to navigate back to the right page and wait for everything to refresh again, but you can get the list of billed items. Plus an "Other subscription charges" line that contains some mystery costs not linked to anything.
The state of cloud billing seems like exactly the result you'd get if you didn't have a strong mandate that product teams implement a unified, centralized billing interface.
And it makes sense from a growth perspective: new products grow revenue, bad billing systems only annoy customers (but mostly invisibly).
We have services across a lot of cloud providers: Google and Azure for a variety of things, Digital Ocean for some, and the main stuff on AWS. I'd love it if a service such as Vantage could track them all; I can only find solutions for AWS.
I feel like a lot of cloud platform tutorials out there should start with having you create a billing alarm that emails you when the bill crosses a reasonable but higher than expected amount. For basic tutorials, that might be something like $20.
> Cloud platforms generally don't let users cap the billing, because those overages are good income for them.
No. They don't have caps because they're really hard to implement, and because the negative press from an app going down because the cloud didn't scale is far worse than any from surprise bills. Scaling is a large chunk of what you're paying for, after all.
This is quite obvious when you look at how lenient AWS is with retracting surprise bills. And the money perspective doesn't make sense, either: AWS is living on customers that have bills in the 5-figure range and up. The occasional $10 from someone playing around isn't even a drop in the bucket.
You sure? Say your app is reaching the cap. What do you do with ongoing costs? Do you block writes? Shut down VMs? Delete stored data?
So you might say, simply project the cost (with some magic) and prevent that from going over the limit. So, imagine, your app suddenly experiences a load peak and you need to scale up. However, adding a VM would increase that projection too much. Do you not scale, despite this possibly being a small load peak, and let the app go down? Or do you risk the situation described above? Doesn't matter, you'll get bad press either way.
And beyond that, you'd still need to project costs like traffic volume, which can vary extremely. Not to say anything about the technical difficulties of coordinating that billing information across hundreds of services in real time.
And even if you do all that, you still get bad press of the likes of "we forgot to remove our payment limit and it killed our app while being on the front page (and our alert did not trigger because we couldn't afford another mail)".
There's no way AWS (or any other cloud) is eating all these drawbacks just to have a limit. I bet it's orders of magnitude cheaper to just eat the occasional surprise bill.
Does it shut down at $150.00 or does it shut down “some amount of time and unknown dollars after you cross $150”?
I’m willing to bet it’s the latter. If, in a normal account, you’d then incurred $152.78 or $166.39, did the limit work? Would customers agree?
My cloud bills continue to change for several days past the end of the month (for legitimate calculations that come in for usage incurred during the month).
We couldn't technically make it stop exactly at $150.00, but only at $167.89 or whatever, so we are letting it run to $15k.
For catastrophic cases it doesn't matter. If it saves a person from an unexpected $15k bill then it works. Even for many businesses it would be ok to drop everything - I know some which can withstand being offline for a day, but not a $250k bill.
Make the MVP opt-in, delay any irreversible stuff by a few months (e.g. deleting S3 data) with a deposit to cover costs, and figure out the rest from user feedback - aka, do it like any other new feature is developed in a modern shop.
> You sure? Say your app is reaching the cap. What do you do with ongoing costs? Do you block writes? Shut down VMs? Delete stored data?
Two approaches:
A) Hard limits: freeze the services immediately if your cap is reached, ideally by giving a heads up some time beforehand with predictions, if possible; this is what many VPS providers out there do for unpaid bills and such, which makes sense
B) Courtesy: allow the services to keep working, but at a degraded performance level - that's what some of the other VPS providers out there do. For example, decrease disk performance, cap the CPU performance, limit the network speeds, etc.; probably eventually also block writes, but don't delete data outright. Any of the aforementioned should trigger monitoring alerts on the developers' side - Zabbix or another solution would alert them in minutes - and the vendor should also send e-mails about these measures either currently being put into place or about to be put into place, so that the necessary actions can be taken.
> So you might say, simply project the cost (with some magic) and prevent that from going over the limit. So, imagine, your app suddenly experiences a load peak and you need to scale up. However, adding a VM would increase that projection too much. Do you not scale, despite this possibly being a small load peak, and let the app go down? Or do you risk the situation described above?
There's a difference between having the current capacity with a degraded performance during the spike and killing the entire app. You don't always need to scale up, depending on your failure modes. Having consistent service response times is overrated, as is needing to serve every single request without ever telling a small portion of your users that your service is experiencing high load - there should be solutions in place to deal with the backpressure and prevent data loss even under these circumstances anyways.
Unless you work in a governmental organization or on another critical piece of software for society, degraded performance is probably okay, and no one feasibly cares about or remembers even small outages - regardless of whether it's a large site, a small non-profit, or even a side project. Whereas if you do, then you probably have enough money to throw around for billing caps not to be relevant.
If you subscribe to those beliefs about always needing to be up and serve requests, however, then there's another option:
C) Billing alerts: something that most of the providers out there already provide in some capacity, however in fairly bad ways; if AWS can bill you for Lambda functions on a 1ms basis, then there's no excuse for not receiving billing alerts the very instant when this spike first happens: https://aws.amazon.com/about-aws/whats-new/2020/12/aws-lambda-changes-duration-billing-granularity-from-100ms-to-1ms/
Better yet, allow your clients to choose which of those mechanisms they desire to use, in the order of the potentially least expensive (infrastructure wise) to the most: A, B or C. That way the little guys for whom a 10k bill would be life ruining could just use A, whereas startups could stick with B and huge corporations who have a large runway of cash to burn could use C.
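(To make the A/B/C split concrete, here's a purely hypothetical enforcement loop - none of these functions exist in any real cloud API; it only illustrates the policy choice:)

```python
SOFT_LIMIT = 800.0    # option B kicks in: degrade and alert
HARD_LIMIT = 1000.0   # option A kicks in: freeze compute, keep data

def enforce_cap(account):
    spend = get_month_to_date_spend(account)   # hypothetical provider API
    if spend >= HARD_LIMIT:
        freeze_compute(account)                # stop VMs, pause queues; storage retained
        notify(account, "hard cap hit: services frozen, data preserved")
    elif spend >= SOFT_LIMIT:
        throttle_network_and_disk(account)     # degraded performance, not an outage
        notify(account, f"soft cap: ${spend:.2f} of ${HARD_LIMIT:.2f} spent")
    # Option C is just the notify() calls on their own, sent the instant
    # metered usage crosses a threshold.
```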
> Doesn't matter, you'll get bad press either way.
Bad press? As opposed to what, going broke and not being able to pay your rent because of unpredictably large bills with no way to limit them, just because your side project got popular on Reddit or Hacker News?
There's a world of difference between what's needed by corporations and what's feasible for private individuals, so for as long as there's a chance of such bills, i will not use Azure, AWS, GCP or any other platform like that.
Remember: these surprise bills will only be "eaten" by the larger providers based on their own goodwill. There's not much preventing them from banning you outright.
Wow, it's really interesting to read how clueless folks are here on HN.
The reality is that kortilla is the same person who, if AWS deleted all their data when they hit their "cap" to stop the billing, would be on here complaining.
Payment method not go through? Hit your cap? To stop charges AWS needs to delete forever almost everything in your account. All your S3 data gone. All your backups / databases and archives gone.
Oh - you actually DON'T want them to blow up your platform? Maybe they could instead provide a billing console.
Some folks seem to have almost no clue about what the AWS customers who pay the billions want. Is there ANY chance that AWS listens to its paying customers? Maybe it has become successful by doing so (at the cost of total feature sprawl, in my view).
My quick trick is to login a day or two after I think things are shut down and look at projected bill and current month billing. I've left some very large instances running a time or two, easy to turn off.
And with billions of requests (PER SECOND) on the AWS network, there is NO WAY they are doing real-time billing. That is not happening. Look for daily aggregation and similar. Just the scale of permissioning on API calls must be insane per day. These are going to need local counters that aggregate periodically.
I do wish they'd maybe aggregate 4x per day (6 hours).
They should go talk to Azure then, because Azure for Students gives you $100 per year. The second you hit that $100, it kills everything.
I’m not saying it would be simple, but it wouldn’t be a bad idea to have a global monthly billing maximum where it’ll nuke the account at that number. There’s lots of new developers that are probably too scared of the free tier to use it without something like that (I have AWS experience, and I don’t use it for personal stuff specifically because that doesn’t exist. Which means I probably won’t tell my employer to use it either).
I do like this idea, but for their bigger customers, a cap where billing (and services) all stop is not what they are asking for. Instead they are asking for durability / resilience / object locks etc.
I'm serious: what large business wants to lose EVERYTHING (all static IPs, all Glacier and S3 data, all database and compute) over a billing issue?
You have to see how that's both totally different and a terrible idea for a business, right? If anything the existence of hard limited student accounts combined with the fact that support always* refunds giant surprise bills underscores the point that caps for business accounts don't exist because of the damage they could do.
Surely it's the customer's role to decide whether or not it is a bad idea?
How 'bout you run a non-profit and have an allocation to run the services, would you prefer your nonprofit to lose the website for a couple days, or for it to go bankrupt?
What if you run a company for which online presence is a means of advertising and not the revenue-generating platform, would you prefer to run your advertising campaigns on a budget or without limits?
This seems like a strawman. You could almost definitely ask/warn the user when they set the cap, or you could make the cap not apply to anything that is not trivially recoverable.
They have this with their cost explorers, alarms, budgets and more.
Realize that most customers are more focused on whether their data will be preserved.
Blowing out your entire EC2 / RDS / S3 / Glacier backup stack over a billing issue (or someone setting a cap up in the accounting department) makes no sense.
Are major customers really asking for this? Why risk it - why even build a tool that can blow out a customer's setup so completely?
This is why I don't understand HN sometimes saying AWS is "BS" etc. Does HN not think AWS talks to its big customers to find out what they want?
I'm convinced folks who complain about "unexpected recurring small bills on AWS" have never taken the five minutes it takes to learn to use the billing console.
It's like asking why the knife you're using keeps cutting you when you put your finger on the sharp end. Learn to use your tools, and they won't surprise you. Learn to track your costs, and you won't have unexplainable recurring bills.
> five minutes it takes to learn to use the billing console
I spent an hour or two trying to find anything running and didn't find a damn thing. Yet Amazon decided to charge me a buck or two every month for "storage", so I just canceled the whole account. I mean, if I can't find what I am paying for when not using it, how am I supposed to understand the bill when I actually have a dozen instances?
That's a good comparison. Have you ever cut yourself with a knife, any time in your life? I certainly have. I've also had a £150 AWS charge I had to pay.
In both cases I learnt my lesson. I'm careful not to put my fingers where a knife can cut it, and to not put my bank details where AWS can charge them.
I know Google burns through user trust like it’s incense candles, and so people fear of losing their account access or features being deprecated without notice, but, I really think GCP is underrated. In GCP, it’s easy and best practice to group things into granular projects, and if you want to, you can shut down an entire project all at once. The billing story isn’t perfect, but it’s not bad either, and it’s not the only thing going for it.
I know I’m inviting replies airing grievances with GCP, and there certainly are many, but I’ve come to really like what it offers. Especially, GKE is really cool and at least to me feels very whole-assed as far as Kubernetes offerings go. Obviously that would make sense, but still, it really is nice to use.
It's the same in Azure, really. It forces you to set up a resource group for everything (you can't have any resources outside a resource group). If you're trying something out, just put it all in the same resource group, and at the end delete the resource group.
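(With the current azure-mgmt-resource SDK that cleanup is one call - the subscription ID and group name below are placeholders, and deleting the group tears down everything inside it, VMs, disks, and public IPs included:)

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(
    DefaultAzureCredential(),
    "00000000-0000-0000-0000-000000000000",  # placeholder subscription ID
)
# Kicks off deletion of the whole group and blocks until it's gone.
client.resource_groups.begin_delete("my-experiment-rg").wait()
```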
That billing model makes it very easy to find out what a particular product/service/team is costing the business, and I think it's one of the advantages that GCP has, personally.
It took me 2 years to get AWS to stop billing me a few cents a month. I didn't try that hard but every month or so I'd get another bill for like 23 cents. I'd login and try to figure out what was still turned on.
AWS has been billing me a few cents per month for over a decade now. Eventually, the credit card I was using expired, and so every month for the past 7 years, they've sent me an email:
> Dear Amazon Web Services Customer,
> Your AWS Account is about to be suspended. There is still an outstanding payment problem with your account, as our previous communications did not lead to successful payment of your past due AWS charges.
> We were unable to charge your credit card for the amount of $0.26 for your use of AWS services during the month of Aug-2021. We will attempt to collect this amount again. Unless we are successful in collecting the balance of $0.26 in full by 09/30/2021, your Amazon Web Services account may be suspended or terminated.
The balance never increases, even though it always says the charges are from the previous month. It's as though they forget the balance each month because it's not worth collecting, then my account runs up another small charge, which they promptly forget.
And of course they have yet to terminate the account. I would gladly close it myself, but I'm locked out, and it's not really worth my time to figure it out how to get back in (plus they keep telling me they're going to terminate it, which is exactly what I want them to do!).
I got this too, and they did eventually terminate my account. I missed all the emails though, and when I went to use AWS for some project I found out I was terminated. The only way for me to use AWS again was to create a new account (I emailed them; they can't un-terminate me).
I have this happening on GCP and AWS. I get a text from my credit card each month for like 10-25 cents. I've gone in multiple times and all the consoles show $0; it's impossible to track down. It's so cheap I haven't cared enough to dig more.
I never use AWS or Azure privately. Everything I test I do on company or customer credit. You also more or less have to believe what they say you used in CPU time; storage is a bit easier to check.
But even with a simple web service you quickly get to $50-100 just from testing and deployment.
I heavily recommend renting a server for private use. They are always the cheaper option, at fixed costs. Of course you don't have those fancy services... I only use a virtual server right now and pay $45 quarterly with a domain. It still has quite a lot of power, though.
I'm a huge fan of Oracle's (yeah, I know right) free tier for especially this -- you can't accidentally use paid stuff without manually toggling an upgrade. It's such a relief to know that you can mess around with things and not be charged for it.
In Azure I put all my stuff in a new resource group and then when I'm done I just delete the entire resource group. This has worked well for me so far and I haven't had any surprise charges like I did on AWS and Digital Ocean.
I wanted to clear out my account so I deleted the VM but it didn't delete the associated static IP, so I got charged for the unused IP address that month. I didn't know the IP was still around until I got the bill. If this were in Azure I would have deleted the entire resource group and the IP would have gone along with it.
Just one thing: the credit card. Once they have all the details, they can do whatever they want with you. They can let you go - but that's only their goodwill.
Personally I'm in love with Hetzner cloud. It has less "features" (=proprietary complexity) than AWS/Azure, but I can understand everything perfectly and, most importantly, I have a guarantee I won't be charged over a limit.
I had a $7k bill from AWS due to a bug in a SaaS product. They wouldn’t refund it. I wouldn’t use them again after that.
I then tried to close down another account based on AWS organisations, and it was super complicated to get it all removed to stop billing to the extent I would have needed to hire an experienced consultant to do it rather than just click “close account.”
I'm glad they have made progress in that area. As others have pointed out it can sometimes be tough to track down sources of costs in AWS as well (especially if you accidentally started something in a zone you don't normally play in). I'm still gunshy though.
AWS has a setting in their billing that can fire an alert if you have exceeded a certain threshold, which would be pretty much invaluable if you're running something that auto-scales.
>They keep reminding me I have $50 of test/dev credit on Azure through my Visual Studio subscription but it flat out frightens me to even try to use it.
The VS credits are hard capped. At least they were on my account (didn't even have a credit card loaded)
You hear about all of these horror stories about services from IaaS/PaaS/SaaS providers essentially being black holes for money with no way to actually set hard limits for billing, even after all of these years, and can't help but think about more reasonable alternatives.
Providers that just give you VPSes that you can run whatever containers on (or just host things the old fashioned way) and basically use whatever open source software that your project needs. More importantly, providers that give you predictable billing at a flat figure per month (or less, if your VPS isn't active all of the month) and simply slow down your network connection if the set limits are exceeded.
If you're just doing something to practice and haven't sold out to SaaSS (https://www.gnu.org/philosophy/who-does-that-server-really-s...), you probably really don't need to worry about scaling just yet - you can migrate over to AWS, GCP, Azure, or any other of the large providers at any time, by just running your containers on their scalable infrastructure, if there's even any point in doing that, since the smaller providers also can scale similarly in most cases.
Lastly, it just feels dangerous to give AWS, GCP, Azure, or any other entity that's known to give people insane bills your personal details and personal credit/debit card details - what if they decide to block you because you can't pay, and your complaint doesn't become popular on Reddit or Hacker News? It almost feels like setting up a shell company and using limited virtual credit cards would be more reasonable, same as people always say that you should have a separate Google account for your personal needs and anything in any professional capacity, so nothing gets blanket-banned.
Sadly, there aren't many tutorials that start with: "Here's how you set up a company that's detached from your personal details, and here's how to easily make one credit card per vendor." Honestly, even the internal workings of companies like https://privacy.com/ are unclear to me.
For running personal projects that don’t need to scale instantly to demand you are 100% right that you should not go with cloud providers. Hell, most startups shouldn’t either.
> "AWS isn't perfect but at least you have half a chance of working out what is costing you money through the billing console."
Same thing happened to me on AWS. I was trying to get a Windows VM in the cloud to run some CAD software. It ended up blowing through $500 of credit in a week, and I shut it down before it went through the rest of the credit.
Unexpected bills like that make for a terrible experience, to be honest...
I've had a $1.50/month charge on AWS that I tried repeatedly to track down but failed each time. Eventually I just let it run until I canceled my credit card.
I have a funny story about Azure. Our company was looking for a cloud provider, and Microsoft sent a couple of salespeople to talk to us about theirs. Their salespeople were very condescending, talking like they'd be doing us a favor by allowing us to become customers and failing to take any of our questions and concerns seriously.
I think it's because we aren't a large corporation in terms of headcount.
In any case, that meeting ensured that no further consideration of Azure would take place, and it's very unlikely that it would be considered in the future.
When you have an existing stack, the choice is often between adding Cloud or staying on managed/on premise, and not between Azure and AWS/GCP.
As in the article, if you already run Windows servers at scale, going to AWS is possible but won't be your first choice. Microsoft sales people being condescending will basically reflect that situation.
In most other situations I can think of, you'll go to Azure only if you can't use the alternatives, so again you're basically at their mercy.
To digress: if you run Linux servers with an open source stack, going to Azure will bring you virtually nothing, and it will probably be an expensive PITA at every step. In a previous Ruby shop we had more empathetic dev evangelists walk us through Azure, but we felt like we were wasting their time, as it clearly wasn't a priority for them. Then, looking at the price, we'd have needed to drop them as soon as the discounted prices expired, so it was just a losing proposition for everyone involved. I totally understand how more sales-focused employees would weed out shops like us and go look for those who have to stick with them anyway.
> As in the article, if you already run Windows servers at scale
My company does that. We have started moving new things to the cloud, but we took one look at the Windows pricing in the cloud and ran away. And my company is DEEP in Windows.
So I’m not sure Azure really has much of a leg up there. I think the only real benefit they have is their hybrid cloud for people who do use Windows in the cloud as well.
Looking for a k8s for a hybrid deployment, the contrast was blatant: mention it to a Red Hat sales person and you have people showing up the next week to pitch OpenShift. AWS has a deep story here, same deal. A friend stopped begging the Google rep to pitch Anthos because it was kind of obvious how it'd go if you actually needed support.
This has been my experience. I've dealt with all of the big three cloud providers, and as a big customer. Azure is incompetence, AWS is largely capable and professional (so long as you're big enough to register on their radar), and GCP is just dripping arrogance. GCP is a better platform than Azure, but I would never choose it because dealing with Google reps is a never ending stream of condescension.
Dodged a bullet there, I'd say. You can safely assume that google will EOL anything in GCP that you might actually start relying on for business function.
I don't fully agree with parent, and GCP is differently managed compared to other Google products.
That said, the k8s version deprecation policies, mixed with significant changes in cluster setup from time to time that you have to keep up with, are peculiar. I kind of think this is something we accept when going into k8s, but I'd understand people not being at ease with that philosophy.
Many salespeople repeatedly visited my previous company, a megacorp. While attendees from our side rarely had any authority to purchase (e.g. engineers not managers), the visitors were extremely enthusiastic in their presentations and demos. I personally felt bad that their efforts wouldn't lead to sales, but they helped boost our self-esteem for the day.
A moral hazard with all cloud providers is that their PaaS services are typically billed on consumption.
So what incentive do you imagine they have to make those services efficient?
My favourite example is Log Analytics. It can easily cost up to 20% of the price of the virtual machines it is monitoring! If you have a very heavily loaded website and you're logging every HTTP request, it can exceed the cost of the service it is monitoring.
They charge you a ludicrous $2,380 per terabyte of ingested data. This is 20x the cost of the underlying storage, even if it's Premium SSD! For comparison, AWS charges just $500, which is still overpriced.
Now consider: If you're an Azure software developer and you find a way to reduce the bloat in the log data stream format, what do you think the chances are of getting that approved with management?
They have a firehose spewing money in their cloud. I can't imagine them ever saying: "I think it's a good idea to turn that down to a mere trickle!"
As others have pointed out, all of their other services have similar moral hazards: Bastion, NAT Gateways, Private Endpoints, Backup, Snapshots, etc...
Even though it takes some ops skill to set up, this
> They charge you a ludicrous $2,380 per terabyte of ingested data. This is 20x the cost of the underlying storage, even if it's Premium SSD! For comparison, AWS charges just $500, which is still overpriced.
just makes me happy about our monitoring cluster on the German hoster Hetzner. The old systems are at 40 euros/month for 900GB of storage, and the upgraded ones are 40 euros/month for 1.3TB. There's some manpower per month in there, and some egress costs, but it's still very cheap.
It's funny this is coming up now. Just yesterday in GCP I was trying to figure out our billing and looking at what was costing such a huge amount. I couldn't figure out any way to map the actual service being used to the price in the normal reporting, or even in the billing cost table export. The only way I could figure out how to do it was to enable log export. They used to have an option to download that as a file. They disabled that a while ago, and now it's only available as a BigQuery export, which is exported every day. I was like, "Why would they do that?" Oh, because now I have to set up BigQuery and pay for all that. So I have to pay extra JUST TO SEE my detailed billing information. Pretty ridiculous.
We really should revolt against this. I should be able to have a view of all of my billing without having to pay extra. It also shouldn't be hidden behind a BigQuery export; it should be easy to view what is being spent and what is causing it.
I used to work on GCP. The billing report UI in the billing account section shows per-SKU usage, while a detailed breakdown can be a nontrivial monster: some customers may launch thousands of VMs or data processing jobs per day.
Right, that's correct. It shows SKU usage, but not mapped to actual instance IDs. And we have the scenario you are talking about: lots of the same SKU with variable cost, and no way to correlate it without using BigQuery, it seems.
For all of the faults of Azure, they let you do this reporting directly in the Portal. You can slice and dice the data without having to spin up infrastructure.
Same story with Cosmos. On an IoT pipeline I set up with event hub, Cosmos needed $15k/month to keep up with the flow without generating 429s (i.e., causing the ingestion function to drop events). The RUs shouldn't have needed to be that high, but due to upstream providers there was a regular spike every five minutes that would exceed the average RU need. RUs are a hard per second cap; there's no average or windowing. I had to set my RU consumption cost at a level that guaranteed 40% unused capacity.
So I tried setting up the same ingestion database with Kafka Connect and Mongo on a $200/month VM. It worked flawlessly, and Azure helpfully suggested I downsize that VM because it was underutilized based on CPU statistics.
What incentive do the Cosmos engineers have to make it more efficient, or to make the RU pricing model more reflective of actual usage? Zero. It's a money hose. Why would you turn that off?
I saw CosmosDB turn up in some recommended multi-region designs, and I had a customer with DR requirements so I looked into it.
I started by spinning up a small one in my lab but when I saw the pricing I back-pedalled very, very fast. Deleted the whole Resource Group and never looked into it again.
Would it not be possible for you to stream the data to a data lake and then take it from there to either do bulk inserts, or smooth out the inserts at a predictable rate to remove the peaks?
There were a variety of ways for me to do so by invoking more Azure services, but I stopped trying at that point, because even after smoothing it out, Cosmos would still be 50x as expensive as a basic VM running Mongo.
But also, every time I start stringing together cloud services, I experience two things: first, exploding complexity because now I'm adding points of failure, integrations, transformations, to keep it all running; and second, this sense of "the whole point of the cloud is to simplify things, to offer canned services and features that save me the trouble of doing this in code for myself." Once I'm using cloud features to work around cloud limitations, I bail out because if I'm going to spend that time (and money), I'm going to get the benefits of something much more direct.
Internally it will probably be approved. For example, I am sure that Google Drive applies basic compression and deduplication to uploaded files, but if I upload 10 files of 10GB of zeros, they are going to count 100GB, not the few MB they are actually writing to disk.
(There are good reasons for this, but still, declared consumption is different from internal consumption.)
Log Analytics uses a columnar compression format on-disk, so ingested data is likely compressed by anywhere between 10:1 and 100:1, maybe even higher.
However, the wire format is super verbose JSON.
They bill per GB of the latter, not the former.
To put things in perspective: how many dollars of CPU time do you imagine it takes to column-compress 1 TB of data? I would estimate that a single modern CPU core could do this in a minute or so. Factor in various inefficiencies and make it a super generous 1 hour. At spot pricing, that's about $0.01. One cent!!!
The larger cost would be bandwidth. Azure charges a huge markup for traffic (just like AWS), so for example zone-to-zone data costs $10 per terabyte at retail pricing (not internal costing).
They store that data for 30 days "for free" (lol). Assume a worst-case compression ratio of 10:1, and that means they have to retain 100 GB for 30 days. That's $9.43 for a Premium SSD at retail pricing.
So their hosting cost for Log Analytics is something like $20 per TB ingested, but they charge well over $2,000 for it.
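(The back-of-envelope in full - every number here is the rough retail estimate from this comment, not an official price sheet:)

```python
ingest_price_per_tb = 2380.00  # Log Analytics ingestion, per TB
cpu_to_compress     = 0.01     # generous spot-priced estimate, per TB
zone_transfer       = 10.00    # intra-cloud bandwidth, per TB, retail
ssd_30_days         = 9.43     # ~100 GB Premium SSD (10:1 compression), 30 days

cost_to_serve = cpu_to_compress + zone_transfer + ssd_30_days  # ~ $19.44
print(f"markup ~ {ingest_price_per_tb / cost_to_serve:.0f}x")  # ~ 122x, i.e. the ~100:1 claimed
```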
That 100:1 markup is pretty sweet if your KPIs are based on recurring revenue.
There is no way in hell they will ever "optimise" this. Any accidental improvement will be rolled back or "adjusted" to ensure the revenue stream doesn't fall off a cliff.
Have you not wondered why it's taken them so long -- over ten years -- to enable any feature to filter logs at the source?
I was all-in on Azure for years, but once my BizSpark wore off and died, I was shocked at how expensive it was to run the most trivial side project, seedling of an idea, or non-money-making project.
As a .NET full-stack web dev of like 14 years, I finally decided to put all my new learnings into just doing static sites on React and trying to figure out some AWS stuff, because I wasn't going to try and use .NET on AWS when I had it fairly well figured out on Azure.
For a serial entrepreneur and maker, it just couldn't cut it anymore. Now I do Next.js on Vercel with minimal extra services beyond what they provide, and I get way faster stuff pretty much for free. I guess I'm no longer the only .NET guy struggling to hold the fort in a big tech community that also thinks .NET is too old or boring or non-sexy.
I still like Azure better than AWS. The stupid, weird UX is still nicer than AWS's. The docs by MS are 1000x better than everything on AWS. I miss MSSQL and SQL Server Management Studio, but I don't miss the cost of scaling it enough to actually use it for scraping or data processing.
I tried to sell friends on Azure and even got a part-time gig from MS themselves to try and help local startups use it, but no one cared or was interested. It just doesn't have that same "standard" or "sexiness" or "built into every new tech" feel to it, so I doubt it'll ever really change or pull ahead in comparison.
I've never used Azure, and I've always used .NET on AWS for the past 5 years. I haven't hit any problems with it so far. .NET is not tied to a specific cloud provider.
.NET is not tied to cloud at all. https://dotnet.microsoft.com/ lists Web, Mobile, Desktop, Microservices, Cloud, Machine Learning, Game Development, Internet of Things.
> Every cloud provider has their expensive “thing”. Ingress is always cheap, egress always expensive. AWS has their “Managed NAT Gateway”, after all, the sit-in-the-corner money printer that never fails.
This is my pet peeve with cloud providers. Each one of them seems to have a gotcha hidden somewhere.
It's very hard to compare what your final costs will be early in a project. You try to compare GCloud, AWS, and Azure on VM, ingress, and egress. After that, comparisons become harder, as services don't necessarily map 1:1. You end up choosing one and always finding something else that adds to the cost - something you forgot to include, or maybe you just underestimated some metric.
Egress feels abusive pretty much across the board. Without really a good reason. Feels like they all sat at a table and decided to fix the price there.
While you are developing your business, you find you want to use some feature (like managed VMs, in Azure's case) that is priced way out of a reasonable amount. You feel robbed; maybe you can still pay for it with your budget, but even then it leaves a bad taste, like you are getting a bad deal.
> Egress feels abusive pretty much across the board. Without really a good reason. Feels like they all sat at a table and decided to fix the price there
Indeed. You can significantly lower your price for hosting static files by using an external CDN in front of S3 or GCS.
Little-known fact: egress from GCS to Cloudflare is half the price of their usual egress fees. So combining the Cloudflare CDN for caching static files with this egress discount can lead to a 3-4x saving over just serving files out of GCS directly.
And this is the actual problem. Azure doesn't compete on quality. Microsoft rarely does. They have a few products and a market position that make them the default choice for countless customers. Nobody punishes them for their failings as long as the feature boxes keep being checked, and so the cycle continues.
A previous employer switched everything over to Azure and it was nothing but constant problems and nobody liked it except for the CEO, which meant everyone had to put up with it and nobody had a say.
Teams and Azure DevOps are some of the worst software I've ever used in my life. I've used worse software before, but it was hobbyist stuff written by single developers, and therefore doesn't really compare fairly.
At a previous company I used slack and really liked it. At this new company I joined recently they use Teams and it felt like a ghost town. Nobody online and nobody communicating in public.
The reason is that it’s impossible with the desktop app to browse channels that you’re not a member of, whereas on mobile you can! When I mentioned it to my colleagues they were shocked. This whole time they could have been communicating in other channels and they had no idea they existed. I have no idea if this is a bug, or some kind of admin setting.
There are plenty of other issues with DevOps too. You have to buy in to the whole Microsoft package apparently, and none of the parts are best in class.
Azure Pipelines is just slow. I'm not sure even how to fix it.
The checkout stage for our code from Azure Repos easily accounts for half a minute. npm install takes 1.5 minutes even with an npm cache hit. Total build times are around 20 minutes...
There is an undocumented feature called zip deploy, if you deploy to an App Service. Give it a go; it might just work. I use it to deploy a Python Django website. You need to add a setting in your App Service environment variables.
It took an escalation from our account manager to get a good support guy, who informed me of this functionality.
Here is the summary of my support ticket:
Add an app setting with the name "WEBSITE_RUN_FROM_PACKAGE" and value as 1.
We performed deployment and it took around 30 secs. We also verified that the app is working fine.
Now you will do some changes and deploy once more to verify end to end pipeline.
Zip deployment is a feature of Azure App Service that lets you deploy your function app project to the wwwroot directory. The project is packaged as a .zip deployment file. The same APIs can be used to deploy your package to the d:\home\data\SitePackages folder. With the WEBSITE_RUN_FROM_PACKAGE app setting value of 1, the zip deployment APIs copy your package to the d:\home\data\SitePackages folder instead of extracting the files to d:\home\site\wwwroot. It also creates the packagename.txt file. After a restart, the package is mounted to wwwroot as a read-only filesystem.
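(For what it's worth, the deploy step itself can be done with a plain HTTP POST to the app's Kudu zipdeploy endpoint - the app name and deployment credentials below are placeholders, and WEBSITE_RUN_FROM_PACKAGE=1 is assumed to be set already:)

```python
import requests

app = "my-django-site"  # placeholder App Service name
with open("release.zip", "rb") as f:
    resp = requests.post(
        f"https://{app}.scm.azurewebsites.net/api/zipdeploy",
        data=f,
        # Basic auth with the app's deployment credentials (placeholders).
        auth=("$my-django-site", "deployment-password"),
    )
resp.raise_for_status()  # with WEBSITE_RUN_FROM_PACKAGE=1 the zip is mounted read-only
```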
Microsoft is hardly unique in that respect. All three of the big cloud providers are like that in some respect, as well as a lot of "enterprise" software. If you have a big enough moat to keep your customers from leaving, there isn't a lot of incentive to improve the quality.
I sat in a market focus group for IT managers. We were universal in our rankings: AWS > Azure > Google for cloud services. We also agreed that Google was #1 or #2 technologically, but no one trusted them to be there day after day for the boring stuff.
We use Azure at work and this article hit home hard, as we've just recently been burned by Azure pricing. I am in the same spot as the author: liking Azure but being put off by all the weird stuff they sometimes do.
In our case, all we wanted was a static IP in front of an Azure Container Instance. Easy, right? Let's put the container in a VNet, place a NAT Gateway in front of it, and we are done. However, for some reason NAT Gateway is not supported for Container Instances; instead, the official documentation suggests setting up an Azure managed firewall in front of your container that starts at a whopping 600 EUR/month. That is a steep price increase from your ~30ish EUR/month for a basic container instance, and there doesn't seem to be any other official alternative.
I have opened an issue with the docs team [1] about it and I hope there is another way of doing this that doesn't incur a doubling of our Azure monthly spending.
I don't think the azure-docs repo is the right place to ask for help/suggestions, as the maintainers are not very responsive; their sole job is to push internal docs to public docs. But I understand your frustration.
However, I believe you could have set up a "public IP prefix" using the Azure CLI. I do not think you need an Azure managed firewall.
Adding a managed firewall just to have an edge IP is like saying, "I want to add an outdoor patio to my house - sure, let's add a security checkpoint for the neighborhood first."
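(A sketch of that with the azure-mgmt-network SDK - all names are placeholders, and as the reply below notes, the prefix still has to be attached to something like a NAT Gateway or load balancer:)

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

net = NetworkManagementClient(
    DefaultAzureCredential(),
    "00000000-0000-0000-0000-000000000000",  # placeholder subscription ID
)
poller = net.public_ip_prefixes.begin_create_or_update(
    "my-rg",      # placeholder resource group
    "my-prefix",  # placeholder prefix name
    {
        "location": "westeurope",
        "sku": {"name": "Standard"},
        "prefix_length": 31,                  # /31 reserves two addresses
        "public_ip_address_version": "IPv4",
    },
)
print(poller.result().ip_prefix)  # e.g. "x.x.x.x/31"
```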
This varies immensely by the product team. If you file a bug on the AAD protocol docs I have a self-enforced SLA of a business day (or less). And a CVP enforced SLA to solve it in 30 days. And I generally love the folks who do file bugs - they're engineers, and usually fairly savvy.
Other teams do get burnt out on docs though, especially when customers use them for abuse or free architecture help. My favorite was someone asking me how to use an Oracle product. I know our branding is confusing but it's not that bad... Is it?
> I don't think the azure-docs repo is the right place to ask for help/suggestions, as the maintainers are not very responsive; their sole job is to push internal docs to public docs.
What place would you suggest? We had bad experience with Azure support we could never fight through on the first support line.
> However, I believe you could have set up a "public IP prefix" using the Azure CLI. I do not think you need an Azure managed firewall.
I don't have deep experience with networking stuff on Azure, so my understanding could be wrong, but I think a "public IP prefix" is just a group of contiguous IP addresses that you can reserve. You still need to assign those to something, e.g. a NAT Gateway. As far as I know, you cannot assign them directly to a Container Instance.
>> I don't think the azure-docs repo is the right place to ask for help/suggestions
This is correct -- the Azure docs repo feedback mechanism (using GitHub issues) is primarily for providing feedback on the documentation itself. We try to make this clear via the buttons at the bottom of the page; one is for 'Product Feedback' and the other is for 'Feedback about this page'. I would agree that the distinctions can be blurry, but I see the three categories as:
- Product Support: I need help with a product
- Product Feedback: Product A is missing feature B, and I want you to add it
- Documentation Feedback: The documentation is unclear, has a typo, or the example provided no longer works
For Product support, your best bet is to go through the standard support channel. I'm sorry that you didn't get a better response when you tried contacting support. Do you have paid support? If you're a large customer, you may get a dedicated customer support account manager.
Additionally, there are community forums including https://docs.microsoft.com/en-us/answers/topics/azure-contai... and https://techcommunity.microsoft.com/t5/azure-compute/bd-p/Co... , which can also be used to submit product feedback.
Dear lord, this hits too close to home; I've been having nightmares maintaining some Azure infra lately. I'm not a cloud-provider fanboy - they all suck at the end of the day - but Azure is the one that deliberately makes my life worse every day.
All those features and no decent integration between them; unless you're a multi-billion-dollar, 100k-employee company, you'll have no luck with their customer support either.
I had an Azure employee troubleshoot my PG db instance for hours with me, for free, while our total spend was something like $100 a month.
Now, this employee didn't really help, but they were obviously professional and had database experience and didn't act condescending / like they were doing us a favor at all.
They just worked through the issue with me, which was a very pleasant surprise.
> I hope there is another way of doing this that doesn't incur a doubling of our Azure monthly spending
Oracle Cloud Infrastructure provides NAT gateways for free. You pay (low) transit costs, but unlike AWS+Azure (idk about Google) the NAT gateway itself costs nothing, so you don't pay twice for NAT traffic.
All the traffic at these cloud operations gets handled by cloud-scale SDN systems. I suspect the actual cost of the few tens of bytes necessary to track a NAT connection is lost in the noise on such platforms. So to my mind the high cost of some of these cloud operators' NAT gateways seems abusive.
Fortunately there is indeed competition that accommodates my view.
Could you not attach it to a subnet, and attach the subnet to a network security group, and then do what's needed in the network security group? Maybe there are regional restrictions that I'm unaware of.
Edit: oh, no, you just need a public ip prefix/address, right?
I'm surprised Azure still doesn't have any ARM processors to compete with AWS' Graviton instances. It's been nearly a full year since rumors of Azure working on ARM chips (https://www.zdnet.com/article/microsoft-is-designing-its-own...), and going back further it's close to 5 years since they talked up Windows Server on ARM (https://www.techrepublic.com/article/2-years-later-theres-st...) on Azure. Where is any of it though? I can go to AWS right now and spin up multiple different types of ARM processor instances, most of which are cheaper and more efficient for web-like workloads. It's really surprising that Azure hasn't been able to get anything out in all these years.
It's very possible that there just isn't much demand. This is likely for a number of reasons:
- x86 just has better support for the stuff Azure's "enterprise" customers want
- ARM servers are often more expensive to spin up than an equivalently specced x86 option
- PRISM compliance is easier on x86 (half-joking, half not)
I like ARM, and I owned a Rev1 Raspberry Pi when those were cool. But even now, ARM still has yet to make a strong case for existing on the server. And that's before we even discuss architectures like RISC-V that are out on the horizon, much better suited for servers than ARM. I'm not planning on an "ARM revolution" taking place in the next decade unless x86 is critically compromised in some way.
> But even now, ARM still has yet to make a strong case for existing on the server
This is several years out of date: AWS Graviton instances usually offer fairly substantial savings over similar Intel instances, with AMD in between, and Cloudflare has been reporting rather good numbers as well.
The main reason I suspect Azure doesn't have it is both Windows' legacy x86 hyper-focus (the days where NT ran on half a dozen platforms never really panned out) and a smaller number of managed services. AWS has very popular services like RDS, ElastiCache, ElasticSearch/OpenSearch, etc. where you can simply check a box and wait a couple minutes to see savings, not to mention things like Lambda being only slightly more work for many users, and that's a great way to get volume usage even if the average enterprise IT department is scared to go near it for VMs.
It's not obvious whether AWS's internal operating cost for a Graviton instance is actually lower than for an Intel/AMD instance. I believe it is at AWS's scale, but I also suspect they tactically reduce the profit margin on Graviton instances as leverage to get Intel/AMD to cut prices.
Do you mean end user as someone other than the buyer of a cloud service? That’s the context I was writing in - I’ve generally found things like https://aws.amazon.com/about-aws/whats-new/2020/10/achieve-u... to be accurate as long as you’re not radically switching instance types.
There have been multiple consumer ARM Windows products on the market for the last 9 years; I'd say Windows (client, at least) on ARM64 is as solid as its x64 counterpart.
I’m aware, but everyone I know who had one of those ended up complaining about software compatibility issues. In the context of Azure, I’d imagine a lot of their customers would be especially risk averse in this regard.
Most of the time they're complaining about not being able to run pre-existing x86 (in Windows RT's case) or x64 (until recently) applications when there's no native ARM/ARM64 version of said app. In the context of Azure, if someone wants to deploy on an ARM instance, I'd expect them to be able to build a native ARM64 version of whatever they're building.
> In the context of Azure, if someone wants to deploy on an ARM instance, I'd expect them to be able to build a native ARM64 version of whatever they're building.
Have you really had the experience that a large organization can recompile everything it runs? Most will have a lot of code which is provided as binaries by a vendor, and their in-house code almost certainly has dependencies and optimizations which will need to be dealt with and revalidated. No, none of that is unsolvable, but it means adoption is much harder than, say, changing an RDS instance type, and you'd be taking on all of the support yourself rather than leaning on the cloud provider's much larger team.
That's what I referred to in my original comment — in my experience, the average Azure user works at a Windows-heavy enterprise IT shop where those issues would be common. That doesn't mean that I don't expect ARM servers to happen there — Microsoft announced it was coming years ago, after all — but that it's going to be slow since the upfront investment will likely have slower adoption.
Microsoft themselves said the majority of VMs on Azure run Linux, not Windows.
About the ability to recompile: if you can't build ARM software, what's the point of using ARM instances? You don't need cross-architecture compatibility like you do on the desktop, which is what consumers complain about when talking about Windows on ARM.
The point is that it doesn’t matter if you could see a 20% price performance boost if your code doesn’t run on that architecture. If you’re using software which hasn’t been compiled for ARM, you’re not asking your cloud provider for that architecture and they’re not seeing the volume needed to profitably offer it.
I think AWS has successfully been pushing this because they know they’ll see that initial volume from people seeking savings on their own managed services and things like Lambda which are easy to switch, and that will fuel interest in switching other services which require more work.
Can you clarify point two? I was under the impression that the wide consensus was for Graviton instances to have significantly better price performance for most workloads. And the current state of arm support is surprisingly good for opensource or linux based server software.
We use Microsoft Azure AD B2C to manage users. AD itself is nice (for enterprise software), is used all over the place, and is pretty stable.
B2C on the other hand is a different story. Every few months we have to roll out a new tenant in our system. Tenants are identified by B2C "applications". Every single time, the new application doesn't work. Every single time the fix involves editing the JSON spec, changing something random (like a "true" to a "false"), saving, and changing it right back again.
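For what it's worth, that ritual can at least be scripted. A minimal sketch via Microsoft Graph, assuming a token with Application.ReadWrite.All; the field flipped here is arbitrary (the point is just that a write happens) and the IDs are placeholders:

```python
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
TOKEN = "<bearer-token>"                   # placeholder: Graph token with Application.ReadWrite.All
APP_OBJECT_ID = "<application-object-id>"  # placeholder: the broken B2C application

headers = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}
url = f"{GRAPH}/applications/{APP_OBJECT_ID}"

# Read the current value of some writable field...
app = requests.get(url, headers=headers)
app.raise_for_status()
current = app.json()["web"]["implicitGrantSettings"]["enableIdTokenIssuance"]

# ...flip it, save, and immediately flip it back again.
for value in (not current, current):
    patch = {"web": {"implicitGrantSettings": {"enableIdTokenIssuance": value}}}
    requests.patch(url, headers=headers, json=patch).raise_for_status()
```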
Doesn't exactly inspire confidence... we're planning a migration to AWS.
Also, try Googling for documentation related to "Microsoft Azure AD B2C". Almost every shred of internet wisdom is related to AD and not B2C. Even with Microsoft's own documentation you sometimes follow a link from a B2C API reference and find yourself in AD-only land and it isn't obvious. This makes the task of researching features and debugging infuriating.
B2C is the worst product we have ever worked with. Outages on a constant basis, an extremely complex XML configuration, translation bugs, unsupported features which are only available in AD but not AD B2C (e.g. M2M), and really bad documentation. We basically had to dive into their GitHub examples and issue tracker to make it work. Do yourself a favor and stay away from it.
If you run workloads in Azure, and you let their default agent run on your images, I'd highly recommend you take 30 minutes to skim around this repo: `Azure/walinuxagent`.
Go read through some issues, look at some closed ones, and try to skim through the source code. Realize there are two enormous Python scripts in the repo, one with "2.0" tacked on the end.
If Azure is somehow not just rebooting/killing VMs that lack the magic handshake, I'd highly recommend dropping the agent.
After all of this news, and what is on display in walinuxagent, do you really want some network-connected agent listening, whose most-often-touted feature is being a persistent backdoor?
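If you do decide to drop it, the least invasive route I know of is at VM-creation time. A hedged sketch of the relevant resource fragment (verify against current docs; extensions, password reset, and some recovery tooling stop working without the agent):

```python
# Fragment of a VM resource's osProfile, expressed as the dict you'd pass
# in an ARM template or to the azure-mgmt-compute SDK. Names are placeholders.
os_profile = {
    "computerName": "no-agent-vm",
    "adminUsername": "azureuser",
    "linuxConfiguration": {
        # Ask Azure not to provision walinuxagent into the VM at all.
        "provisionVMAgent": False,
        "disablePasswordAuthentication": True,
    },
}
```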
"It's tough being an Azure fan." Eh, it's a nice cyclic problem. They have no depth of caring about engineering (hence why Azure is littered with services that are impossible to fully utilize because their own engineers don't understand the how/why of what they're building half the time). Which in turn, along with crap career advancement and constant un-appreciated, unmitigated live-service burnout, is why they can't retain actual Linux talent to save their fucking lives.
As a network engineer writing code to automate AWS and Azure infrastructure, I'm amazed at the number of pitfalls we've hit with Azure; they just keep popping up. Then there's the whole basic/standard option split for every service. It feels like an order of magnitude more complexity for no reason, and part of it is the abysmal documentation.
- Customer asks for something trivial to be deployed. I say, "no problem" and start beavering away on a Bicep template or whatever to deploy their stuff.
- I hit some small but showstopper issue with a service that I expected to work, but it doesn't. I open a support ticket.
- Inevitably, it turns out to be some stupid, stupid limitation caused by unfathomable laziness of the Azure developers. No workaround, no mitigation, but we're "working on it" with no ETA offered.
- Literally years later a trivial fix for the glaring issue goes into "PREVIEW" for 9 months, long after the original project was closed. I no longer care...
Networking is especially bad, with endless limitations that make no sense, like:
- IPv6 is incompatible with everything. Turning it on for anything anywhere in a vNET will permanently block unrelated features like IPv4 NAT. But of course, you can't "protocol translate" from IPv6 on the outside to IPv4 on the inside, so you end up painted into a corner.
- No bring-your-own-subnet, which means many lift & shift scenarios are impossible (we have customers using a class B public range internally).
- Azure forces NAT on IPv6, which makes no sense at all.
- All Azure PaaS services have firewalls that are IPv4-only.
- The built-in firewalls (e.g.: Azure SQL Database) do not support service tags, only CIDRs.
Yes, seemingly trivial changes turn into huge and disruptive ones. The latest one we encountered was that you cannot add more CIDR ranges to a VNET without removing all VNET peering first; the feature for this has been in preview for something like 2 years. I mean, AWS doesn't automatically handle this for you either, but at least you can solve it easily enough: just use the API to create routes on the other side (their API/boto3 library is just amazing).
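For contrast, a minimal sketch of the AWS-side fix alluded to above, using boto3 (all IDs and CIDRs here are placeholders): after associating the new CIDR with the VPC, you push a route for it into the peered side's route table.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Add the new CIDR to the VPC (no need to tear down peering first).
ec2.associate_vpc_cidr_block(
    VpcId="vpc-0123456789abcdef0",
    CidrBlock="10.42.0.0/16",
)

# Tell the peered VPC's route table how to reach the new range.
ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",    # route table on the other side
    DestinationCidrBlock="10.42.0.0/16",     # the CIDR just added
    VpcPeeringConnectionId="pcx-0123456789abcdef0",
)
```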
Availability Sets decrease availability, because they force "all or nothing" deallocate-reallocate cycles instead of one-at-a-time changes.
You can't move a VM to a different subscription without also moving the entire vNet along with it. (I bet this worked great in the developer's lab with one test VM.)
You can't move a VM to a different backup resource without deleting all backups, going back years.
VMs can't be rolled back to VM snapshots.
ExpressRoute bandwidth can be increased non-disruptively, but the only way to decrease it is to recreate it -- incurring a ~30 minute outage.
The Activity Log (and most other audit logs) don't log the identity of the administrator that triggered the action about 10-50% of the time, depending on what kind of activities are going on. This makes it 100% useless as an audit log. There is no other way to obtain audits.
Most network resources are Zone Redundant, except for NAT Gateway, which is now required in some scenarios. It's Zonal only, which means you need 3 subnets, one for each zone.
Upgrading an internal load balancer from Basic to Standard cuts off Internet access, for "reasons". I'm told these are "security reasons". Uh-huh...
App Gateway doesn't send a "User-Agent" header with its monitors, which makes it incompatible with a surprisingly wide range of COTS software, including a bunch of Microsoft software.
> Inevitably, it turns out to be some stupid, stupid limitation caused by unfathomable laziness of the Azure developers. No workaround, no mitigation, but we're "working on it" with no ETA offered.
This is all Microsoft products, though. There have been .NET Identity bugs that have been open and acknowledged for multiple years.
I find this to be a common theme with Microsoft products. I work heavily with SSIS and MS SQL. The Microsoft docs are so convoluted they are borderline useless for learning the products.
My friend and I have a theory that this is because their docs are written by people so intimately familiar with how MS does things, they can no longer conceive of a situation where someone doesn’t already understand and think in the same way.
It’s like the business equivalent of the “once you understand a monad you can no longer explain it” meme.
It's Microsoft's modern, less-obvious version of EEE.
You sell to the managers and equip them with hollow buzzwords because you know they're gonna override the engineers on every decision anyway. By then, MS has your firm's money and all you can do is deal with it.
I find the problem with Microsoft docs is less how convoluted they are (you kind of get used to the conventions at some point, not to say they're good or sensical); it's that the Azure documentation is not of the same quality the WinAPI docs were.
Used to be a function doc would tell you:
1. All the parameters and their types
2. What they did, explained
3. All possible exceptions raised by the function
4. All possible return values
5. Supplementary documentation on the object or structs passed in or passed back
6. Some examples of it in use
Now, the current docs sometimes have examples for Azure CLI/SDK stuff, but there is one convention that drives me bonkers that is as bad now as it ever was.
For examples, oftentimes you'll get the most important part replaced with a [insert your thing here]. The format of the thing you fill in there is often left as an exercise for the reader to intuit or guess.
Writing one good set of docs that are suitable for all users (experts and newbies) is really hard. Structuring complex topics in a logical and progressive manner is also really hard. Having tried to write the docs on a few software projects, I honestly found writing the docs harder than writing the code (though maybe it gets easier with practice).
I find the Azure docs to be decent. They are consistent across services for the most part, so you learn how to work with them after a bit of experience on the platform.
Same here, and not even GCP is that bad with the 'release' vs. 'beta' APIs that intermingle constantly.
Most of the time Azure feels like just another 'we have virtual machines and a crappy API' service. Kinda like a pretend-cloud where a lot of products come almost together but never finish. Almost similar to the way Windows and backwards compatibility means you end up with 100 libraries, frameworks, languages and versions all sitting side-by-side and not really working together, just differently in parallel. Not useful for automation at all, which makes it not useful at scale. (Except when scaling means: we want to run windows VMs with AD and go from 10 to 100 and change no parameters at all)
That reminds me of a relatively old twitter thread (that I can't find of course) about systems that have become operating systems and computers upon themselves.
It went something like this:
Code used to run on a processor. Then it ran on an OS on a processor. Then it ran on a runtime on an OS on a processor. Then an abstraction layer was sandwiched in between. Then a filesystem. Then a compatibility layer. Then a database. Then a browser. Then we went and created a runtime in the browser to run code again. We have come full circle.
This probably applies to SharePoint but also the real-time OS on your GPU, the MSSQL database which has its own OS facilities you can run stuff on. Add too many features and the application becomes the very thing we wanted to get away from...
Yeah. Azure is a second- or third-tier cloud provider; they are not in the same league as AWS.
AKS suffers, AFAICT, constant API server outages. We tried to escalate into a ticket, but we just get motte & bailey'd between "you're putting too much load on the API server" — okay, what load? how can I see that, control it? — "here is the top consumers" — they're all AKS itself? — "well, there's too much load on the API server" gah! (Yes, we pay for the "SLA".)
You can't add IPv6 anywhere in a vnet, it will break unrelated things. We tried to add a managed PSQL server on IPv4 (b/c IPv6 is not supported): it "failed" (the API call to create it timed out with an internal server error … after 2 hours or so!) because something unrelated in the vnet used IPv6.
ACR has a 20 TiB limit, no way to prune containers, we've had to work around IDK how many 500s, the API is slow as dirt (response bandwidths of ~50 kbps — bits — and their team does not think that's a problem. It can take 10 minutes to enumerate a few megabytes of metadata…) Undelete-able manifests that I guess we will just pay for indefinitely? I feel like I could build ACR on top of Azure Blobstorage and it would be more reliable with better performance…
VMs shipping with buggy kernels. (Support wanted to know what weird thing we were running to hit kernel bugs. "Docker"?) Global outages. Known outages often don't get mentioned on the status page intentionally. I still don't know what the difference between an availability set and a VMSS is.
Everything is in preview. Everything.
Audit logs fail to load some times. Beyond complicated interactions/differences between "service principals", "applications", "enterprise applications". 2FA app now requires authenticating twice, per login, because each tenant acts as a separate yet not separate login. So. much. auth. Role assignments that don't know what principal is being granted permission, b/c ARM & AAD don't do referential integrity.
Docs that are outdated. Requests for updated docs closed without update, because "we don't have the data" (from some other internal team, I think, but so what?). Docs that describe API calls badly: "foo: the foo query parameter". No docs, at least that I'm aware of, about which permissions an API call requires. The docs conflate "permission" and "role" (different things in Azure) all. the. time. Azure doesn't know what permissions some calls require and simply says "give it Contributor" (close to all permissions)… and yeah, that works, but I want to show auditors we're doing PoLP.
Support… The SLA often isn't met, and we're assigned reps in China (n.b., this isn't a language-barrier problem; it's, how is someone who is literally sleeping during my business hours, because that's timezones for you, supposed to meet a support SLA that wants an 8/4/2 hour response? And AFAICT from response times, they're not a night shift…). The first response is worthless (doesn't answer the inquiry, requests information present in the original request, often isn't technically proficient, etc.), and the writing is broken and sloppy. Half the time a simple "reread what you're about to send: does it solve their problem?" pass would help; we literally got a blank email back once. We've had tickets where the first response we get is "we haven't heard back from you", because sometimes their responses fail to get linked into the ticket in the portal. They lack any formal bug-reporting mechanism, and support is not equipped to handle bugs.
Azure is a typical Microsoft product: it doesn't work all that well, but you can get the job done if you just buy more of it (to work around the deficiencies).
From a business perspective, this is brilliant. From a technical perspective it's not awesome, obviously.
Pretty happy Azure & DevOps fan here. Been using it since day-1. I've helped build large platforms at several companies completely on Azure with great success. They were all small teams that shared all responsibilities across FrontEnd/BackEnd/DB/DevOps.
Our current project uses what I feel like are pretty standard features for a SaaS app. They include C#/.NET Core, Linux App Services, SQL Server, FrontDoor, SignalR, Functions, etc. Functions is the only feature that has been bumpy for us deployment-wise.
The journey of the various portals has been fun. The current one isn't perfect but it's far better than my experience using AWS. It's got to be a challenge to organize such a massive portal developed by so many teams.
The effort MS has put into documentation has been really great as well.
That said, there's a lot that could be better. A lot of the PMs are on Twitter, and I tend to be a squeaky wheel there about various problems, so hopefully they're listening.
I have had the same experience - the entire package works really well for a small team. We had a really easy time configuring our builds, but we don't do anything fancy - install things, run unit tests, create a build artifact.
Haven't had issues with Azure Functions, though DI is wonky in them if you try to build anything complex (imo you shouldn't). Other than that, I guess I just don't use them often enough to really understand what all the boilerplate does. Finally, when we switched to Python from .NET, it didn't feel like any of the Functions knowledge carried over somehow. Felt like developing on a different platform.
I also HATED the Azure certification exam compared to GCP. GCP tried to actually teach you something practical, Azure tested a bunch of memorization that you can google.
There’s a nice bug with Azure Functions where it reports the environment it’s running in as “development” when it should be “production”, which means the app uses the wrong settings.
Nothing in the documentation to mention this, you just need to deploy and learn from your mistakes!
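One defensive move, as a sketch rather than a documented fix: fail fast at startup if the runtime's reported environment disagrees with a setting you control yourself. AZURE_FUNCTIONS_ENVIRONMENT is the variable the Functions host is supposed to consult (verify for your runtime); DEPLOY_ENV is a hypothetical app setting you'd add explicitly at deploy time.

```python
import os

# What the runtime claims, versus what we explicitly configured at deploy time.
reported = os.environ.get("AZURE_FUNCTIONS_ENVIRONMENT", "Development")
expected = os.environ.get("DEPLOY_ENV")  # hypothetical explicit app setting

if expected and reported.lower() != expected.lower():
    # Refuse to run with possibly-wrong settings rather than silently
    # picking up development configuration in production.
    raise RuntimeError(
        f"Runtime reports environment {reported!r} but deployment "
        f"expected {expected!r}."
    )
```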
Fortunately the impact to us was quite small, but it's definitely left a sour taste in my mouth as far as azure functions go.
The whole story regarding organizing the code, breaking changes between versions, confusing plans, different styles of configuration between Functions and ASP.NET web APIs, trying to configure individual functions in the same app without affecting others, etc. is not good!
One thing I do find is that although their documentation is now open-source, it can still take a long time to get changes reviewed and merged (if at all). I waited about 3 months once for a simple change to documentation that was clearly wrong and by the time they looked at it, the docs had been re-factored.
They need to learn to embrace the Amazon Mechanical Turk approach and have the right people review documentation edits. It should be possible to quickly check a proposed change and then just merge it.
Not directly Azure, but related: OneDrive Business (which is built atop Azure, I think)
I recently moved off Amazon Cloud storage to OneDrive because Amazon didn't support rclone. Microsoft's OneDrive is quite a bit less expensive than either Google or Dropbox or Amazon.
The storage is there but managing it - what a mess it is!
Each one of those portals takes you to SOME view of your account with a gazillion settings. Many of them are repeated, and changing one on one portal doesn't necessarily reflect in the others. For example, I enabled 2FA (I seriously don't know how or where, but I was able to log in using the 2FA), but going to admin.microsoft.com showed 2FA disabled for the user - go figure!
Something as simple as figuring out how much space is currently used on your OneDrive is a challenge. There's a ritualistic series of incantations and clicks that will get you there, but you really need to be persistent. Googling for it gives you an answer, but most answers lead you to live.com, which is only for personal accounts, not business accounts - it won't let you log in.
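Ironically, the number you want is one Microsoft Graph call away, which is less clicking than any of the portals. A minimal sketch, assuming a delegated Graph token with Files.Read scope:

```python
import requests

TOKEN = "<access-token>"  # placeholder: delegated Microsoft Graph token

resp = requests.get(
    "https://graph.microsoft.com/v1.0/me/drive",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()

# The drive resource carries a quota facet with used/total bytes.
quota = resp.json()["quota"]
print(f"Used {quota['used']:,} of {quota['total']:,} bytes")
```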
office.com/launch/forms takes you to, and allows login to your business account, but going to forms.office.com (which is the top search result in google) redirects you to live.com and doesn't allow login to your business account!
One thing I want to say is that while the clouds are frustrating at the end of the day, in-house on-premises solutions sometimes share the same caveats (little support for small users, gigantic mazes of solutions with little documentation, etc.). I think once you reach a certain level of complexity these issues pop up eventually.
One major difference perhaps is the expense, but are you sure you're hiring the right number of engineers and paying the right amount of $$ for that software and those machines?
To provide some context: I figured configuration-as-code would eventually reach the point where people essentially start implementing their own DSLs. When new people join, they are essentially config boys/girls who need to spend a lot of time learning and unlearning a few DSLs created by the smartest guy/gal in the company, who has since moved on to greener pastures...
It's a good article, check it out. It's really just about being a heavy Azure user and considering whether to recommend it to others, with the bulk of the article being a critique of various issues.
The mention of AWS ECS made me curious — does Azure have the equivalent of Fargate, where they manage the host and you only pay for the container's actual usage? The O&M wins on that have been really substantial and I recommend it for anyone who doesn't have sufficiently large scale that the savings will fund an ops team capable of running something like ECS or Kubernetes. One of the reasons why is the challenge of avoiding over-allocated instances — every time I've seen people running AWS ECS/EKS, Google GKE, etc. they've been substantially over-provisioned because someone's always meaning to get around to looking at that but the time never seems to materialize.
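The "someone's always meaning to get around to it" check is small enough to automate, for what it's worth. A rough sketch with boto3 and CloudWatch (the cluster name and the 30% threshold are arbitrary placeholders):

```python
from datetime import datetime, timedelta

import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")

# Average cluster-wide CPU utilization over the past week, hourly datapoints.
stats = cw.get_metric_statistics(
    Namespace="AWS/ECS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "ClusterName", "Value": "my-cluster"}],
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
    Period=3600,
    Statistics=["Average"],
)

points = [p["Average"] for p in stats["Datapoints"]]
if points and sum(points) / len(points) < 30:
    print("Cluster averaged under 30% CPU last week; likely over-provisioned.")
```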
Their closest Fargate equivalent is probably Azure Container Instances. I haven't tried it recently, but last time I did, the startup time for new containers was very slow (5 min+). That may have improved by now.
Yes - and they had limits with EBS, too. It’s not perfect but it’s really handy for cutting out maintenance for a substantial fraction of my tasks which don’t hit the edges.
Yeah... and if I have to sit through one more week of managers and devs freaking out about Lambda cold starts, I'm probably going to quit tech altogether. It is the most dumb-ass broken record. Torture. Read the manual before you write two hundred functions and bet the whole project on fantasies next time.
My company once took a project on Azure, and ended up giving a 35% cut on our invoices to move to AWS. The bugs I filed while working on that project are still not closed.
I always wonder if going all-in on cloud services actually saves most people time, after hearing about all the weird corner cases that need debugging. Because leased dedicated hardware really doesn’t take much time to manage, it’s entirely fungible and easy to switch off of if you’re not happy with support, it can support an enormous amount of traffic, and it does it for relatively low cost (especially if you need eg high bandwidth).
> I always wonder if going all-in on cloud services actually saves most people time...
Agreed. The apps and services I manage aren't very big compared to a lot of posters here, and my company is certainly not going to ever pay the big bucks for real talent.
So instead, I have to design everything knowing that my ops team is going to be mostly $100K / year "devops" guys with a few "cloud" certs but no real CS, dev, or even Linux knowledge (yes it's that hard to hire good people now (at the low rates my employer wants...)).
I've gotten to the point where I absolutely mandate that they don't try to use Terraform or CloudFormation scripts, because in the end, there are so many edge cases or glitches that it's easier to just write an install guide that shows them which buttons to click in the AWS or Azure console. <sigh>
And when I look at the costs we spend per month - including all the unanticipated charges like NAT Gateways and $20 / day "managed" Postgres instances, I assume we'd be better off dumping the cloud and reverting to our 2008 setup: Spending $10K on some Dell servers in a managed data center and hiring an old-school Linux admin to manually install and manage it all.
It doesn't seem to. My current place is all in on AWS and we have about 6 "DevOps" people dedicated to dealing with it all.
Last place we did pretty similar stuff with 3 data centers, all self hosted, self managed, mostly OSS stuff with about 6 sysadmins and way less hassle.
I think Azure suffers in the same way as MS development tools generally: they are racing to make so many new things that their "legendary" backwards compatibility feels more like neglect than support. Sure, WebForms still works (which is nice; thousands of companies have legacy apps), but they still haven't fixed the problem that a NuGet install reformats web.config even when it doesn't change anything, meaning you have to delete all the redirects, get VS to add them back in, and only then do you see the actual changes.
I would call that a "Major" bug because it is so unnecessary and has been there so long. MS's attitude? Bit too hard to fix, just use .NET Core instead, as if we can all just do that with our legacy production apps!
VS is also pretty cool function-wise, but it still has too many lockups, cache corruption causing strange compiler errors, and files left locked after exiting. Instead of doubling down and refactoring the core code to work properly (I think that is a "thing"), they kind of push VS Code instead, even though it doesn't have half the functionality of VS, rather than making VS the thing of beauty it is supposed to be.
This used to be very different one or two decades ago, when Microsoft's backwards compatibility meant you could migrate your existing code to the new thing with minimal changes. This era is where the "legends" come from. Nowadays, it's exactly as you describe, unfortunately.
Agreed. Now they'd make "Doors" (which doesn't run anything from Windows) instead of Windows 95, just leaving Windows 3.1 to rot as-is. It's more like "sideways incompatibility".
I find Azure to be expensive. Microsoft is not accountable for service issues and bugs. Support is just a wall of consultants that mostly cannot help and then reach back to actual Microsoft for help (which makes everything really really slow). Garbage.
I always dreaded talking to support - AKS would routinely break in weird ways, and it was always super frustrating waiting for support to fix things that shouldn’t have happened.
We once had an error with Azure MySQL and AKS where it would just drop packets on basic instance types. Support could never fix it; we ended up upgrading to standard because that worked.
I will say this - the manager of the team at Azure did comp our upgrade, because they couldn’t figure out why it was happening. I tried very hard to get our project off Azure, but because it was negotiated as part of the Enterprise Agreement with Microsoft, it was “free”, so the CTO wouldn’t budge. I assume this is how many people end up needing to deal with Azure.
Yes. We found a problem with Azure’s IPv6 support a while back.
It took quite literally months to get the issue escalated to someone who could do anything about it, and as far as I know it’s still broken, although it’s scheduled to be fixed this year. We had to pay for this “support” too. We worked around the problem in the end, but still.
It always depends on your scale. I was at a company with monthly spend in the 6 figures, and they wanted to talk constantly. At a different company, the spend was 3 figures a month, and we couldn't get any help at all, even knowing some internal folks in different areas of the company.
We don't get much spam from Azure reps, but APM/metrics/logs companies like DataDog really are trying hard to schedule free-of-charge "presentations" with our teams. Unless they are faking out MTA headers and mail client details, the mail templates are actually edited and sent from a desktop by an actual sales/advocate/devrel guy. Yuck.
> And if you ever need to actually talk to someone over there, it's a nightmare
I never understand it when I see people say this about Azure. Did you ever try creating a support ticket through the Azure portal? My experience has been nothing but stellar. They respond very fast, they provide very in-depth expert knowledge, and they seem to be quite thorough, escalating any issues if needed. This is my experience with creating tickets for various organizations, both big and small.
Suspicious as in they might have left Microsoft knowing about these issues, or that they're a plant by Microsoft?
It's quite likely they had insider knowledge and therefore had a better idea of where to look. But I'm not sure that makes it "suspicious", as opposed to... well, what else would you expect in those circumstances? Or are you saying that the unfair advantage of insider knowledge is what makes it suspicious?
At a previous job I was tasked with writing our bug bounty policy. I was also in charge of running and reviewing all of the SAST and DAST scans. It was tempting to monetize that potential conflict of interest, especially after I left the company.
I’d say in my limited experience (in prod, with only ~50K a month bills) AWS and Azure are functionally identical.
Anything I need to do, that I haven’t done before (or recently) requires roughly the same spin up time to get familiar with how each provider functions anyway.
Don’t know if this is still the case, but I always felt more nickel-and-dimed on Azure than AWS. For example, the base SKU for their managed PostgreSQL-compatible service did not allow use in a VPC/behind a DMZ; you had to pay substantially more for the mid-level SKU. This was not the case with AWS’s equivalent service.
I think what was happening was, if you wanted to make a VM image (which you could then later clone to new instances), it would result in the original VM (i.e., the source of the image) being no longer usable.
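That matches how image capture works: a VM has to be generalized first, and generalization is one-way, so the source VM never boots again. A sketch of the flow with the azure-mgmt-compute SDK, as I understand it (resource names and IDs are placeholders):

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Stop and deallocate, then generalize. Generalizing is irreversible:
# after this call the source VM can no longer be started.
client.virtual_machines.begin_deallocate("my-rg", "source-vm").wait()
client.virtual_machines.generalize("my-rg", "source-vm")

# Capture a managed image from the (now unusable) source VM.
poller = client.images.begin_create_or_update(
    "my-rg",
    "my-image",
    {
        "location": "westeurope",
        "source_virtual_machine": {"id": "<source-vm-resource-id>"},
    },
)
image = poller.result()
print(image.provisioning_state)
```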
Are the security issues (that are fresh in mind!) just Microsoft being open and proactive about... security? Is AWS this open?
Everything else could have been written about AWS as well. They all feel duct taped when you know what's going on behind the curtain. Enterprise developers spend a crapload of time messing with infra on AWS. I think it all sucks. Unfortunately its better than nagging the BOFH for a few VMs (which you are only going to get this month if he knows you AND likes you, AND you can keep info-sec off everyone's back).
The Azure web console is light-years ahead of GCP's console, it's so much nicer to use.
GCP's console is just stupid though. It takes so long to load, and instead of just making it faster, they made it load in incrementally, so you just have to sit there waiting for whatever widget you actually want to use, load in correctly.
And even when it has loaded, the entire thing is extremely laggy.
I’ve been using Azure since the old portal and it always feels like Microsoft is just checking the box. AWS launches some innovative new service and Azure copies them with a half-as-capable alternative.
Azure is and will continue to creep into GitHub. Pretty sure my new projects will be on GitLab now, just to get a head start on whatever the cutting-edge service will be a few years from now.
The biggest issue is that I've experienced some of that creep with GitHub Actions. It's a mess, doesn't do what it needs to do and besides "presenting" features/functionality it can't actually do even the basic things a nasty old Jenkins installation in an AWS ASG does.
They practically just copied some Azure CI stuff over (which was obviously written from a completely different perspective/mindset) using a tech stack that is old yet immature when compared to even relatively young products like GitLab CI.
If a company wants to come up with a CI product but can't even get on par with basic nominal GitLab CI features and usability, what's the point...
Azure pipelines is the most verbose and confusing CI tool I’ve had the misfortune of using.
Why does it require so many steps? Why are they so verbose? It feels like writing glorified bash scripts (in which case why am I using a CI tool?) but worse.
Judging from the stories posted here, can you even imagine what the stories from the poor souls working on this would be like, if only they could tell us without violating an NDA? Must be the stuff of nightmares.
I've never heard a single good thing about Azure, and now that I have to work with it I understand why.
The most egregious thing (for now) was an Auto Scale failure. For an entire business day the autoscaler failed to spin up VMs, which effectively killed our product for the day. We could not scale manually either. No logs, no errors, absolutely no idea what happened. Support tried to convince us for an entire month that it was our own fault for misconfiguring the autoscaler, quoting documentation that said something entirely different.
After insisting for a month, the issue was finally escalated to someone who could read the logs and immediately see what went wrong: they had run out of VMs and could not scale up.
So we have:
1. Unreliable services, breach of SLA.
2. Zero ability to anticipate, prevent, debug, or ensure the problem will not happen again.
3. Incompetent support lacking basic reading abilities trying to gaslight me.
4. An obvious issue that should have raised alarms long before it arose.
Thanks Azure.
This week I spent half a day trying to understand why Azure would not create an Application Gateway using a configuration that was identical to another one we already had (do _not_ attempt Azure without Terraform).
Turns out it was another outage with absolutely no way to know it.
Best part is the official fix: "Please, retry in case of failure during the deployment." It takes 30 minutes to create said resource before it enters a failed state. XKCD 303 applies.
It's been about 10 years, but Azure lost one of my volumes with important data on it out of nowhere, and refused to offer me any support until I made a huge stink in the support forum.
When I finally got my data back, I moved to DigitalOcean and haven't looked back.
Another team in my org is trying to configure the same thing, but it's been a challenge. I don't know the specifics, but it's taken considerable time to set up the environment.
It's slowly starting to trend because of three realizations backend developers are having: (1.) Although scalability is hard, distributed apps are really hard, particularly distributed debugging and logging. (2.) Monolithic servers are now powerful enough to host an entire billion dollar internet business with millions of simultaneous users.[*] (3.) Cloud providers make their huge profits in part by carefully engineering their billing to maximize how much they dark-pattern surprise-bill their clients the moment they have unexpected growth.
[*] In early 2010s, WhatsApp could handle 2M connections on a single 1U with Erlang; MigratoryData could handle 10-12M connections with Java on Linux. Epyc servers are even more powerful now.
> Azure often has a feeling of being held together with a lot more sticky tape behind the scenes than I would want to know about.
This is my feeling. And I think it makes sense - vast swathes of how Google and Amazon work already rely on cloudy engineers (although they were there before cloud was a thing). Microsoft has much more split focus, with Windows, SQL Server, AD, Dynamics, Xbox, and Office all being large areas of engineering specialism that in no way feed into cloudy areas.
Bing and Xbox Live do, I suppose, but I bet they're built in a totally different way to both each other and, say, Teams, which I think is k8s in the back.
So my theory is there's much less of the company that does cloudy things, and even when they do (e.g. Bing, Xbox Live), there's probably a lack of a commonality of technical approach compared to what Azure exposes to customers.
It's not easier to be an AWS or GCE fan, though. I'd love to go cloud computing for my personal stuff instead of physical servers, but I don't want to go AWS because I don't want to get blacklisted from Amazon shopping or lose my Kindle library by hitting a runaway AI that flags my cheap EC2 instance as "potentially fraudulent/hacked" or whatever; and for Google... well, losing that account would be way more disastrous.
Can we just have a decent cloud provider offering services on a reliability and scale level as AWS/GCE/Azure, but with decent customer support and no way for (necessary) anti-fraud/abuse mechanisms to kill off entire digital lives?
We moved EVERYTHING to Teams and ADO (for git, backlog, pipelines) except our actual product is still on AWS. If we didn't have a couple things like encryption, S3 and some fargate usage, we'd probably start fresh on Azure.
My latest favourite issue with Azure is Azure DevOps. It can randomly add extra quotes to your bash script variables for whatever reason.
Azure Support is aware of the issue, it happens to many of their clients, but they have no idea why it is happening.
Hilarious. But still, management was sold a vision of the #1 Cloud, so here we are writing more and more band-aids over issues Azure has.
Another one off the top of my head is Scheduled Events for AKS nodes. Azure provides you with a managed node-group solution, but has no project to support those events, so you have to rely on unsupported community projects, which often die simply from lack of interest in Azure.
Azure primitives (Compute, Networking, Storage) are a dumpster fire. PaaS services can only be as good as these primitives. Friends don't let friends use Azure. Just don't.
I've never used this az tool that is being complained about in all the years I've used Azure. The Azure PowerShell cmdlets are ever-changing and sometimes frustrating, but they're certainly not bloated to the tune of a 1 GB download.
All of these arguments are presented weakly and self-admittedly compared to similar AWS failures in every instance. Maybe rename the article "Cloud Computing isn't a Magic Bullet".
It's tough being an AWS fan for that matter too. Aside from the simplest things, every time you need to do something there are huge reams of "documentation" in various stages of decay, and each doc is comprised of dozens of (unnecessarily) manual steps. Error messages are shit, too, and the whole thing has this stench of massive engineering debt and duct tape to it. I was told this stuff was supposed to be built by people who know what they're doing, but apparently not. It's duct tape all the way down it seems like.
Yeah, when I finally switched over to using it from Azure I was shocked at how horrid the docs are. I would have assumed by now they would have been cleaned up, but no, time does not heal these wounds.