It's tough being an Azure fan (alexhudson.com)
357 points by ealexhudson on Sept 23, 2021 | 269 comments



I once ran up a $15,000 bill on Azure completely by accident when trying to get one of their video processing services to work. Once I figured out the service wasn't going to do what I wanted at a price I could afford, I tried to detach it and shut it down and thought I had succeeded. I didn't.

The offending process costing me money didn't appear on the Azure console, and I had no idea it was running, how to access it to stop it, or even what was going on. When it turned up on my bill I nearly had a heart attack.

Thankfully they let me out of it once I pointed out I was getting billed for something I couldn't see. I appreciated that greatly but I've never gone back to Azure and the experience scarred me so much I don't think I ever will. This was about 4 years ago so more than likely they have sorted it out.

They keep reminding me I have $50 of test/dev credit on Azure through my Visual Studio subscription but it flat out frightens me to even try to use it.

AWS isn't perfect but at least you have half a chance of working out what is costing you money through the billing console.


> AWS isn't perfect but at least you have half a chance of working out what is costing you money through the billing console.

The first few times I used AWS for tutorials, something similar happened to me. I thought I shut everything down, but kept getting billed and wasn't able to find it without contacting them. It was just a few dollars, but I've been wary about any services where you can't cap the billing.

Cloud platforms generally don't let users cap the billing, because those overages are good income for them. I prefer using services like DigitalOcean or Linode where you can be sure that your new site crashes for 15 minutes instead of bankrupting you.


Oh me too. That horrible "mythical beasts" tutorial. I thought I removed it, and then each month for the next two months I'd get nailed for 50? dollars or so.

By definition, I don't understand AWS, so figuring out how to turn it off was nearly impossible. AWS "support" didn't exist. Stackoverflow AWS geeks were in high dudgeon that I'd ask the question, "how do i disable this?" and would kill my question. Finally some kind soul did give me the trick to finding the last service to disable.

BTW the tutorial is absolutely useless. Just a thousand different incantations to repeat. No real understanding communicated. Felt more like an ad for myriad services.

0/10 would not recommend.


There are consultancies whose entire expertise and premise are advising on AWS billing to companies who already know AWS, so the idea that a newbie should be aware of whatever magical incantation to limit their spend is ludicrous.


That’s not entirely fair. While I’m not going to defend the complexity of billing in enterprise cloud services, it’s also not really that hard to set a billing alert in CloudWatch and track spend in Cost Explorer. Sure, it requires a little bit of AWS knowledge, but you shouldn’t really be using enterprise services like AWS (over more accessible services like DigitalOcean) if you’re not willing to spend the time learning it; and at an individual level it’s very manageable.
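For an individual account, the alert described above can be sketched with boto3. A minimal sketch, assuming billing alerts are enabled in the account preferences (billing metrics only exist in us-east-1, under the AWS/Billing namespace); the function name, threshold, and SNS topic ARN are illustrative, not any official API:

```python
# Sketch of a CloudWatch billing alarm. Assumptions: billing alerts are
# enabled in account preferences, so the EstimatedCharges metric is
# published (us-east-1 only, namespace AWS/Billing). The SNS topic ARN
# is a placeholder you would replace with your own.

def billing_alarm_params(threshold_usd, sns_topic_arn):
    """Build the kwargs for cloudwatch.put_metric_alarm()."""
    return {
        "AlarmName": f"billing-over-{threshold_usd}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,          # billing metrics update roughly every 6 hours
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }

# To actually create the alarm (requires credentials; us-east-1 only):
#   import boto3
#   cw = boto3.client("cloudwatch", region_name="us-east-1")
#   cw.put_metric_alarm(**billing_alarm_params(20, "arn:aws:sns:..."))
```

An AWS Budgets budget on top of this gives a second, independent warning channel.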

The reason those consultancy firms exist is that billing scales terribly. Once you’re a business using AWS, you’ll likely have a multitude of projects running across a multitude of departments, which need to be billed to a multitude of different customers and internal cost centres. This all needs to be processed by an internal or 3rd-party financial system managed by non-technical people who won’t even know what AWS stands for, let alone what it does and how it works. In those situations the problem of billing becomes exponentially more difficult than a one-person hobby project.


> it’s also not really that hard to set an billing alert in CloudWatch and track spend in Cost Explorer

Billing alerts aren't good enough.

Consider a scenario where you're consulting on someone else's small- or medium-sized project and your bug costs the client a huge amount of money in the middle of the night. Now who pays? Say goodbye to your paycheck or reputation, even though it should have been preventable.

Another scenario: you launch a startup, and a bug empties the bank account and kills the company. If the solution is to just not use things like AWS and GCP (including Firebase, which has no billing cap) when you're getting started, why are they advertised that way?


> Consider a scenario where you're consulting on someone else's small- or medium-sized project and your bug costs the client a huge amount of money in the middle of the night

You can also set alarms that warn you of projected usage.

> Now who pays? Say goodbye to your paycheck or reputation, even though it should have been preventable.

If it’s legitimate usage then I’m not really sure what you’re advocating; are you implying a service being suspended in the middle of the night because a hard spend limit has been hit is somehow better for your reputation?

Or maybe you’re suggesting they’re not legitimate costs, in which case you’ve set up AWS wrong to begin with, and thus your reputation probably deserves to be questioned.

> Another scenario: you launch a startup, and a bug empties the bank account and kills the company. If the solution is to just not use things like AWS and GCP (including Firebase, which has no billing cap) when you're getting started, why are they advertised that way?

No cloud service operates that way. In that situation you’ll almost always get the charges refunded, even in instances of gross negligence (which would be the case here, since for the bank account to be emptied you’d have to have gone more than a month without watching your spend, and no business should operate that way).

I do get the points you’re trying to make but I’ve been working with the cloud for some time and have seen plenty of horror stories, all of which were due to gross negligence and most of which were still refunded by AWS as a gesture of good will. They’d much rather have your repeat business than burn their users with bills that cannot be paid.


I think the issue with billing is the same issue you get with OOMs - which services should go down first if we’re out of money? In practice “OOM” (Out Of Money) is even worse than OOM because with Out Of Memory you are just above the available memory threshold, but with Out Of Money you are literally at 0 capacity which means every single service needs to be killed.


The customer should be able to decide.

I’ve built solutions for customers who would prefer an unplanned outage over an unplanned $250k expenditure. Many customers think that I’m a dinosaur for saying it, but if there’s a 2-5 year expected lifetime for something, it’s almost always more cost efficient to use traditional colo or VPS.

Also, operationally it’s possible to have something more than all or nothing. Big companies usually have “Tier-0” services that must be up at all costs. AWS is no stranger to complexity; this type of function doesn’t exist because it would cost them money. They probably make 9-figure money from obviously idle services.


If you want your spend to be capped then there’s nothing stopping you from setting a CloudWatch alarm at a budgeted threshold that then scales down your infra.

AWS is like Lego. You’re supposed to build on it to create the behaviour you want


Hmm, I don’t think you understand what I am saying:

If you are out of money, all of your services need to be destroyed immediately, including all of your database disks, because every single piece of infrastructure incurs some recurring cost.

It means that it is NOT going to be “a simple unplanned outage”, it’s going to be more akin to formatting your hard drive with all of your family photos on it.

Pretty certain AWS just finds it easier to sometimes write off costs rather than implement something so radical that customers may be even less happy afterwards.


> and at an individual level it’s very manageable

And yet, we have horror stories of students and even experts being hit by surprise AWS bills.

Although I agree about usage - I never go anywhere near AWS unless someone else is paying for it.


> > and at an individual level it’s very manageable

> And yet, we have horror stories of students and even experts being hit by surprise AWS bills.

They obviously didn’t bother to manage it. But that doesn’t mean it’s not manageable. It takes all of 5 minutes to set up a budget alarm on CloudWatch. It was one of the first things I looked into doing when I set up my own AWS account years ago specifically because I didn’t know what I was doing back then and thus didn’t want any surprises. If I managed it then, then I find it hard to believe others cannot too.


I spun up an AWS service a year ago (while taking an AWS course I never finished) and haven't been able to find it or turn it off since.

That's $9/month until I disable that credit card, which I might do one of these days.

I thought they'd notice and disable my account after I successfully disputed the charge one time, but the bills just keep rolling in each month.

This might be a bit overreactive, but... I'm building an MVP for a SaaS app and I sure as heck am not going to host it on AWS.


You’re getting an invoice emailed surely? You can sign in with that email and close the account down.

Happy to help you if you need any assistance

Disclaimer: I don’t work for Amazon but I do work heavily in AWS.


> AWS "support" didn't exist.

It's been 5 years since I was responsible for anything AWS, but back then at least support was surprisingly good even when spending less than $100 a month. Emails would get answered and I could often get someone on the phone within 24 hours if I needed.


I think it depends heavily on what area you want support for. EC2 support is generally very good; billing support is reasonable, if a bit slow. Support for a lot of their managed services or media services is beyond useless. All the support engineer does is take your complaint and say they will check with the internal service team. Your ticket will just end up in a "waiting for Amazon" state for weeks or months.


Indeed. The best outcome of going to the frequent AWS events is meeting people from AWS who you can actually contact in these kinds of situations.


> BTW the tutorial is absolutely useless. Just a thousand different incantations to repeat. No real understanding communicated.

I share this sentiment after going through a lot of tech tutorials and onboardings. And it's not limited just to AWS. I find myself forgetting most of the information I was supposed to learn. I'm trying to be extra mindful and offer additional explanations when I'm writing procedural guides myself, but I still have a lot to improve.


I understand what you feel completely. I actually wasn't even able to complete the tutorial, since the commands given would trigger a permission error one way or another. I got charged $50 as well, since I wasn't aware that there were still services running. Such a terrible thing to front as a "Getting Started" guide.


Disclosure: I'm the Co-Founder and CEO of Vantage, and used to work at both AWS and DigitalOcean. But this is a large reason we offer a fairly generous free tier on https://vantage.sh/ - to help folks figure out where their costs are coming from and take action.

It can be really frustrating and annoying when you can't hunt down where costs come from.


I'm nitpicking on unimportant details here but this seems like more of a general UI/UX problem than a free tier problem.

AWS and Azure both have pretty straightforward free tiers, but people end up accidentally racking up large bills anyway. A UI to see a list of running services, sorted by cost either doesn't exist or is not prominent enough.


In Azure: Sidebar -> Cost Management + Billing -> Cost Management -> Cost analysis (preview).

It's slow, you can't open links in new tabs, and one wrong click means you have to navigate back to the right page and wait for everything to refresh again, but you can get the list of billed items. Plus an "Other subscription charges" line that contains some mystery costs not linked to anything.


> contains some mystery costs not linked to anything

Hard to believe that really is the case for a product targeting serious customers in 2021. Doesn’t that make a surprise bill still possible?


The state of cloud billing seems like exactly the result you'd get if you didn't have a strong mandate that product teams implement a unified, centralized billing interface.

And it makes sense from a growth perspective: new products grow revenue, bad billing systems only annoy customers (but mostly invisibly).


vantage looks like a cost analyzer. vantage free tier will help you find non-free tier issues on AWS, if i understand the GP's point.


I meant the Vantage free tier - not the AWS/Azure free tiers.

The UI to see a list of running services, sorted by cost is precisely what Vantage offers (and more)


It's kinda telling that you need an external service to figure out where your money goes.


We have services across a lot of cloud providers: Google and Azure for a variety of things, DigitalOcean for some, and the main stuff on AWS. I'd love it if a service such as Vantage could track them all; I can only find solutions for AWS.


I feel like a lot of cloud platform tutorials out there should start with having you create a billing alarm that emails you when the bill crosses a reasonable but higher than expected amount. For basic tutorials, that might be something like $20.


I would feel even better if there was a Shutdown-Everything at $20 threshold.


This is a great idea!


My rule of thumb is to avoid the cloud entirely unless my employer is paying for it. Otherwise, like you said, I'll stick with a fixed amount a month.


One of the first things you do when learning AWS via A Cloud Guru is set up a billing alarm.

I have one that goes off monthly at about how much I expect to spend in a month.


> Cloud platforms generally don't let users cap the billing, because those overages are good income for them.

No. They don't have caps because it's really hard to implement and because the negative press of an app going down because the cloud didn't scale is far worse than any received by surprise bills. Scaling is a large chunk of what you're paying for, after all.

This is quite obvious when you look at how lenient AWS is with retracting surprise bills. And the money perspective doesn't make sense either: AWS lives on customers with bills in the 5-figure range and up. The occasional $10 from someone playing around isn't even a drop in the bucket.


Caps are not hard to implement. That’s a bullshit excuse. If they can bill it they can stop it.


You sure? Say your app is reaching the cap. What do you do with ongoing costs? Do you block writes? Shut down VMs? Delete stored data?

So you might say: simply project the cost (with some magic) and prevent that from going over the limit. So, imagine your app suddenly experiences a load peak and you need to scale up. However, adding a VM would increase that projection too much. Do you not scale, despite this possibly being a small load peak, and let the app go down? Or do you risk the situation described above? Doesn't matter, you'll get bad press either way.

And beyond that, you'd still need to project costs like traffic volume, which can vary extremely. Not to say anything about the technical difficulties of coordinating that billing information across hundreds of services in real time.

And even if you do all that, you still get bad press of the likes of "we forgot to remove our payment limit and it killed our app while being on the front page (and our alert did not trigger because we couldn't afford another mail)".

There's no way AWS (or any other cloud) is eating all these drawbacks just to have a limit. I bet it's orders of magnitude cheaper to just eat the occasional surprise bill.
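The projection dilemma above can be made concrete with a naive sketch (the function names, the linear projection, and the dollar figures are all illustrative): even the simplest month-end projection forces exactly the awkward choice described, refuse to scale or risk blowing past the cap.

```python
# Naive month-end spend projection and the cap decision it forces.
# All names and numbers are illustrative, not any real provider's API.

def project_month_end(spent_so_far, day_of_month, days_in_month):
    """Linear extrapolation of current spend to the end of the month."""
    return spent_so_far / day_of_month * days_in_month

def may_scale_up(spent_so_far, day_of_month, days_in_month,
                 extra_vm_monthly_cost, cap):
    """Allow adding a VM only if projected spend stays under the cap."""
    projected = project_month_end(spent_so_far, day_of_month, days_in_month)
    remaining_fraction = (days_in_month - day_of_month) / days_in_month
    return projected + extra_vm_monthly_cost * remaining_fraction <= cap

# Day 10 of 30 with $100 spent projects to $300; a $100/mo VM adds
# roughly $67 for the rest of the month. With a $400 cap the scale-up
# is allowed; with a $350 cap it is refused, even if the load spike
# would only have lasted an hour.
```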


I get $150 azure credit a month - my subscription has no credit card attached, when I reach the limit everything shuts down.

Azure has the ability to stop everything at a specific limit - they choose not to make it available.


Does it shut down at $150.00 or does it shut down “some amount of time and unknown dollars after you cross $150”?

I’m willing to bet it’s the latter. If in a normal account, you’d then incurred $152.78 or $166.39, did the limit work? Would customers agree?

My cloud bills continue to change for several days past the end of the month (for legitimate calculations that come in for usage incurred during the month).


Who cares about the exact amount?

"We couldn't technically make it stop exactly at $150.00, but only at $167.89 or whatever, so we are letting it run to $15k."

For catastrophic cases it doesn't matter. If it saves a person from an unexpected $15k bill then it works. Even for many businesses it would be ok to drop everything - I know some which can withstand being offline for a day, but not a $250k bill.


Make the MVP opt-in, delay anything irreversible (e.g. deleting S3 data) by a few months with a deposit to cover costs, and figure out the rest from user feedback - i.e., develop it like any other new feature in a modern shop.


> You sure? Say your app is reaching the cap. What do you do with ongoing costs? Do you block writes? Shut down VMs? Delete stored data?

Two approaches:

  A) Hard limits: freeze the services immediately if your cap is reached, ideally with a heads-up some time beforehand based on predictions, if possible. This is what many VPS providers out there do for unpaid bills and such, which makes sense.
  B) Courtesy: allow the services to keep working, but at a degraded performance level - that's what some of the other VPS providers out there do. For example: decrease disk performance, cap the CPU performance, limit the network speeds, etc.; probably eventually also block writes, but don't delete data outright. Any of the aforementioned should trigger monitoring alerts on the developers' side - Zabbix or another solution would alert them in minutes - and the vendor should also send e-mails about these measures either currently being put into place or about to be, so that the necessary actions can be taken.
> So you might say, simply project the cost (with some magic) and prevent that from going over the limit. So, imagine, your app suddenly experiences a load peak and you need to scale up. However, adding a VM would increase that projection too much. Do you not scale, despite this possibly being a small load peak, and let the app go down? Or do you risk the situation described above?

There's a difference between keeping the current capacity with degraded performance during the spike and killing the entire app. You don't always need to scale up, depending on your failure modes. Having consistent service response times is overrated, as is needing to serve every single request without ever telling a small portion of your users that your service is experiencing high load - there should be solutions in place to deal with the backpressure and prevent data loss even under these circumstances anyway.

Unless you work in a governmental organization or on another critical piece of software for society, degraded performance is probably okay, and no one feasibly cares about or remembers even small outages - regardless of whether it's a large site, a small non-profit, or even a side project. Whereas if you do, then you probably have enough money to throw around for billing caps not to be relevant.

If you subscribe to those beliefs about always needing to be up and serve requests, however, then there's another option:

  C) Billing alerts: something that most of the providers out there already provide in some capacity, however in fairly bad ways; if AWS can bill you for Lambda functions on a 1ms basis, then there's no excuse for not receiving billing alerts the very instant when this spike first happens: https://aws.amazon.com/about-aws/whats-new/2020/12/aws-lambda-changes-duration-billing-granularity-from-100ms-to-1ms/
Better yet, allow your clients to choose which of those mechanisms they desire to use, in the order of the potentially least expensive (infrastructure wise) to the most: A, B or C. That way the little guys for whom a 10k bill would be life ruining could just use A, whereas startups could stick with B and huge corporations who have a large runway of cash to burn could use C.
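The A/B/C choice above can be sketched as a tiny policy dispatcher (the action strings and the 90% warning threshold are invented for illustration):

```python
# Sketch of the three enforcement modes described above. The action
# names and the 90% early-warning threshold are illustrative only.

def enforcement_action(spent, cap, policy):
    """Return what the provider should do, given spend, cap, and policy."""
    if spent < 0.9 * cap:
        return "run"          # well under budget: nothing to do
    if spent < cap:
        return "alert"        # approaching the cap: warn early, every policy
    if policy == "A":         # hard limit
        return "freeze"       # suspend services, but never delete data
    if policy == "B":         # courtesy degradation
        return "throttle"     # cap CPU/disk/network, eventually block writes
    return "alert"            # C: keep running, keep alerting

# e.g. enforcement_action(95, 100, "A")  -> "alert"
#      enforcement_action(120, 100, "B") -> "throttle"
```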

> Doesn't matter, you'll get bad press either way.

Bad press? As opposed to what, going broke and not being able to pay your rent because of unpredictably large bills with no way to limit them, just because your side project got popular on Reddit or Hacker News?

There's a world of difference between what's needed by corporations and what's feasible for private individuals, so as long as there's a chance of such bills, I will not use Azure, AWS, GCP or any other platform like that.

Remember: these surprise bills will only be "eaten" by the larger providers based on their own goodwill. There's not much preventing them from banning you outright.

There are other providers that are far more reasonable in that regard for my needs: https://news.ycombinator.com/item?id=28639196


Wow, it's really interesting to read how clueless folks are here on HN.

The reality is that kortilla is the same person who, if AWS deleted all their data when they hit their "cap" to stop the billing, would be on here complaining.

Payment method not go through? Hit your cap? To stop charges AWS needs to delete forever almost everything in your account. All your S3 data gone. All your backups / databases and archives gone.

Oh - you actually DON'T want them to blow up your platform? Maybe they could instead provide a billing console:

https://console.aws.amazon.com/billing/home

or maybe alerts and alarms?

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitori...

Cost or usage budgets (and many more)?

AWS Budgets (includes alarming options).

Some folks seem to have almost no clue about what the AWS customers who pay the billions actually want. Is there ANY chance that AWS listens to its paying customers? Maybe it has become successful by doing so (at the cost of total feature sprawl, in my view)?

My quick trick is to log in a day or two after I think things are shut down and look at the projected bill and current month billing. I've left some very large instances running a time or two; easy to turn off.

And with billions of requests (PER SECOND) on the AWS network, there is NO WAY they are doing real-time billing. That is not happening. Look for daily aggregation and similar. Just the scale of permissioning on API calls must be insane per day. These are going to need local counters that aggregate periodically.

I do wish they'd maybe aggregate 4x per day (6 hours).


They should go talk to Azure then, because Azure for Students gives you $100 per year. The second you hit that $100, it kills everything.

I’m not saying it would be simple, but it wouldn’t be a bad idea to have a global monthly billing maximum where it’ll nuke the account at that number. There are lots of new developers who are probably too scared of the free tier to use it without something like that (I have AWS experience, and I don’t use it for personal stuff specifically because that doesn’t exist - which means I probably won’t tell my employer to use it either).


I do like this idea, but for their bigger customers a cap where billing (and services) all stop is not what customers are asking for. Instead they are asking for durability / resilience / object locks etc.

I'm serious: what large business wants to lose EVERYTHING (all static IPs, all Glacier and S3 data, all databases and compute) over a billing issue?


You have to see how that's both totally different and a terrible idea for a business, right? If anything the existence of hard limited student accounts combined with the fact that support always* refunds giant surprise bills underscores the point that caps for business accounts don't exist because of the damage they could do.


> and a terrible idea for a business

Surely it's the customer's role to decide whether or not it is a bad idea?

How 'bout you run a non-profit and have an allocation to run the services, would you prefer your nonprofit to lose the website for a couple days, or for it to go bankrupt?

What if you run a company for which online presence is a means of advertising and not the revenue-generating platform, would you prefer to run your advertising campaigns on a budget or without limits?

And so on.


This seems like a strawman. You could almost definitely ask/warn the user when they set the cap, or you could make the cap not apply to anything that is not trivially recoverable.


They have this with their cost explorers, alarms, budgets and more.

Realize that most customers are more focused on whether their data will be preserved.

Blowing out your entire EC2 / RDS / S3 / Glacier backup stack over a billing issue (or someone in the accounting department setting up a cap) makes no sense.

Are major customers really asking for this? Why risk it, why even build a tool that can blow out a customers setup so completely.

This is why I don't understand HN sometimes saying AWS is "BS" etc. Does HN not think AWS talks to its big customers to find out what they want?


Then the billing will continue.


> the negative press of an app going down because the cloud didn't scale is far worse than any received by surprise bills.

Certainly it is the customer's decision to make, and not AWS' ?


>because the negative press of an app going down because the cloud didn't scale is far worse than any received by surprise bills

is it?

if you gave users the ability to either scale infinitely or scale up to a $$ point, then how could anyone reasonably be mad?


Exact same thing happened to me too


I'm convinced folks who complain about "unexpected recurring small bills on AWS" have never taken the five minutes it takes to learn to use the billing console.

It's like asking why the knife you're using keeps cutting you when you put your finger on the sharp end. Learn to use your tools, and they won't surprise you. Learn to track your costs, and you won't have unexplainable recurring bills.


> It's like asking why the knife you're using keeps cutting you when you put your finger on the sharp end.

Most cloud providers don't have this billing problem, so the analogy breaks. It's more like: why does this knife keep shooting me in the foot?


> five minutes it takes to learn to use the billing console

I spent an hour or two trying to find anything running and didn't find a damn thing. Yet Amazon decided to charge me a buck or two every month for "storage", so I just canceled the whole account. I mean, if I can't find what I'm paying for when not using it, how am I supposed to understand the bill when I actually have a dozen instances?


That's a good comparison. Have you ever cut yourself with a knife, any time in your life? I certainly have. I've also had a £150 AWS charge I had to pay.

In both cases I learnt my lesson. I'm careful not to put my fingers where a knife can cut them, and not to put my bank details where AWS can charge them.


We're in a room with 150 engineers, techies, and problem solvers who have been asked whether a cutting implement is overly complex or dangerous.

100 of them have provided an opinion and 99 of the opinions are that it is. Everyone in the room is bleeding.

I think it's fair to say that there's an issue with the tool, Stockholm syndrome notwithstanding.


Yup, I also find those arguments ridiculous: "I cut myself because I don't know how to use knives, but it's the knife's fault for cutting me."

You can even enable scans for unused resources that generate costs.


I know Google burns through user trust like it’s incense candles, and so people fear losing their account access, or features being deprecated without notice, but I really think GCP is underrated. In GCP, it’s easy and best practice to group things into granular projects, and if you want to, you can shut down an entire project all at once. The billing story isn’t perfect, but it’s not bad either, and it’s not the only thing going for it.

I know I’m inviting replies airing grievances with GCP, and there certainly are many, but I’ve come to really like what it offers. Especially, GKE is really cool and at least to me feels very whole-assed as far as Kubernetes offerings go. Obviously that would make sense, but still, it really is nice to use.


It's the same in Azure, really. It forces you to set up a resource group for everything (you can't have any resources outside a resource group). If you're trying something out, just put it all in the same resource group, and at the end delete the resource group.


All the Azure tutorials are structured this way, and they frequently end by having you clean up your rg.


That billing model makes it very easy to find out how much a particular product/service/team is costing the business, and I think it's one of GCP's advantages, personally.


It took me 2 years to get AWS to stop billing me a few cents a month. I didn't try that hard but every month or so I'd get another bill for like 23 cents. I'd login and try to figure out what was still turned on.


AWS has been billing me a few cents per month for over a decade now. Eventually, the credit card I was using expired, and so every month for the past 7 years, they've sent me an email:

> Dear Amazon Web Services Customer,

> Your AWS Account is about to be suspended. There is still an outstanding payment problem with your account, as our previous communications did not lead to successful payment of your past due AWS charges.

> We were unable to charge your credit card for the amount of $0.26 for your use of AWS services during the month of Aug-2021. We will attempt to collect this amount again. Unless we are successful in collecting the balance of $0.26 in full by 09/30/2021, your Amazon Web Services account may be suspended or terminated.

The balance never increases, even though it always says the charges are from the previous month. It's as though they forget the balance each month because it's not worth collecting, then my account runs up another small charge, which they promptly forget.

And of course they have yet to terminate the account. I would gladly close it myself, but I'm locked out, and it's not really worth my time to figure it out how to get back in (plus they keep telling me they're going to terminate it, which is exactly what I want them to do!).


I got this too, and they did eventually terminate my account. I missed all the emails though, but when I went to use AWS for some project I found out I was terminated, and the only way for me to use AWS again was to create a new account (I emailed them; they can't un-terminate me).


Same thing happened here - for $23 instead of 23¢.

What a dysfunctional service.


I found the solution to that problem: I just actually use AWS for some stuff now, and my bill turned into a nice $20-ish :P


We've had AWS tell us to charge back a payment that we couldn't cancel.


i have this happening on gcp and aws. i get a text from my credit card each month for like 10-25 cents. i've gone in multiple times; all the consoles show $0. it's impossible to track down. it's so cheap i haven't cared enough to dig more


https://console.aws.amazon.com/billing/home?#/bills?year=202...

This page should show you a full breakdown of exactly what’s costing you money across all regions.


Yep. I have some one cent a month bill from GCP from Hackathon days.


There is a good chance all of you have a reserved IP address floating around somewhere in the settings. It costs something like $0.05 a month.


Both services have billing breakdowns by sku. Are you saying it shows nothing?


I never use AWS or Azure privately; everything I test runs on company or customer credit. You also more or less have to take their word for the CPU time you used; storage is a bit easier to verify.

But even with a simple web service you quickly rack up $50-100 just from testing and deployment.

I strongly recommend renting a server for private use. It is always the cheaper option, at a fixed cost. Of course you don't get those fancy services... I only use a virtual server right now and pay $45 quarterly, domain included. It still has quite a lot of power, though.


I'm a huge fan of Oracle's (yeah, I know right) free tier for especially this -- you can't accidentally use paid stuff without manually toggling an upgrade. It's such a relief to know that you can mess around with things and not be charged for it.


That sounds really good, last thing I’d expect from Oracle - are you sure you don’t need to pay for an add-on for that :)


In Azure I put all my stuff in a new resource group and then when I'm done I just delete the entire resource group. This has worked well for me so far and I haven't had any surprise charges like I did on AWS and Digital Ocean.


I thought DigitalOcean had fixed charges.


I wanted to clear out my account so I deleted the VM but it didn't delete the associated static IP, so I got charged for the unused IP address that month. I didn't know the IP was still around until I got the bill. If this were in Azure I would have deleted the entire resource group and the IP would have gone along with it.


Oh no, I'll never forget the HN story about the expert losing $80K on AWS.



Just one thing: the credit card. Once they have all your details, they can do whatever they want with you. They can let you off the hook, but that's purely their goodwill.

Personally I'm in love with Hetzner Cloud. It has fewer "features" (= proprietary complexity) than AWS/Azure, but I can understand everything perfectly and, most importantly, I have a guarantee I won't be charged over a limit.


I had a $7k bill from AWS due to a bug in a SaaS product. They wouldn’t refund it. I wouldn’t use them again after that.

I then tried to close down another account based on AWS organisations, and it was super complicated to get it all removed to stop billing to the extent I would have needed to hire an experienced consultant to do it rather than just click “close account.”


A few years ago I felt just as blind about billing, but the latest stuff seems pretty well tracked and reported in the portal.


I'm glad they have made progress in that area. As others have pointed out it can sometimes be tough to track down sources of costs in AWS as well (especially if you accidentally started something in a zone you don't normally play in). I'm still gunshy though.

AWS has a setting in its billing that can fire off an alert if you have exceeded a certain threshold, which is pretty much invaluable if you're running something that auto-scales.
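For instance, a monthly budget with an alert at 80% of the limit can be created through the AWS Budgets CLI. This is just a sketch; the account ID, the $100 limit, and the e-mail address are placeholders you'd substitute with your own:

```shell
# Create a $100/month cost budget that e-mails you at 80% of actual spend.
aws budgets create-budget \
  --account-id 111111111111 \
  --budget '{
    "BudgetName": "monthly-cap-alert",
    "BudgetLimit": {"Amount": "100", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "you@example.com"}]
  }]'
```

Note this only alerts; it doesn't cap spending, which is exactly the complaint upthread.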


>They keep reminding me I have $50 of test/dev credit on Azure through my Visual Studio subscription but it flat out frightens me to even try to use it.

The VS credits are hard capped. At least they were on my account (didn't even have a credit card loaded)


You hear about all of these horror stories about services from IaaS/PaaS/SaaS providers essentially being black holes for money with no way to actually set hard limits for billing, even after all of these years and can't help but to think about more reasonable alternatives.

Why not just use something like DigitalOcean (https://www.digitalocean.com/products/droplets/), Vultr (https://www.vultr.com/products/cloud-compute/), Hetzner (https://www.hetzner.com/cloud), Contabo (https://contabo.com/en/) or even smaller regional ones: personally i use Time4VPS in Lithuania because it's close to me (https://www.time4vps.com/?affid=5294, affiliate link to make my hosting cheaper if someone else registers)?

Providers that just give you VPSes that you can run whatever containers on (or just host things the old fashioned way) and basically use whatever open source software that your project needs. More importantly, providers that give you predictable billing at a flat figure per month (or less, if your VPS isn't active all of the month) and simply slow down your network connection if the set limits are exceeded.

If you're just doing something to practice and haven't sold out to SaaSS (https://www.gnu.org/philosophy/who-does-that-server-really-s...), you probably really don't need to worry about scaling just yet - you can migrate over to AWS, GCP, Azure, or any other of the large providers at any time, by just running your containers on their scalable infrastructure, if there's even any point in doing that, since the smaller providers also can scale similarly in most cases.

Lastly, it just feels dangerous to give AWS, GCP, Azure or any other entity that's known to give people insane bills your personal details and personal credit/debit card details - what if they decide to block you because you can't pay and your complaint doesn't become popular on Reddit or HackerNews? It almost feels like setting up a shell company and using limited virtual credit cards would be more reasonable, same as people always say that you should have a separate Google account for your personal needs and anything in any professional capacity, so nothing gets blanket banned.

Sadly, there aren't many tutorials that start with: "Here's how you set up a company that's detached from your personal details, and here's how to easily make one credit card per vendor." Honestly, even the internal workings of companies like https://privacy.com/ are unclear to me.


For running personal projects that don’t need to scale instantly to demand you are 100% right that you should not go with cloud providers. Hell, most startups shouldn’t either.


> "AWS isn't perfect but at least you have half a chance of working out what is costing you money through the billing console."

Same thing happened to me on AWS. I was trying to get a Windows VM in the cloud to run some CAD software. It ended up blowing through the $500 credit in a week, and I shut it down before it went past the credit.

Unexpected bills like that make for a terrible experience, to be honest...


I've had a $1.50/month charge on AWS that I tried repeatedly to track down, failing each time. Eventually I just let it run until I canceled my credit card.


Something similar happened to me, though for less money, from clicking around in Visual Studio.

I thought I'd deleted the resources, but months later I was getting emails about past-due charges.


I have a funny story about Azure. Our company was looking for a cloud provider, and Microsoft sent a couple of salespeople to talk to us about theirs. Their salespeople were very condescending, talking like they'd be doing us a favor by allowing us to become customers and failing to take any of our questions and concerns seriously.

I think it's because we aren't a large corporation in terms of headcount.

In any case, that meeting ensured that no further consideration of Azure would take place, and it's very unlikely that it would be considered in the future.


When you have an existing stack, the choice is often between adding Cloud or staying on managed/on premise, and not between Azure and AWS/GCP.

As in the article, if you already run Windows servers at scale, going to AWS is possible but won't be your first choice. Microsoft salespeople being condescending basically reflects that situation.

In most other situations I can think of, you'll go to Azure only if you can't use the alternatives, so again you're basically at their mercy.

To digress: if you run Linux servers with an open source stack, going to Azure will bring you virtually nothing, and will probably be an expensive PITA at every step. In a previous ruby shop we had more empathetic dev evangelists walk us through Azure, but we felt like we were wasting their time, as we clearly weren't a priority to them. Then, looking at the price, we'd have needed to drop them as soon as the discount prices expired, so it was just a losing proposition for everyone involved. I totally understand how more sales-focused employees would weed out shops like us and go look for those who have to stick with them anyway.


> As in the article, if you already run Windows servers at scale

My company does that. We have started moving new things to the cloud, but we took one look at the Windows pricing in the cloud and ran away. And my company is DEEP in Windows.

So I’m not sure Azure really has much of a leg up there. I think the only real benefit they have is their hybrid cloud for people who do use Windows in the cloud as well.


We (different organization) had a similar conversation with GCP, with similar results.


Looking for a k8s for a hybrid deployment, the contrast was blatant: mention it to a Red Hat sales person and you have people showing up the next week to pitch OpenShift. AWS has a deep story here, same deal. A friend stopped begging the Google rep to pitch Anthos because it was kind of obvious how it'd go if you actually needed support.


This has been my experience. I've dealt with all of the big three cloud providers, and as a big customer. Azure is incompetence, AWS is largely capable and professional (so long as you're big enough to register on their radar), and GCP is just dripping arrogance. GCP is a better platform than Azure, but I would never choose it because dealing with Google reps is a never ending stream of condescension.


If you can get a dedicated account person (1000 person company), AWS is probably the clear winner.


Any good options?

This thread is nightmarish for someone looking to deploy a SaaS with three 9s


I, personally, would still pick AWS.


Dodged a bullet there, I'd say. You can safely assume that google will EOL anything in GCP that you might actually start relying on for business function.

"Dear Google Cloud: Your Deprecation Policy is Killing You": https://steve-yegge.medium.com/dear-google-cloud-your-deprec...


What service has GCP deprecated?


I don't fully agree with parent, and GCP is differently managed compared to other Google products.

That said, the k8s version deprecation policies, mixed with significant changes in cluster setup from time to time that you have to keep up with, are peculiar. I kind of think this is something we accept when going into k8s, but I'd understand people not being at ease with that philosophy.


Many salespeople repeatedly visited my previous company, a megacorp. While attendees from our side rarely had any authority to purchase (e.g. engineers not managers), the visitors were extremely enthusiastic in their presentations and demos. I personally felt bad that their efforts wouldn't lead to sales, but they helped boost our self-esteem for the day.


Efficient that you got it sorted out in one go. Well done!


A moral hazard with all cloud providers is that their PaaS services are typically billed on consumption.

So what incentive do you imagine they have to make those services efficient?

My favourite example is Log Analytics. It can easily cost up to 20% of the virtual machines it is monitoring! If you have a very heavily loaded website and you're logging every HTTP request, it can exceed the cost of the service it is monitoring.

They charge you a ludicrous $2,380 per terabyte of ingested data. This is 20x the cost of the underlying storage, even if it's Premium SSD! For comparison, AWS charges just $500, which is still overpriced.

Now consider: If you're an Azure software developer and you find a way to reduce the bloat in the log data stream format, what do you think the chances are of getting that approved with management?

They have a firehose spewing money in their cloud. I can't imagine them ever saying: "I think it's a good idea to turn that down to a mere trickle!"

As others have pointed out, all of their other services have similar moral hazards: Bastion, NAT Gateways, Private Endpoints, Backup, Snapshots, etc...


Even though it takes some ops skill to set up, this

> They charge you a ludicrous $2,380 per terabyte of ingested data. This is 20x the cost of the underlying storage, even if it's Premium SSD! For comparison, AWS charges just $500, which is still overpriced.

just makes me happy about our monitoring cluster on the german hoster Hetzner. The old systems are at 40 Euros / month for 900GB storage and the upgraded ones are 40 Euros for 1.3TB / month. There's some manpower per month in there, and some egress costs, but it's still very cheap.


We moved off Azure LA to Datadog for this reason. Cheaper, better cost control options, and far superior UX.


It's funny this is coming up now. Just yesterday in GCP I was trying to figure out our billing and what was costing such a huge amount. I couldn't find any way to map the actual service being used to the price, either in the normal reporting or in the billing cost table export. The only way I could figure out how to do it was to enable log export. They used to have an option to download that as a file; they disabled that a while ago, and now it's only available as a BigQuery export, which runs every day.

I was like, "Why would they do that?" Oh, because now I have to set up BigQuery and pay for all of that. So I have to pay extra JUST TO SEE my detailed billing information. Pretty ridiculous.


Here's the link documenting the actual cost of viewing my billing data: https://cloud.google.com/billing/docs/how-to/export-data-big...

We really should revolt against this. I should be able to view all of my billing without having to pay extra. It also shouldn't be hidden behind a BigQuery export; it should be easy to view what is being spent and what is causing it.


I used to work on GCP. The billing report UI in the billing account section shows per-SKU usage, though a detailed breakdown can be a nontrivial monster: some customers launch thousands of VMs or data processing jobs per day.


Right, that's correct. It shows SKU usage, but not mapped to actual instance IDs. And we have the scenario you're talking about: lots of the same SKU with variable cost and no way to correlate it without using BigQuery, it seems.


For all of the faults of Azure, they let you do this reporting directly in the Portal. You can slice and dice the data without having to spin up infrastructure.


The first 1 TB of processing per month is free for BigQuery.


Same story with Cosmos. On an IoT pipeline I set up with Event Hub, Cosmos needed $15k/month to keep up with the flow without generating 429s (i.e., causing the ingestion function to drop events). The RUs shouldn't have needed to be that high, but due to upstream providers there was a regular spike every five minutes that exceeded the average RU need. RUs are a hard per-second cap; there's no averaging or windowing. I had to provision RUs at a level that guaranteed 40% unused capacity.
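The over-provisioning effect described above can be sketched numerically. This is an illustrative Python model, not the commenter's real workload: the 600/1000 RU figures and the five-minute spike cadence are made-up numbers chosen to show how a hard per-second cap forces provisioning for the peak:

```python
# Sketch of why a hard per-second throughput cap forces over-provisioning.
def required_rus(per_second_load):
    # Provisioned RUs are a hard per-second cap: you must cover the peak,
    # not the average, or the spike seconds return 429s and drop events.
    return max(per_second_load)

# One hour of load: steady 600 RU/s with a spike to 1000 RU/s every 300 s.
load = [1000 if t % 300 == 0 else 600 for t in range(3600)]

peak = required_rus(load)
avg = sum(load) / len(load)
print(f"provisioned={peak}, average={avg:.0f}, idle capacity={1 - avg/peak:.0%}")
```

With these numbers the average is barely above the steady-state 600 RU/s, yet roughly 40% of the provisioned (and billed) capacity sits idle, matching the comment's experience.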

So I tried setting up the same ingestion database with Kafka Connect and Mongo on a $200/month VM. It worked flawlessly, and Azure helpfully suggested I downsize that VM because it was underutilized based on CPU statistics.

What incentive do the Cosmos engineers have to make it more efficient, or to make the RU pricing model more reflective of actual usage? Zero. It's a money hose. Why would you turn that off?


I saw CosmosDB turn up in some recommended multi-region designs, and I had a customer with DR requirements so I looked into it.

I started by spinning up a small one in my lab but when I saw the pricing I back-pedalled very, very fast. Deleted the whole Resource Group and never looked into it again.


Would it not be possible for you to stream the data to a data lake and then take it from there to either do bulk inserts, or smooth out the inserts at a predictable rate to remove the peaks?


There were a variety of ways for me to do so by invoking more Azure services, but I stopped trying at that point, because even after smoothing it out, Cosmos would still be 50x as expensive as a basic VM running Mongo.

But also, every time I start stringing together cloud services, I experience two things: first, exploding complexity because now I'm adding points of failure, integrations, transformations, to keep it all running; and second, this sense of "the whole point of the cloud is to simplify things, to offer canned services and features that save me the trouble of doing this in code for myself." Once I'm using cloud features to work around cloud limitations, I bail out because if I'm going to spend that time (and money), I'm going to get the benefits of something much more direct.


Internally it probably will be approved. For example, I am sure Google Drive applies basic compression and deduplication to uploaded files, but if I upload 10 files of 10 GB of zeros, they will count 100 GB, not the few MB actually written to disk.

(There are good reasons for this, but the declared consumption is still different from the internal consumption.)


Log Analytics uses a columnar compression format on-disk, so ingested data is likely compressed by anywhere between 10:1 and 100:1, maybe even higher.

However, the wire format is super verbose JSON.

They bill per GB of the latter, not the former.

To put things in perspective: How many $ of CPU time do you imagine it takes to column-compress 1 TB of data? I would estimate that a single modern CPU core could do this in a minute or so. Factor in various inefficiencies and make it a super generous 1 hour. At spot pricing, that's about $0.01! One cent!!!

The larger cost would be bandwidth. Azure charges a huge markup for traffic (just like AWS), so for example zone-to-zone data costs $10 per terabyte at retail pricing (not internal costing).

They store that data for 30 days "for free" (lol). Assume a worst-case compression ratio of 10:1; that means they have to retain 100 GB for 30 days. That's $9.43 for a Premium SSD at retail pricing.

So their hosting cost for Log Analytics is something like $20 per TB ingested, yet they charge well over $2000 for it.
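As a sanity check, that back-of-envelope estimate can be reproduced in a few lines. Every input below is this comment's own assumption (retail prices, a generous CPU allowance, 10:1 worst-case compression, 30-day retention), not official Azure cost data:

```python
# Back-of-envelope check of the Log Analytics markup estimate above.
INGEST_PRICE_PER_TB = 2380.0   # retail ingestion price, $/TB (as quoted above)
CPU_COST = 0.01                # super-generous compression CPU cost, $/TB
BANDWIDTH_PER_TB = 10.0        # zone-to-zone transfer at retail, $/TB
COMPRESSION_RATIO = 10         # assumed worst-case columnar compression
SSD_PER_GB_MONTH = 9.43 / 100  # $9.43 per 100 GB/month of Premium SSD

def hosting_cost_per_tb():
    stored_gb = 1000 / COMPRESSION_RATIO      # 1 TB shrinks to ~100 GB on disk
    storage = stored_gb * SSD_PER_GB_MONTH    # 30-day "free" retention
    return CPU_COST + BANDWIDTH_PER_TB + storage

cost = hosting_cost_per_tb()
print(f"~${cost:.2f} hosting per ingested TB, markup ~{INGEST_PRICE_PER_TB / cost:.0f}x")
```

Even with the deliberately generous inputs, the implied markup on ingestion comes out around two orders of magnitude.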

That 100:1 markup is pretty sweet if your KPIs are based on recurring revenue.

There is no way in hell they will ever "optimise" this. Any accidental improvement will be rolled back or "adjusted" to ensure the revenue stream doesn't fall off a cliff.

Have you not wondered why it's taken them so long -- over ten years -- to enable any feature to filter logs at the source?


You can send logs elsewhere. We have them piped to various object storage or data warehouse services, which also provide better querying.


I was all in on Azure for years, but once my BizSpark wore off and died I was shocked at how expensive it was to run the most trivial side project or seedling of an idea or non-money making project.

As a .NET full stack web dev for like 14 years, I finally decided to put all my new learnings into just doing static sites on React and trying to figure out some AWS stuff because I wasn't going to try and use .NET on AWS when I had it so fairly figured out on Azure.

For a serial entrepreneur and maker, it just couldn't cut it anymore. Now I do NextJS on Vercel with minimal extra services out of what they provide and I get way faster stuff for free pretty much and I guess I'm no longer the only .NET guy struggling to hold the fort in a big tech community that also thinks .NET is too old or boring or non-sexy.

I still do like Azure better than AWS. The stupid, weird UX is still nicer than AWS. The docs by MS are 1000x better than everything on AWS. I miss MSSQL and the SQL Server Management Studio, but I don't miss the cost for scaling it enough to actually use it for scraping or data processing.

I tried to sell friends on Azure and even got a part time gig from MS themselves to try and help local startups use it, but no one cared or was interested. It just doesn't have that same "standard" or "sexyness" or "built into every new tech" feel to it, so I doubt it'll ever really change or pull ahead in comparison.


I've never used Azure; I've been using .NET on AWS for the past 5 years and haven't hit any problems so far. .NET is not tied to a specific cloud provider.


Do you mean .NET is not tied to a specific cloud?


.NET is not tied to cloud at all. https://dotnet.microsoft.com/ lists Web, Mobile, Desktop, Microservices, Cloud, Machine Learning, Game Development, Internet of Things.


Edited my post above. I meant NOT tied


> Every cloud provider has their expensive “thing”. Ingress is always cheap, egress always expensive. AWS has their “Managed NAT Gateway”, after all, the sit-in-the-corner money printer that never fails.

This is my pet peeve with cloud providers. Each one of them seem to have a gotcha somewhere hidden.

It's very hard to compare what your final costs will be early in a project. You try to compare GCloud, AWS, and Azure on VMs, ingress, and egress. After that, comparisons become harder, as services don't necessarily map 1:1. You end up choosing one and then always find something else that adds to the cost, something you forgot to include or a metric you underestimated.

Egress feels abusive pretty much across the board, without any really good reason. It feels like they all sat at a table and decided to fix the price there.

While you are developing your business you find you want to use some feature (like managed VMs on Azure's case) that is priced way out of a reasonable amount. You feel robbed, maybe you can still pay for it with your budget, but even then it leaves a bad taste, like you are getting a bad deal.


> Egress feels abusive pretty much across the board. Without really a good reason. Feels like they all sat at a table and decided to fix the price there

Indeed. You can significantly lower your price for hosting static files by using an external CDN in front of S3 or GCS.

Little-known fact: egress from GCS to Cloudflare is half the price of their usual egress fees. So combining the Cloudflare CDN for caching static files with this egress discount can lead to a 3-4x saving over just serving files out of GCS directly.
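A rough model of where that saving comes from: the only ratio taken from the comment is "egress to Cloudflare is half price"; the dollar figure and the cache hit ratio below are placeholder assumptions, not quoted GCS or Cloudflare pricing:

```python
# Illustrative math for putting a CDN in front of GCS, per the comment above.
GCS_EGRESS = 0.12        # $/GB direct to the internet (placeholder figure)
TO_CDN = GCS_EGRESS / 2  # half-price egress to Cloudflare, per the comment
CACHE_HIT = 0.5          # conservative share of requests served from CDN cache

direct = GCS_EGRESS                 # every GB leaves GCS at full price
via_cdn = (1 - CACHE_HIT) * TO_CDN  # only cache misses hit GCS, at half price
print(f"saving factor: {direct / via_cdn:.1f}x")
```

Even with a conservative 50% cache hit ratio the model lands in the 3-4x range; static assets usually cache far better than that, so the real saving can be larger.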


I wouldn’t call that particular thing hidden. You can roll your own NAT setup in a VPC, but that still costs money.


> I’ll still continue to use Azure

And this is the actual problem. Azure doesn't compete on quality. Microsoft rarely does. They have a few products and a market position that make them the default choice for countless customers. Nobody punishes them for their failings as long as the feature boxes keep being checked, and so the cycle continues.


A previous employer switched everything over to Azure and it was nothing but constant problems and nobody liked it except for the CEO, which meant everyone had to put up with it and nobody had a say.

Teams and Azure DevOps are some of the worst software I've ever used in my life. I've used worse software before, but it was hobbyist stuff written by single developers, and therefore doesn't really compare fairly.


At a previous company I used slack and really liked it. At this new company I joined recently they use Teams and it felt like a ghost town. Nobody online and nobody communicating in public.

The reason is that it’s impossible with the desktop app to browse channels that you’re not a member of, whereas on mobile you can! When I mentioned it to my colleagues they were shocked. This whole time they could have been communicating in other channels and they had no idea they existed. I have no idea if this is a bug, or some kind of admin setting.

There are plenty of other issues with DevOps too. You have to buy in to the whole Microsoft package apparently, and none of the parts are best in class.


Huh, I’ve used Skype for business before, and Teams is definitely better.

The problem with Teams is that companies use it to replace Slack, which is much more pleasant to use.


Azure Pipelines is just slow, and I'm not even sure how to fix it.

The checkout stage for our code from Azure Repos easily takes half a minute. npm install takes 1.5 minutes even with an npm cache hit. Total build times are around 20 minutes...


There is an undocumented feature called zip deploy, if you deploy to an App Service. Give it a go; it might just work. I use it to deploy a Python Django website. You need to add a setting to your App Service environment variables.

It took an escalation from our account manager to get a good support engineer, who informed me of this functionality.

Here is the summary of my support ticket:

Add an app setting with the name "WEBSITE_RUN_FROM_PACKAGE" and the value 1.

We performed a deployment and it took around 30 seconds. We also verified that the app is working fine.

Now you can make some changes and deploy once more to verify the end-to-end pipeline.

Zip deployment is a feature of Azure App Service that lets you deploy your function app project to the wwwroot directory. The project is packaged as a .zip deployment file. The same APIs can be used to deploy your package to the d:\home\data\SitePackages folder. With the WEBSITE_RUN_FROM_PACKAGE app setting value of 1, the zip deployment APIs copy your package to the d:\home\data\SitePackages folder instead of extracting the files to d:\home\site\wwwroot. It also creates the packagename.txt file. After a restart, the package is mounted to wwwroot as a read-only filesystem.

Article for reference: https://docs.microsoft.com/en-us/azure/azure-functions/run-f...
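For what it's worth, that setup can be scripted with the Azure CLI. This is a sketch with placeholder resource names (my-rg, my-app, app.zip), assuming the `az webapp` commands behave as documented in the article linked above:

```shell
# 1. Tell App Service to run the site directly from the uploaded package
#    (mounted read-only at wwwroot after a restart):
az webapp config appsettings set \
  --resource-group my-rg --name my-app \
  --settings WEBSITE_RUN_FROM_PACKAGE=1

# 2. Push the zipped project via the zip deployment API; with the setting
#    above it lands in d:\home\data\SitePackages instead of being extracted:
az webapp deployment source config-zip \
  --resource-group my-rg --name my-app --src app.zip
```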


Azure DevOps is clearly not being actively developed, or there is no way it would have remained so bad for so long.

Simple things like being able to sort tables in the web UI have had open issues for years.


What’s that logic anyway? “It’s hard to be a fan” sounds to me like you’re not really a fan, you’re just in a sunk cost fallacy. Bail out.


Microsoft is hardly unique in that respect. All three of the big cloud providers are like that in some respect, as well as a lot of "enterprise" software. If you have a big enough moat to keep your customers from leaving, there isn't a lot of incentive to improve the quality.


I sat in a market focus group for IT managers. We were universal in our ranking of AWS > Azure > Google for cloud services. We also agreed that Google was #1 or #2 technologically, but no one trusted them to be there day after day for the boring stuff.


In some cases it won't be easy to switch: if he is using it as a CI platform, then he is probably very invested in all his scripts.


We use Azure at work and this article hit home hard. As we just have been burned recently by Azure pricing. I am in the same spot as the author: liking Azure but being put off by all the weird stuff sometimes they are doing.

In our case, all we wanted was a static IP in front of an Azure Container Instance. Easy, right? Put the container in a vNET, place a NAT Gateway in front of it, and we're done. However, for some reason NAT Gateway is not supported for Container Instances; instead, the official documentation suggests setting up an Azure managed firewall in front of your container, which starts at a whopping 600 EUR/month. That is a steep increase from the ~30 EUR/month for a basic container instance, and there doesn't seem to be any other official alternative.

I have opened an issue with the docs team [1] about it and I hope there is another way of doing this that doesn't incur a doubling of our Azure monthly spending.

[1]: https://github.com/MicrosoftDocs/azure-docs/issues/81274


I don't think the azure-docs repo is the right place to ask for help or suggestions, as the maintainers are not very responsive; their sole job is to push internal docs out to the public docs. But I understand your frustration.

However, I believe you could have set up a "public IP prefix" using the Azure CLI. I do not think you need an Azure managed firewall.

Adding a managed firewall just to get an edge IP is like saying "I want to add an outside patio to my house" and being told "sure, but let's add a security checkpoint for the neighborhood first."


This varies immensely by the product team. If you file a bug on the AAD protocol docs I have a self-enforced SLA of a business day (or less). And a CVP enforced SLA to solve it in 30 days. And I generally love the folks who do file bugs - they're engineers, and usually fairly savvy.

Other teams do get burnt out on docs though, especially when customers use them for abuse or free architecture help. My favorite was someone asking me how to use an Oracle product. I know our branding is confusing but it's not that bad... Is it?


> I don't think azure-docs repo is the right place to ask for help/suggestions as maintainers are not very responsive because their sole job is to push internal docs to public docs.

What place would you suggest? We had bad experience with Azure support we could never fight through on the first support line.

> However, I believe you could have set up "public IP prefix" using azure cli. I do not think you needs a azure managed firewall.

I don't have deep experience with networking on Azure, so my understanding may be wrong, but I think a "public IP prefix" is just a group of contiguous IP addresses that you can reserve. You still need to assign those to something, e.g. a NAT Gateway. As far as I know, you cannot assign them directly to a Container Instance.


[Microsoft employee here, speaking for myself]

> I don't think azure-docs repo is the right place to ask for help/suggestions

This is correct -- the azure-docs repo feedback mechanism (using GitHub issues) is primarily for providing feedback on the documentation itself. We try to make this clear via the buttons at the bottom of the page; one is for 'Product Feedback' and the other is for 'Feedback about this page'. I would agree that the distinctions can be blurry, but I see the three categories as:

- Product Support: I need help with a product

- Product Feedback: Product A is missing feature B, and I want you to add it

- Documentation Feedback: The documentation is unclear, has a typo, or the example provided no longer works

For Product support, your best bet is to go through the standard support channel. I'm sorry that you didn't get a better response when you tried contacting support. Do you have paid support? If you're a large customer, you may get a dedicated customer support account manager. Additionally, there are community forums including https://docs.microsoft.com/en-us/answers/topics/azure-contai... and https://techcommunity.microsoft.com/t5/azure-compute/bd-p/Co... , which can also be used to submit product feedback.


Dear lord, this hits too close to home; I've been having nightmares maintaining some Azure infra lately. I'm not a cloud-provider fanboy (they all suck at the end of the day), but Azure is the one that deliberately makes my life worse every day.

All those features and no decent integration between them. And unless you're a multi-billion-dollar company with 100k employees, you'll have no luck with their customer support either.


I had an Azure employee troubleshoot my PG db instance for hours with me, for free, while our total spend was something like $100 a month.

Now, this employee didn't really help, but they were obviously professional and had database experience and didn't act condescending / like they were doing us a favor at all.

They just worked through the issue with me, which was a very pleasant surprise.


> I hope there is another way of doing this that doesn't incur a doubling of our Azure monthly spending

Oracle Cloud Infrastructure provides NAT gateways for free. You pay (low) transit costs, but unlike AWS+Azure (idk about Google) the NAT gateway itself costs nothing, so you don't pay twice for NAT traffic.

All the traffic at these cloud operations gets handled by cloud-scale SDN systems. I suspect the actual cost of the few tens of bytes necessary to track a NAT connection is lost in the noise of such platforms, so to my mind the high cost of some of these cloud operators' NAT gateways seems abusive.

Fortunately there is indeed competition that accommodates my view.


Could you not attach it to a subnet, and attach the subnet to a network security group, and then do what's needed in the network security group? Maybe there are regional restrictions that I'm unaware of.

Edit: oh, no, you just need a public ip prefix/address, right?


I'm surprised Azure still doesn't have any ARM processors to compete with AWS' Graviton instances. It's been nearly a full year since rumors of Azure working on ARM chips (https://www.zdnet.com/article/microsoft-is-designing-its-own...), and going back further it's close to 5 years since they talked up Windows Server on ARM (https://www.techrepublic.com/article/2-years-later-theres-st...) on Azure. Where is any of it though? I can go to AWS right now and spin up multiple different types of ARM processor instances, most of which are cheaper and more efficient for web-like workloads. It's really surprising that Azure hasn't been able to get anything out in all these years.


It's very possible that there just isn't much demand. This is likely for a number of reasons:

- x86 just has better support for the stuff Azure's "enterprise" customers want

- ARM servers are often more expensive to spin up than an equivalently specced x86 option

- PRISM compliance is easier on x86 (half-joking, half not)

I like ARM, and I owned a Rev1 Raspberry Pi when those were cool. But even now, ARM still has yet to make a strong case for existing on the server. And that's before we even discuss architectures like RISC-V that are out on the horizon, much better suited for servers than ARM. I'm not planning on an "ARM revolution" taking place in the next decade unless x86 is critically compromised in some way.


> But even now, ARM still has yet to make a strong case for existing on the server

This is several years out of date: AWS Graviton instances are usually a fairly substantial savings over similar Intel, with AMD in between, and Cloudflare has been reporting rather good numbers as well:

https://blog.cloudflare.com/designing-edge-servers-with-arm-...

The main reason I suspect Azure doesn't have it is both Windows' legacy x86 hyper-focus (the days where NT ran on half a dozen platforms never really panned out) and a smaller number of managed services. AWS has very popular services like RDS, ElastiCache, ElasticSearch/OpenSearch, etc. where you can simply check a box and wait a couple minutes to see savings, not to mention things like Lambda being only slightly more work for many users, and that's a great way to get volume usage even if the average enterprise IT department is scared to go near it for VMs.


It's not obvious whether AWS's internal operating cost for a Graviton instance is lower than for an Intel/AMD instance. I believe it is at AWS's scale, but I also think they tactically reduce the profit margin on Graviton instances as leverage to negotiate price cuts from Intel/AMD.


That's certainly possible but wouldn't we see some indication in comparison with Azure, GCP, Oracle, etc. who would have no incentive to do so?


Is it cheaper for the end user? If not, what's the point of using ARM over x86 besides ideological reasons?


Yes, that's why they offer them and have done numerous comparisons showing this as a cost-savings move?


I don't know; that's why I'm asking.

Every comparison I've seen was cost saving for the datacenter.


Do you mean end user as someone other than the buyer of a cloud service? That’s the context I was writing in - I’ve generally found things like https://aws.amazon.com/about-aws/whats-new/2020/10/achieve-u... to be accurate as long as you’re not radically switching instance types.


>Windows' legacy x86 hyper-focus

There have been multiple consumer ARM Windows products on the market for the last 9 years, I'd say Windows (client, at least) on ARM64 is as solid as its x64 counterpart


I’m aware, but everyone I know who had one of those ended up complaining about software compatibility issues. In the context of Azure, I’d imagine a lot of their customers would be especially risk averse in this regard.


Most of the time they're complaining about not being able to use x86 (in Windows RT's case) or x64 (until recently) pre-existing applications, and there's no native ARM/ARM64 version of said app. In the context of Azure, if someone wants to deploy on an ARM instance, I'd expect them to be able to build a native ARM64 version of whatever they're building.


> In the context of Azure, if someone wants to deploy on an ARM instance, I'd expect them to be able to build a native ARM64 version of whatever they're building.

Have you really had the experience that a large organization has the ability to recompile everything they run? Most will have a lot of code which is provided as binaries by a vendor, and their in-house code almost certainly has dependencies and optimizations which will need to be dealt with & revalidated. No, none of that is unsolvable but it means adoption is much harder than, say, changing an RDS instance type and you'd be taking on all of the support rather than the cloud provider's much larger team.

That's what I referred to in my original comment — in my experience, the average Azure user works at a Windows-heavy enterprise IT shop where those issues would be common. That doesn't mean that I don't expect ARM servers to happen there — Microsoft announced it was coming years ago, after all — but that it's going to be slow since the upfront investment will likely have slower adoption.


Microsoft themselves said the majority of VMs on Azure run Linux, not Windows.

About the ability to recompile: if you can't build ARM software, what's the point of using ARM instances? You don't need cross-architecture compatibility like you do on the desktop, which is what consumers complain about when talking about Windows on ARM.


The point is that it doesn’t matter if you could see a 20% price performance boost if your code doesn’t run on that architecture. If you’re using software which hasn’t been compiled for ARM, you’re not asking your cloud provider for that architecture and they’re not seeing the volume needed to profitably offer it.

I think AWS has successfully been pushing this because they know they’ll see that initial volume from people seeking savings on their own managed services and things like Lambda which are easy to switch, and that will fuel interest in switching other services which require more work.


Can you clarify point two? I was under the impression that the wide consensus was that Graviton instances have significantly better price-performance for most workloads. And the current state of ARM support is surprisingly good for open-source or Linux-based server software.


As you mention they have stated they are designing their own for multiple platforms, but at the moment they are using Qualcomm processors for their ARM offerings [https://www.lightreading.com/enterprise-cloud/infrastructure....]


We use Microsoft Azure AD B2C to manage users. AD itself is nice (for enterprise software), is used all over the place, and is pretty stable.

B2C on the other hand is a different story. Every few months we have to roll out a new tenant in our system. Tenants are identified by B2C "applications". Every single time, the new application doesn't work. Every single time the fix involves editing the JSON spec, changing something random (like a "true" to a "false"), saving, and changing it right back again.

Doesn't exactly inspire confidence... we're planning a migration to AWS.

Also, try Googling for documentation related to "Microsoft Azure AD B2C". Almost every shred of internet wisdom is related to AD and not B2C. Even with Microsoft's own documentation you sometimes follow a link from a B2C API reference and find yourself in AD-only land and it isn't obvious. This makes the task of researching features and debugging infuriating.


B2C is the worst product we have ever worked with. Outages on constant basis, an extremely complex XML configuration, translation bugs, unsupported features which are only available in AD but not AD B2C (e.g. M2M), and really bad documentation. We basically had to dive in their github examples and issue tracker to make it work. Do yourself a favor and stay away from it.


Maybe Bing search would be better for MS docs? (ducks)


I got a full-blown belly laugh out of this, thank you.


If you run workloads in Azure, and you let their default agent run on your images, I'd highly consider you take 30 minutes to skim around this repo: `Azure/walinuxagent`.

Go read through some issues, look at some closed ones, and try to skim through the source code. Realize there are two enormous Python scripts in the repo, one with "2.0" tacked on the end.

If Azure is somehow not just rebooting/killing VMs that lack the magic handshake, I'd highly recommend dropping the agent.

After all of this news, and what is on display in walinuxagent, do you really want some network-connected agent listening, whose most-touted feature is being a persistent backdoor?
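For reference, a minimal sketch of what dropping the agent looks like on a Debian/Ubuntu image (the systemd unit is named `waagent` on some other distros). To be clear, Azure uses this agent for provisioning and VM extensions, so try this on a non-critical VM first:

```shell
# Stop and disable the Azure Linux agent (Debian/Ubuntu unit name;
# on some distros the unit is called "waagent" instead).
sudo systemctl stop walinuxagent
sudo systemctl disable walinuxagent

# Or, less drastically, keep the agent but turn off its self-update
# mechanism in its config file:
sudo sed -i 's/^AutoUpdate.Enabled=y/AutoUpdate.Enabled=n/' /etc/waagent.conf
```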

"It's tough being an Azure fan." Eh, it's a nice cyclic problem. They have no deep care for engineering, which is why Azure is littered with services that are impossible to fully utilize, because their own engineers don't understand the how/why of what they're building half the time. That, along with crap career advancement and constant unappreciated, unmitigated live-service burnout, is why they can't retain actual Linux talent to save their fucking lives.


As a network engineer writing code to automate AWS and Azure infrastructure, I'm amazed at the number of pitfalls we've hit with Azure; they just keep popping up. Then there's the whole basic/standard split for every service. It feels like an order of magnitude more complexity for no reason, and part of it is the abysmal documentation.


This has been my life for the last year:

- Customer asks for something trivial to be deployed. I say, "no problem" and start beavering away on a Bicep template or whatever to deploy their stuff.

- I hit some small but showstopper issue with a service that I expected to work, but it doesn't. I open a support ticket.

- Inevitably, it turns out to be some stupid, stupid limitation caused by unfathomable laziness of the Azure developers. No workaround, no mitigation, but we're "working on it" with no ETA offered.

- Literally years later a trivial fix for the glaring issue goes into "PREVIEW" for 9 months, long after the original project was closed. I no longer care...

Networking is especially bad, with endless limitations that make no sense, like:

- IPv6 is incompatible with everything. Turning it on for anything anywhere in a vNET will permanently block unrelated features like IPv4 NAT. But of course, you can't "protocol translate" from IPv6 on the outside to IPv4 on the inside, so you end up painted into a corner.

- No bring-your-own-subnet, which means many lift & shift scenarios are impossible (we have customers using a class B public range internally).

- Azure forces NAT on IPv6, which makes no sense at all.

- All Azure PaaS services have firewalls that are IPv4 only.

- The built-in firewalls (e.g. Azure SQL Database) do not support service tags, only CIDRs.

I could go on and on...
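To illustrate that last firewall point, here's a hedged sketch of what scripting those rules looks like with the Azure CLI (resource names are made up): since service tags aren't accepted, you end up enumerating IP ranges yourself, one rule per range.

```shell
# Hypothetical resource names. Azure SQL firewall rules take start/end
# IPv4 addresses only -- no service tags, no IPv6.
az sql server firewall-rule create \
  --resource-group my-rg \
  --server my-sql-server \
  --name allow-office-range \
  --start-ip-address 203.0.113.0 \
  --end-ip-address 203.0.113.255
```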


Yes, seemingly trivial changes turn into huge and disruptive changes. The latest one we encountered: you cannot add more CIDR ranges to a VNET without removing all VNET peering first. The feature for this has been in preview for like 2 years. I mean, AWS doesn't automatically handle this for you either, but at least you can solve it easily enough: just use the API to create routes on the other side (their API/boto3 library is just amazing).
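As a sketch of that AWS workaround (all IDs here are placeholders): after associating a secondary CIDR with your VPC, you add a route for it on the peer side yourself, over the existing peering connection.

```shell
# Placeholder IDs. Associate a new secondary CIDR with the VPC...
aws ec2 associate-vpc-cidr-block \
  --vpc-id vpc-0123456789abcdef0 \
  --cidr-block 10.1.0.0/16

# ...then route the new range from the peered VPC's route table
# over the existing peering connection -- no teardown required.
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 10.1.0.0/16 \
  --vpc-peering-connection-id pcx-0123456789abcdef0
```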


Availability Sets decrease availability, because they force "all or nothing" deallocate-reallocate cycles instead of one-at-a-time changes.

You can't move a VM to a different subscription without also moving the entire vNet along with it. (I bet this worked great in the developer's lab with one test VM.)

You can't move a VM to a different backup resource without deleting all backups, going back years.

VMs can't be rolled back to VM snapshots.

ExpressRoute bandwidth can be increased non-disruptively, but the only way to decrease it is to recreate it -- incurring a ~30 minute outage.

The Activity Log (and most other audit logs) don't log the identity of the administrator that triggered the action about 10-50% of the time, depending on what kind of activities are going on. This makes it 100% useless as an audit log. There is no other way to obtain audits.

Most network resources are Zone Redundant, except for NAT Gateway, which is now required in some scenarios. It's Zonal only, which means you need 3 subnets, one for each zone.

Upgrading an internal load balancer from Basic to Standard cuts off Internet access, for "reasons". I'm told these are "security reasons". Uh-huh...

App Gateway doesn't send a "User-Agent" header with its health probes, which makes it incompatible with a surprisingly wide range of COTS software. This includes a bunch of Microsoft software.

I could go on and on...


> Inevitably, it turns out to be some stupid, stupid limitation caused by unfathomable laziness of the Azure developers. No workaround, no mitigation, but we're "working on it" with no ETA offered.

This is all microsoft products though. There have been .NET Identity bugs that have been open and acknowledged for multiple years.


I find this to be a common theme with Microsoft products. I work heavily with SSIS and MS SQL. The Microsoft docs are so convoluted they are borderline useless for learning the products.


My friend and I have a theory that this is because their docs are written by people so intimately familiar with how MS does things, they can no longer conceive of a situation where someone doesn’t already understand and think in the same way.

It’s like the business equivalent of the “once you understand a monad you can no longer explain it” meme.


It's Microsoft's modern, less-obvious version of EEE.

You sell to the managers and equip them with hollow buzzwords because you know they're gonna override the engineers on every decision anyway. By then, MS has your firm's money and all you can do is deal with it.


Or the ole “I tried to explain K8s to a friend, now we both don’t understand it.”


I find the problem with Microsoft docs is less how convoluted they are (you kind of get used to the conventions at some point, not to say they're good or sensical); it's that the Azure documentation is not of the same quality the WinAPI docs were.

Used to be a function doc would tell you:

1. All the parameters and their types

2. What they did, explained

3. All possible exceptions raised by the function

4. All possible return values

5. Supplementary documentation on the objects or structs passed in or passed back

6. Some examples of it in use

Now, the current docs sometimes have examples for Azure CLI/SDK stuff, but there is one convention that drives me bonkers that is as bad now as it ever was.

In the examples, oftentimes the most important part is replaced with an [insert your thing here] placeholder. The format of the thing you fill in there is often left as an exercise for the reader to intuit or guess.


Writing one good set of docs that are suitable for all users (experts and newbies) is really hard. Structuring complex topics in a logical and progressive manner is also really hard. Having tried to write the docs on a few software projects, I honestly found writing the docs harder than writing the code (though maybe it gets easier with practice).

I find the Azure docs to be decent. They are consistent across services for the most part, so you learn how to work with them after a bit of experience on the platform.


Same here, and not even GCP is that bad with the 'release' vs. 'beta' APIs that intermingle constantly.

Most of the time Azure feels like just another 'we have virtual machines and a crappy API' service. Kinda like a pretend-cloud where a lot of products come almost together but never finish. Almost similar to the way Windows and backwards compatibility means you end up with 100 libraries, frameworks, languages and versions all sitting side-by-side and not really working together, just differently in parallel. Not useful for automation at all, which makes it not useful at scale. (Except when scaling means: we want to run windows VMs with AD and go from 10 to 100 and change no parameters at all)


My theory is that all MS products are ultimately just a layer based on SharePoint :)


That reminds me of a relatively old twitter thread (that I can't find of course) about systems that have become operating systems and computers upon themselves.

It went something like this:

Code used to run on a processor; then it ran on an OS on a processor; then on a runtime on an OS on a processor. Then an abstraction layer was sandwiched in between. Then a filesystem. Then a compatibility layer. Then a database. Then a browser. Then we went and created a runtime in the browser to run code again. We have come full circle.

This probably applies to SharePoint but also the real-time OS on your GPU, the MSSQL database which has its own OS facilities you can run stuff on. Add too many features and the application becomes the very thing we wanted to get away from...


Yeah. Azure is a second- or third-tier cloud provider; they are not in the same league as AWS.

AKS suffers, AFAICT, constant API server outages. We tried to escalate into a ticket, but we just get motte & bailey'd between "you're putting too much load on the API server" — okay, what load? how can I see that, control it? — "here is the top consumers" — they're all AKS itself? — "well, there's too much load on the API server" gah! (Yes, we pay for the "SLA".)

You can't add IPv6 anywhere in a vnet, it will break unrelated things. We tried to add a managed PSQL server on IPv4 (b/c IPv6 is not supported): it "failed" (the API call to create it timed out with an internal server error … after 2 hours or so!) because something unrelated in the vnet used IPv6.

ACR has a 20 TiB limit, no way to prune containers, we've had to work around IDK how many 500s, the API is slow as dirt (response bandwidths of ~50 kbps — bits — and their team does not think that's a problem. It can take 10 minutes to enumerate a few megabytes of metadata…) Undelete-able manifests that I guess we will just pay for indefinitely? I feel like I could build ACR on top of Azure Blobstorage and it would be more reliable with better performance…
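For what it's worth, the closest thing to pruning we found is scripting deletes of untagged manifests via the CLI (registry/repository names below are placeholders), which given the API speed described above is painfully slow, but it does reclaim space:

```shell
# Placeholder registry/repository names. List untagged manifest
# digests, then delete them one at a time.
az acr repository show-manifests \
  --name myregistry --repository myrepo \
  --query "[?tags[0]==null].digest" -o tsv |
while read -r digest; do
  az acr repository delete --name myregistry \
    --image "myrepo@${digest}" --yes
done
```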

VMs shipping with buggy kernels. (Support wanted to know what weird thing we were running to hit kernel bugs. "Docker"?) Global outages. Known outages often don't get mentioned on the status page intentionally. I still don't know what the difference between an availability set and a VMSS is.

Everything is in preview. Everything.

Audit logs fail to load some times. Beyond complicated interactions/differences between "service principals", "applications", "enterprise applications". 2FA app now requires authenticating twice, per login, because each tenant acts as a separate yet not separate login. So. much. auth. Role assignments that don't know what principal is being granted permission, b/c ARM & AAD don't do referential integrity.

Docs that are outdated. Requests for updated docs closed without update, because "we don't have the data <from some other internal team, I think, but so?>"?. Docs that describe API calls badly. "foo: the foo query parameter" No docs, at least, that I'm aware of, about what permissions does the API call require. The docs conflate "permission" and "role" (different in Azure) all. the. time. Azure doesn't know what permissions some calls require, and simply says "give it Contributor" (close to all permissions)… and yeah that works but I want to show auditors we're doing PoLP?

Support… The SLA often isn't met, we're assigned reps in China (n.b., this isn't a language barrier problem, it's, how is someone who is literally sleeping during my business hours because that's timezones for you supposed to even meet a support SLA that wants a 8/4/2 hour response? And AFAICT from response times, they're not a night shift…?), the first response is worthless (doesn't answer the inquiry, requests information present in the original request, often isn't technically proficient, etc.), the writing is broken and sloppy. Half the time a simple "reread what you're about to send. Does it solve their problem?"… we literally got a blank email back. We've had tickets where the first response we get is "we haven't heard back from you" because sometimes their responses fail to get linked into the ticket in the portal. They lack any formal bug reporting mechanism, and support is not equipped to handle bugs.

My God, it's full of bugs.


Azure is a typical Microsoft product: it doesn't work all that well, but you can get the job done if you just buy more of it (to work around the deficiencies).

From a business perspective, this is brilliant. From a technical perspective it's not awesome, obviously.


Bill gates.


Pretty happy Azure & DevOps fan here. Been using it since day-1. I've helped build large platforms at several companies completely on Azure with great success. They were all small teams that shared all responsibilities across FrontEnd/BackEnd/DB/DevOps.

Our current project uses what I feel like are pretty standard features for a SaaS app. They include C#/.NET Core, Linux App Services, SQL Server, FrontDoor, SignalR, Functions, etc. Functions is the only feature that has been bumpy for us deployment-wise.

The journey of the various portals has been fun. The current one isn't perfect but it's far better than my experience using AWS. It's got to be a challenge to organize such a massive portal developed by so many teams.

The effort MS has put into documentation has been really great as well.

That said, there's a lot that could be better. A lot of the PMs are on Twitter, and I tend to be a squeaky wheel there about various problems, so hopefully they're listening.


I have had the same experience - the entire package works really well for a small team. We had a really easy time configuring our builds, but we don't do anything fancy - install things, run unit tests, create a build artifact.

Haven't had issues with Azure Functions, though DI is wonky in them if you try to build anything complex (imo you shouldn't). Other than that, I guess I just don't use them often enough to really understand what all the boilerplate does. Finally, when we switched to Python from .NET, it didn't feel like any of the Functions knowledge carried over somehow. Felt like developing on a different platform.

I also HATED the Azure certification exam compared to GCP. GCP tried to actually teach you something practical, Azure tested a bunch of memorization that you can google.


There’s a nice bug with Azure Functions where it reports the environment it’s running in as “Development” when it should be “Production”, which means the app uses the wrong settings.

Nothing in the documentation to mention this, you just need to deploy and learn from your mistakes!


I remember this one!!!! I spent about a week and a half trying to figure it out. Holy shit. Has the trauma worn off?


Fortunately the impact to us was quite small, but it's definitely left a sour taste in my mouth as far as azure functions go.

The whole story regarding organizing the code, breaking changes between versions, confusing plans, different styles of configuration between that and aspnet web apis, trying to configure individual functions in the same app without affecting others, etc. is not good!


My experiences has been generally positive too.

One thing I do find is that although their documentation is now open-source, it can still take a long time to get changes reviewed and merged (if at all). I waited about 3 months once for a simple change to documentation that was clearly wrong and by the time they looked at it, the docs had been re-factored.

They need to learn to embrace the Amazon Turk and have the right people review documentation edits. Should be able to quickly check a proposed change and then just merge it.


Not directly Azure, but related: OneDrive Business (which is built atop Azure, I think)

I recently moved off Amazon Cloud storage to OneDrive because Amazon didn't support rclone. Microsoft's OneDrive is quite a bit less expensive than either Google or Dropbox or Amazon.

The storage is there but managing it - what a mess it is!

First, there are like a gazillion URLs to access:

azure.com, onedrive.live.com, admin.microsoft.com, microsoftonline.com, office.com, office365.com, onedrive.com, onmicrosoft.com, sharepoint.com, windowsazure.com

Each one of those portals takes you to SOME view of your account with a gazillion settings. Many of them are repeated, and changing one in one portal doesn't necessarily reflect in the others. For example, I enabled 2FA (I seriously don't know how or where, but I was able to log in using it), yet admin.microsoft.com showed 2FA disabled for the user. Go figure!

Something as simple as figuring out how much space is currently used on your OneDrive is a challenge. There's a set, ritualistic series of incantations and clicks that will get you there, but you really need to be persistent. Googling for it gives you an answer, but most answers lead you to live.com, which is only for personal accounts, not business accounts; it won't allow you to log in.

office.com/launch/forms takes you to, and allows login to, your business account, but going to forms.office.com (the top search result in Google) redirects you to live.com and doesn't allow login to your business account!


One thing I want to say is that while the clouds are frustrating, at the end of the day in-house, on-premises solutions sometimes share the same caveats (little support for small users, gigantic mazes of solutions with little documentation, etc.). I think once you reach a certain level of complexity these issues pop up eventually.

One major difference perhaps is on the expenses, but are you sure you are hiring the right number of engineers and paying the right amount of $$ for those software and machines?

To provide some context: I figured configuration-as-code eventually reaches the point where people essentially start to implement their own DSLs. When new people join, they are essentially config boys/girls who need to spend a lot of time learning and unlearning a few DSLs created by the smartest guy/gal in the company, who has moved on to greener pastures...


I don't think being a "fan" of tech products/services is healthy. Having a preference is fine, but loyalty won't work in your favour.


It's a good article, check it out. It's really just about being a heavy Azure user and considering whether to recommend it to others, with the bulk of the article being a critique of various issues.

"fan" is just in the title.


The mention of AWS ECS made me curious — does Azure have the equivalent of Fargate, where they manage the host and you only pay for the container's actual usage? The O&M wins on that have been really substantial and I recommend it for anyone who doesn't have sufficiently large scale that the savings will fund an ops team capable of running something like ECS or Kubernetes. One of the reasons why is the challenge of avoiding over-allocated instances — every time I've seen people running AWS ECS/EKS, Google GKE, etc. they've been substantially over-provisioned because someone's always meaning to get around to looking at that but the time never seems to materialize.


Their closest Fargate equivalent is probably Azure Container Instances. I haven't tried it recently, but last time I did, the startup time was very slow for new containers (5 min+). However, that may have improved now.
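For anyone curious, spinning one up is roughly a one-liner with the CLI (resource names here are placeholders); like Fargate, you pay for the container's requested vCPU/memory rather than managing a host:

```shell
# Placeholder resource names. Runs a container directly on ACI,
# billed for the requested vCPU/memory, no VM to manage.
az container create \
  --resource-group my-rg \
  --name hello-aci \
  --image mcr.microsoft.com/azuredocs/aci-helloworld \
  --cpu 1 --memory 1.5 \
  --dns-name-label my-aci-demo \
  --ports 80
```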


Thanks — I'll have to take a look at that sometime.


You can do Fargate with EKS now too, but it has sizing limitations that aren’t present in ECS last time I tried.


Yes - and they had limits with EBS, too. It’s not perfect but it’s really handy for cutting out maintenance for a substantial fraction of my tasks which don’t hit the edges.


I miss my on-prem data center. Life was so easy then. I don't even need scalability, but my dumb bosses think the cloud is cool.


Yeah... and if I have to sit through one more week of managers and devs freaking out about Lambda cold starts, I'm probably going to quit tech altogether. It is the most dumb-ass broken record. Torture. Read the manual before you write two hundred functions and bet the whole project on fantasies next time.


My company once took a project on Azure, and ended up giving a 35% cut on our invoices to move to AWS. The bugs I filed while working on that project are still not closed.

That was 4 years ago.


I always wonder if going all-in on cloud services actually saves most people time, after hearing about all the weird corner cases that need debugging. Because leased dedicated hardware really doesn’t take much time to manage, it’s entirely fungible and easy to switch off of if you’re not happy with support, it can support an enormous amount of traffic, and it does it for relatively low cost (especially if you need eg high bandwidth).


> I always wonder if going all-in on cloud services actually saves most people time...

Agreed. The apps and services I manage aren't very big compared to a lot of posters here, and my company is certainly not going to ever pay the big bucks for real talent.

So instead, I have to design everything knowing that my ops team is going to be mostly $100K / year "devops" guys with a few "cloud" certs but no real CS, dev, or even Linux knowledge (yes it's that hard to hire good people now (at the low rates my employer wants...)).

I've gotten to the point where I absolutely mandate that they don't try to use Terraform or CloudFormation scripts, because in the end, there are so many edge cases or glitches that it's easier to just write an install guide that shows them which buttons to click in the AWS or Azure console. <sigh>

And when I look at the costs we spend per month - including all the unanticipated charges like NAT Gateways and $20 / day "managed" Postgres instances, I assume we'd be better off dumping the cloud and reverting to our 2008 setup: Spending $10K on some Dell servers in a managed data center and hiring an old-school Linux admin to manually install and manage it all.


It doesn't seem to. My current place is all in on AWS and we have about 6 "DevOps" people dedicated to dealing with it all.

Last place we did pretty similar stuff with 3 data centers, all self hosted, self managed, mostly OSS stuff with about 6 sysadmins and way less hassle.


I think Azure suffers in the same way as MS development tools generally: they are racing to make so many new things that their "legendary" backwards compatibility feels more like neglect than support. Sure, WebForms still works (which is nice; thousands of companies have legacy apps), but they still haven't fixed the problem that NuGet installs reformat web.config even when nothing changed, meaning you have to delete all the redirects, get VS to add them back in, and only then do you see the actual changes.

I would see that as a "major" bug because it is so unnecessary and has been there so long. MS's attitude? Bit too hard to fix, just use .NET Core instead, as if we can all just do that with our legacy production apps!

VS is also pretty cool function-wise, but there are still too many lockups, cache corruptions causing strange compiler errors, and files left locked after exiting. Instead of doubling down and refactoring the core code to work properly, they kind of started pushing VS Code instead, even though it doesn't have half the functionality of VS, rather than making VS the thing of beauty it is supposed to be.


This used to be very different one or two decades ago, when Microsoft's backwards compatibility meant you could migrate your existing code to the new thing with minimal changes. This era is where the "legends" come from. Nowadays, it's exactly as you describe, unfortunately.


Agreed. Now, they'd make "Doors" (which doesn't run anything from Windows) instead of Windows 95, just leaving Windows 3.1 to rot as-is. It's more like "sideways incompatibility".


I find Azure to be expensive. Microsoft is not accountable for service issues and bugs. Support is just a wall of consultants that mostly cannot help and then reach back to actual Microsoft for help (which makes everything really really slow). Garbage.


The tech is definitely third tier. And if you ever need to actually talk to someone over there, it's a nightmare.


I always dreaded talking to support - AKS would routinely break in weird ways, and it was always super frustrating waiting for support to fix things that shouldn’t have happened.

We once had an error with Azure MySQL and AKS, where it would just start dropping packets with basic instance types. Support could never fix it; we ended up just upgrading to standard because that worked.

I will say this - the manager of the team at Azure did comp our upgrade, because they couldn’t figure out why it was happening. I tried very hard to get our project off Azure, but because it was negotiated as part of the Enterprise Agreement with Microsoft, it was “free”, so the CTO wouldn’t budge. I assume this is how many people end up needing to deal with Azure.


Yes. We found a problem with Azure’s IPv6 support a while back.

It took quite literally months to get the issue escalated to someone who could do anything about it, and as far as I know it’s still broken, although it’s scheduled to be fixed this year. We had to pay for this “support” too. We worked around the problem in the end, but still.


I work at a large company but I find Azure wants to talk to us too much.


It always depends on your scale. I was at a company with a monthly spend in the 6 figures and they wanted to talk constantly. At a different company the spend was 3 figures a month, and we couldn't get any help at all, even knowing some internal folks in different areas of the company.


We don't get much spam from Azure reps, but APM/metrics/logs companies like DataDog really are trying hard to schedule free-of-charge "presentations" with our teams. Unless they are faking out MTA headers and mail client details, the mail templates are actually edited and sent from a desktop by an actual sales/advocate/devrel guy. Yuck.


They want to talk, “connect” and “align” all the time, but not really help with the issues you may have.


Right, but the quality of the conversation is so low it's a waste of time, and it's hard to find the right people. It's like talking to a Borg cube.


> And if you ever need to actually talk to someone over there, it's a nightmare

I never understand it when I see people say this about Azure. Did you ever try creating a support ticket through the Azure portal? My experience has been nothing but stellar. They respond very fast, they provide very in-depth expert knowledge, and they seem to be quite thorough, escalating any issues if needed. This is my experience with creating tickets for various organizations, both big and small.


Am I the only one who thinks it's incredibly suspicious that the folks that "discovered" the Azure security issues were the same guys who were responsible for Cloud Security at Azure? https://www.wiz.io/press-releases/wiz-emerges-from-stealth-w...


Suspicious as in they might have left Microsoft knowing about these issues, or that they're a plant by Microsoft?

It's quite likely they had insider knowledge and therefore had a better idea of where to look. But I'm not sure that makes it "suspicious", as opposed to... well, what else would you expect in those circumstances? Or are you saying that the unfair advantage of insider knowledge is what makes it suspicious?


At a previous job I was tasked with writing our bug bounty policy. I was also in charge of running and reviewing all of the SAST and DAST scans. It was tempting to monetize that potential conflict of interest, especially after I left the company.


I’d say in my limited experience (in prod, with only ~50K-a-month bills) AWS and Azure are functionally identical.

Anything I need to do, that I haven’t done before (or recently) requires roughly the same spin up time to get familiar with how each provider functions anyway.


Don’t know if this is still the case, but I always felt more nickel-and-dimed on Azure (than AWS). For example, the base SKU for their managed PostgreSQL-compatible service did not allow use in a VPC/behind a DMZ. You had to pay substantially more for the mid-level SKU. This was not the case with AWS’s equivalent service.


Also, I may have missed something, but cloning VMs in Azure used to be (still is?) a destructive process. WTF?


How could it possibly be a destructive process?

Do you mean like cloning VM A destroys it and creates identical VMs B and C?


I think what was happening was, if you wanted to make a VM image (which you could then later clone to new instances), it would result in the original VM (i.e., the source of the image) being no longer usable.


Did you try the Azure Flexible Server offering for PostgreSQL? It is in preview and addresses this exact issue.


No, I haven’t used Azure since changing jobs two years ago. Thanks for the heads up.


The AWS cli has much the same problem as the az cli. Especially v2, which bundles the python runtime as well.

Not that that is an excuse for azure of course (or conversely the fact azure does it isn't an excuse for aws).


Are the security issues (that are fresh in mind!) just Microsoft being open and proactive about... security? Is AWS this open?

Everything else could have been written about AWS as well. They all feel duct-taped when you know what's going on behind the curtain. Enterprise developers spend a crapload of time messing with infra on AWS. I think it all sucks. Unfortunately it's better than nagging the BOFH for a few VMs (which you are only going to get this month if he knows you AND likes you, AND you can keep info-sec off everyone's back).


Just today I got this notification after submitting this broken link issue through the "Feedback" button at the bottom of the docs page a month and a half ago: https://github.com/Azure/azure-sdk-for-python/issues/20249#e...

I wish the people in charge of these docs cared a bit more about their quality...


I’ve had what can only be described as the unfortunate displeasure of using Azure.

Unnecessarily complex, arcane and frustrating is how I’d describe basically everything we had to do.

The fact that they ship a whole Python runtime is basically the least shocking thing I’ve learnt about Azure.


The Azure web console is light-years ahead of GCP's console, it's so much nicer to use.

GCP's console is just stupid though. It takes so long to load, and instead of just making it faster, they made it load in incrementally, so you just have to sit there waiting for whatever widget you actually want to use, load in correctly.

And even when it has loaded, the entire thing is extremely laggy.


I’ve been using Azure since the old portal and it always feels like Microsoft is just checking the box. AWS launches some innovative new service and Azure copies them with a half-as-capable alternative.


Azure is and will continue to creep into GitHub. Pretty sure my new projects will be on GitLab now, just to get a head start on using whatever the cutting-edge service is a few years from now.


The biggest issue is that I've experienced some of that creep with GitHub Actions. It's a mess: it doesn't do what it needs to do, and beyond "presenting" features/functionality it can't actually do even the basic things a nasty old Jenkins installation in an AWS ASG does.

They practically just copied some Azure CI stuff over (which was obviously written from a completely different perspective/mindset) using a tech stack that is old yet immature when compared to even relatively young products like GitLab CI.

If a company wants to come up with a CI product but can't even get on par with basic nominal GitLab CI features and usability, what's the point...


Even from a documentation perspective, GitLab CI is so much easier to digest than the Azure Pipelines docs.


Azure pipelines is the most verbose and confusing CI tool I’ve had the misfortune of using.

Why does it require so many steps? Why are they so verbose? It feels like writing glorified bash scripts (in which case why am I using a CI tool?) but worse.


“Old, yet immature”. Perfect description.


Bafflingly there still isn't an Azure service to send email, like AWS SES.


I spent ~6 months getting them to properly delete an unprovisioned, unattached, properly shut down disk.


Deleting any resource has a 70% success rate

Simply mind-boggling.


Judging from the stories posted here, can you even imagine what the stories from the poor souls working on this would be like, if only they could tell us without violating an NDA? Must be the stuff of nightmares.


I've never heard a single good thing about Azure, and now that I have to work with it I understand why.

The most egregious thing (for now) was an Auto Scale failure. For an entire business day the autoscaler failed to spin up VMs, which effectively killed our product for the day. We could not scale manually either. No logs, no errors, absolutely no idea what happened. Support tried to convince us for an entire month that it was our own fault for misconfiguring the autoscaler, quoting documentation that said something entirely different. After insisting for a month, the issue was finally escalated to someone who could read the logs and immediately see what went wrong: they had run out of VMs and could not scale up.

So we have:

1. Unreliable services, breach of SLA.

2. Zero ability to anticipate, prevent, debug, or ensure the problem will not happen again.

3. Incompetent support lacking basic reading abilities trying to gaslight me.

4. An obvious issue that should have raised alarms long before it arose.

Thanks Azure.

This week I spent half a day trying to understand why Azure would not create an Application Gateway using a configuration that was identical to another one we already had (do _not_ attempt Azure without Terraform). Turns out it was another outage, with absolutely no way to know it. Best part is the official fix: "Please, retry in case of failure during the deployment." It takes 30 minutes to create said resource before it enters a failed state. XKCD 303 applies.
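Since "retry in case of failure" is the official guidance anyway, the pragmatic move is to wrap deployments in a blind retry loop. A minimal sketch (the helper name, attempt count, and delay are all made up, and the flaky command is stubbed so the snippet runs anywhere):

```shell
# retry MAX CMD ARGS... : rerun CMD until it succeeds or MAX attempts are used.
retry() {
  max=$1; shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "giving up after $n attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep 1   # in real use, back off much longer between attempts
  done
}

# Demonstration with a stub that fails twice, then succeeds:
attempts=0
flaky() { attempts=$((attempts + 1)); [ "$attempts" -ge 3 ]; }
retry 5 flaky && echo "succeeded after $attempts attempts"
```

In real use you'd call it with something like `retry 3 terraform apply -auto-approve`, which at least automates the half-day of clicking "try again".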


It's been about 10 years, but Azure lost one of my volumes with important data on it out of nowhere, and refused to offer me any support until I made a huge stink in the support forum.

When I finally got my data back, I moved to DigitalOcean and haven't looked back.


It's a mess from an admin standpoint, but I like to think most of our issues are from running hybrid with on-prem

Also, lots of stuff doesn't work with GCC High


Another team in my org is trying to configure the same, but it's been a challenge. I don't know the specifics, but it's taken considerable time to set up the environment.


2010: Move to cloud hosting to save yourself from scalability hassles.

2020: Move from cloud hosting to save yourself from surprise billing hassles.


Is it even a trend? I still see orgs moving everything to the cloud despite the much higher expenses. Besides Dropbox, I didn't hear any other story https://www.techrepublic.com/article/dropbox-saved-75m-by-du...


It's slowly starting to trend because of three realizations backend developers are having: (1.) Although scalability is hard, distributed apps are really hard, particularly distributed debugging and logging. (2.) Monolithic servers are now powerful enough to host an entire billion dollar internet business with millions of simultaneous users.[*] (3.) Cloud providers make their huge profits in part by carefully engineering their billing to maximize how much they dark-pattern surprise-bill their clients the moment they have unexpected growth.

[*] In early 2010s, WhatsApp could handle 2M connections on a single 1U with Erlang; MigratoryData could handle 10-12M connections with Java on Linux. Epyc servers are even more powerful now.

https://en.wikipedia.org/wiki/C10k_problem


> Azure often has a feeling of being held together with a lot more sticky tape behind the scenes than I would want to know about.

This is my feeling. And I think it makes sense - vast swathes of how Google and Amazon work already rely on cloudy engineers (although they were there before cloud was a thing). Microsoft has much more split focus, with Windows, SQL Server, AD, Dynamics, Xbox, and Office all being large areas of engineering specialism that in no way feed into cloudy areas.

Bing and Xbox Live do, I suppose, but I bet they're built in a totally different way to both each other and, say, Teams, which I think is k8s in the back.

So my theory is there's much less of the company that does cloudy things, and even when they do (e.g. Bing, Xbox Live), there's probably a lack of a commonality of technical approach compared to what Azure exposes to customers.


>> Conclusion -- I’ll still continue to use Azure, but the proposition is becoming weaker

I do appreciate the detailed list of current failings, to add to my own systemic rejection of this system based on their management objectives

you are under an illusion, you are REQUIRED to use it in service of The Company


It's not easier to be an AWS or GCE fan, though. I'd love to go cloud computing for my personal stuff instead of physical servers - but I don't want to use AWS because I don't want to get blacklisted from Amazon shopping or lose my Kindle library after a runaway AI flags my cheap EC2 instance as "potentially fraudulent/hacked" or whatever, and for Google... well, losing that account would be way more disastrous.

Can we just have a decent cloud provider offering services on a reliability and scale level as AWS/GCE/Azure, but with decent customer support and no way for (necessary) anti-fraud/abuse mechanisms to kill off entire digital lives?


We moved EVERYTHING to Teams and ADO (for git, backlog, pipelines) except our actual product is still on AWS. If we didn't have a couple things like encryption, S3 and some fargate usage, we'd probably start fresh on Azure.


My latest favourite issue with Azure is Azure DevOps. It can randomly add extra quotes to your bash script variables for whatever reason.

Azure Support is aware of the issue, it happens to many of their clients, but they have no idea why it is happening.

Hilarious. But still, management was sold a vision of the #1 Cloud, so here we are writing more and more band-aids over issues Azure has.
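One defensive band-aid, assuming the stray quotes wrap the whole value, is to strip them in the script itself. A sketch (the variable name and sample value here are made up):

```shell
# Simulate what the pipeline (per the parent comment) sometimes hands the
# script: the value wrapped in an extra pair of double quotes.
VALUE='"my-setting"'

CLEAN=${VALUE%\"}   # drop one trailing double quote, if present
CLEAN=${CLEAN#\"}   # drop one leading double quote, if present
echo "$CLEAN"       # → my-setting
```

Parameter expansion leaves already-clean values untouched, so the same two lines are safe whether or not the bug fires on a given run.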

Another one, off the top of my head, is Scheduled Events for AKS nodes. Azure provides you with a managed node group solution, but has no project to support those events, so you have to rely on unsupported community projects, which often die simply because of lack of interest in Azure.


Azure Primitives (Compute, Networking, Storage) are a dumpster fire. Paas Services can only be as good as these primitives. Friends don't let friends use Azure. Just don't.


The security issues are very worrisome.

They're the type of mistakes where it looks like there's no process or culture in place. Not a good thing for a cloud provider.


I've never used this az tool that is being complained about in all the years I've used Azure. The Azure PowerShell cmdlets are ever-changing and sometimes frustrating, but they're certainly not bloated to the tune of a 1 GB download.


My problem with Azure is that stuff is constantly being rebranded, 3rd party integrations being added/dropped without notice, etc.

And all of that without a changelog... I just run a change detector on a bunch of Microsoft docs URLs.
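For anyone curious, such a change detector doesn't need to be fancy: hash each page, compare against last time, report differences. A minimal sketch (the watched URL and file names are made up, and the fetch is stubbed with fixed content so it runs offline; swap in `curl -fsSL "$1"` for real use):

```shell
# Stubbed fetch so the sketch runs offline; for real pages use:
#   fetch() { curl -fsSL "$1"; }
fetch() { printf 'hello\n'; }

URLS="https://learn.microsoft.com/en-us/azure/aks/"   # hypothetical watch list
STATE=./doc-hashes.txt
touch "$STATE"

for url in $URLS; do
  new=$(fetch "$url" | sha256sum | cut -d' ' -f1)
  old=$(grep -F "$url " "$STATE" | cut -d' ' -f2)
  if [ "$new" != "$old" ]; then
    echo "CHANGED: $url"
    grep -vF "$url " "$STATE" > "$STATE.tmp" || true  # drop the stale entry
    mv "$STATE.tmp" "$STATE"
    echo "$url $new" >> "$STATE"                      # record the new hash
  fi
done
```

Run it from cron and every flagged URL is a page Microsoft changed without telling anyone.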


All of these arguments are presented weakly and self-admittedly compared to similar AWS failures in every instance. Maybe rename the article "Cloud Computing isn't a Magic Bullet".


We are on Azure, and migrating away from Functions, Azure SQL, and Log Analytics workspaces to just VMs and Kubernetes


It's tough being an AWS fan for that matter too. Aside from the simplest things, every time you need to do something there are huge reams of "documentation" in various stages of decay, and each doc is comprised of dozens of (unnecessarily) manual steps. Error messages are shit, too, and the whole thing has this stench of massive engineering debt and duct tape to it. I was told this stuff was supposed to be built by people who know what they're doing, but apparently not. It's duct tape all the way down it seems like.


Yeah, when I finally switched over to using it from Azure I was shocked at how horrid the docs are. I would have assumed by now they would have been cleaned up, but no, time does not heal these wounds.


The individual services maybe are built by people who know what they're doing, sure. But the aggregate is a mess because Amazon ships its org chart.



