Notes on Theory of Distributed Systems [pdf]

phtrivier · on Aug 24, 2022

Is anyone teaching "Practice of boring Distributed Systems 101 for dummies on a budget with a tight schedule" ?

As in, "we have a PHP monolith used by all of 12 people in the accounting department, and for some reason we've been tasked with making it run on multiple machines ("for redundancy" or something) by next month.

The original developpers left to start a Bitcoin scam.

Some exec read about the "cloud", but we'll probably get just enough budget to buy a coffee to an AWS salesman.

Don't even dream of hiring a "DevOps" to deploy a kubernetes cluster to orchestrate anything. Don't dream of hiring anyone, actually. Or, paying anything, for that matter.

You had one machine ; here is a second machine. That's a 100% increase in your budget, now go get us some value with that !

And don't come back in three months to ask for another budget to 'upgrade'."

Where would someone start ?

(EDIT: To clarify, this is a tongue in cheek hyperbole scenario, not a cry for immediate help. Thanks to all who offered help ;)

Yet, I'm curious about any resource on how to attack such problems, because I can only find material on how to handle large scale multi million users high availability stuff.)

keule · on Aug 24, 2022

> As in, "we have a PHP monolith used by all of 12 people in the accounting department, and for some reason we've been tasked with making it run on multiple machines ("for redundancy" or something) by next month.

Usually, your monolith has these components: a web server (apache/nginx + php), a database, and other custom tooling.

> Where would someone start ?

I think a first step is to move the database to something managed, like AWS RDS or Azure Managed Databases. Herein lies the basis for scaling out your web tier later. And here you will find the most pain because there are likely: custom backup scripts, cron jobs, and other tools that access the DB in unforeseen ways.

If you get over that hump you have done your first big step towards a more robust model. Your DB will have automated backups, managed updates, rollover, read replicas etc. You may or may not see a performance increase, because you effectively split your workload across two machines.

_THEN_ you can front your web tier with a load balancer, i.e. you load balance to one machine. This gives you: better networking, custom error pages, support for sticky sessions (you likely need them later), and better/more monitoring.

From thereon you can start working on removing those custom scripts of the web tier machine and start splitting this into an _actual_ load-balanced infrastructure, going to two web-tier machines, where traffic is routed using sticky-sessions.

Depending on the application design you can start introducing containers.

Now, this approach will not give you a _cloud-native awesome microservice architecture_ with CI/CD and devops. But it will be enough to have higher availability and more robust handling of the (predictable) load in the near future. And on the way, you will remove bad patterns that eventually allow you to go to a better approach.

I would be interested in hearing if more people face this challenge. I don't know if guides exist around this on the webs.

smeagull · on Aug 25, 2022

I certainly agree about the cron jobs. We shifted a whole bunch of tooling to an internal PaaS solution. One of the tools (a Kanban board I think) we shifted had started sending alerts for jobs that they had since deleted - upon investigation it was the cron job and database that still existed on the old server, still sending out emails.

keule · on Aug 25, 2022

Thank you for sharing the unforeseen depths of a monolith :D

thebeastie · on Aug 25, 2022

Hey, this was useful, thanks.

throwaway787544 · on Aug 24, 2022

If someone would pay for it I'd write that book. There are lots of different methods for different scenarios. There are some books on it but they're either very dry and technical or have very few examples.

Here's the cliffs notes version for your situation:

1. Build a server. Make an image/snapshot of it.

2. Build a second server from the snapshot.

3. Use rsync to copy files your PHP app writes from one machine ('primary') to another ('secondary').

4. To make a "safe" change, change the secondary server, test it.

5. To "deploy" the change, snapshot the secondary, build a new third server, stop writes on the primary, sync over the files to the third server one last time, point the primary hostname at the third server IP, test this new primary server, destroy the old primary server.

6. If you ever need to "roll back" a change, you can do that while there's still three servers up (blue/green), or deploy a new server with the last working snapshot.

7. Set up PagerDuty to wake you up if the primary dies. When it does, change the hostname of the first box to point to the IP of the second box.

That's just one way that is very simple. It is a redundant active/passive distributed system with redundant storage and immutable blue/green deployments. It can be considered high-availability although that term is somewhat loaded; ideally you'd make as much of the system HA as possible, such as independent network connections to the backbone, independent power drops, UPC, etc (both for bare-metal and VMs).

You can get much more complicated but that's good enough for what they want (redundancy) and it buys you a lot of other benefits.

jiggawatts · on Aug 25, 2022

I would not do this for a web app.

Having said that, I have done something very similar for large pools of terminal services session hosts. (Think of a Windows box with a special license that allows multiple remote connected desktop users, and 100 pre-installed GUI applications.)

For web apps, you almost always want either of the following:

- A central file share or NFS mount of some sort, with the servers mounting it directly. Ideally with a local cache that can tolerate file server outages and continue in read-only mode. These days I use zone-redundant Azure File Shares for that. They're fully managed and scale to crazy levels. On a small scale they're so cheap that they're practically free, but have the same high availability as a cluster of file servers in multiple data centres! This is a good approach if your web app writes files locally in normal operation. If you need to distribute an app like this without rewriting that aspect, a central file share is the easy way.

- An automated deployment from something like Azure DevOps pipelines or GitHub Actions that builds VMs one at a time. Both are free in most small-scale scenarios. (For PHP, deployment is just a file copy, so a bash script triggered from a management box is sufficient!) The problem with the "sync stuff around" approach is that corruption gets copied around too. Small one-time mistakes become "sticky" and never undo themselves. Junk files accumulate, eventually causing problems. This method solves that.

Additionally, in all modern clouds you can run "plain" virtual machines in scale sets, where the instances can be scaled out. The scaling part is actually not so important! The key bit is that this will force you to fully automate the VM deployment process, including base OS image updates. Rolling upgrades become easy. Similarly, you can undo the damage done by a malware attack by simply scaling to zero, and then scaling back up. This approach is totally stateless, so you don't need to worry about backing up the VMs. Just rebuild on demand.

But all of that is just a lot of manual labour. It's much easier to host simple apps on a managed platform like Azure App Service, which takes care of all of this. The low-end tiers are cheaper than a pair of VMs.

mindcrime · on Aug 25, 2022

Is anyone teaching "Practice of boring Distributed Systems 101 for dummies on a budget with a tight schedule" ?

I have to admit, there's something about this comment that makes me sad in a way. Not to say that there's anything inherently wrong with this question or to say that I disagree with you exactly. It's just that I like the idea of computing / hacking being centered more around a mindset of limitless possibilities, exploration, questioning the boundaries of what can be done (as opposed to what should be done?), and not something that's caught up in drudgery like budgets, schedules, and "business stuff."

Sorry, guess I'm just feeling nostalgic for a minute or something (maybe because I've been watching that 5 hour long interview Lex Fridman did with Carmack) and am flashing back to what computing was to me when I first got involved. Back in those days, a paper / book like this would have evoked a "WOW, HOW F%#@NG COOL IS THIS!??!!????" reaction from me. And I guess it still kinda does in a weird sort of way, even though I also have to deal with budgets, schedules, and the drudgery of the business world. sigh

suprgeek · on Aug 25, 2022

The AWS scaling series https://aws.amazon.com/blogs/startups/scaling-on-aws-part-1-... gives you a very nice primer on this exact situation. You may or may not opt to use AWS - the fundamental concepts would translate well over other cloud providers or even on premises.

Then you can follow along parts 2, 3 &4 to scale up by factors of ~10 or more -

https://aws.amazon.com/blogs/startups/scaling-on-aws-part-2-... https://aws.amazon.com/blogs/startups/scaling-on-aws-part-3-...

fredsmith219 · on Aug 24, 2022

I can’t believe at 12 people would actually be stressing the system. Could you meet the requirements of the project by setting up the second machine as a hot back up at an offsite location?

phtrivier · on Aug 24, 2022

Maybe. How do I find the O'Reilly book that explains that ? And the petty details about knowing the first one is down and starting the backup ? And just enough data replication to actually have some data in the second machine ? Etc, etc...

My pet peeves with distributed and ops books is that they usually start by laying out all those problems, but then move on to either :

- explain how Big Tech has even bigger problems, before explainig how you can fix Big Tech problems with Big Tech budgets and headcound by deploying just one more layer of distributed cache or queue that vietually ensures your app is never going to work again (That's "Desifning Data Intensive Applications", in bad faith.)

- or, not really explain anything, wave their hands chanting "trade offs trade offs" and start telling kids stories about Byzantine Generals.

lmwnshn · on Aug 24, 2022

More entertainment than how-to guide, and oriented more towards developers than ops, but if you haven't read "Scalability! But at what COST?" [0], I think you'll enjoy it.

[0] https://www.frankmcsherry.org/graph/scalability/cost/2015/01...

arinlen · on Aug 24, 2022

> explain how Big Tech has even bigger problems, before explainig how you can fix Big Tech problems with Big Tech budgets and headcound (...)

What do you have to say about the fact that the career goals of those interested in this sort of topic is... Be counted as part of the headcount of these Big Tech companies while getting paid Big Tech budget salaries?

Because if you count yourself among those interested in the topic, that's precisely the type of stuff you're eager to learn so that you're in a better position to address those problems.

What's your answer to that? Continue writing "hello world" services with Spring Initializr because that's all you need?

phtrivier · on Aug 24, 2022

> Because if you count yourself among those interested in the topic, that's precisely the type of stuff you're eager to learn so that you're in a better position to address those problems.

People will work on problems of different scales in a career ; will you agree that different scales of problems call for different techniques ?

I have no problem with FANGs documenting how to fix FANGs issues !

I'm a little bit concerned about FANGs-devs-wanabee applying the same techniques to non-FANGs issues, though, for lack of training resources about the "not trivial but a bit boring" techniques.

Your insight about the budget / salaries makes sense, though : a book about "building your first boring IT project right" is definitely not going to be a best seller anytime soon :D !

enumjorge · on Aug 24, 2022

Nothing wrong with having those aspirations, but sounds like the parent commenter has non-Big-Tech-sized problems he needs to solve now.

EddySchauHai · on Aug 24, 2022

What you’re describing there sounds like general Linux sysadmin to me?

phtrivier · on Aug 24, 2022

Not entirely, I would argue, if you look at it from the application développer.

You have to adapt parts of your app to handle the fact that two machines might be handling the service (either at the same time, or in succession.)

This has impact on how you use memory, how you persist stuff, etc...

None of which is rocket science, probably - but even things that look "obvious" to lots of people get their O'Reilly books, so...

But you're right that a part of the "distribution" of a system is in the hands of ops more than devs.

EddySchauHai · on Aug 24, 2022

I guess it's just experience to be honest. It happens rarely, you might be lucky enough to be involved with solving it, and then you focus on the important parts of the project again. I've only worked in startups so don't know about the 'Big Tech' solutions but a little knowledge of general linux sysadmin, containers, and queues has yet to block me :) Once the company is big enough to need some complexity beyond that I assume there's enough money to hire someone to come in and put everything into CNCFs 1000 layer tech stack.

Edit: Thinking on this, if I want to scale something it'd be specific to the problem I'm having so some sort of debugging process like https://netflixtechblog.com/linux-performance-analysis-in-60... to find the root cause would be generic advice. Then you can decide to scale vertically/horizontally/refactor to solve the problem and move on.

edgyquant · on Aug 25, 2022

I’m a bit crazy but taking an old monolith and slowly pruning and refining it into a codebase that would make a mathematician blush is soothing to me.

Like a bonsai tree. There’s a point where you’ve written enough helpers (complete with tests) and abstracted away logic from the views when you suddenly are able to rapidly refactor all of the crap that’s left and when you’re done the resulting codebase can be easily distributed or scaled.

So I’d start by just breaking the data away from logic and then break that data away from the database with the idea being to use a redis server as your apps data model which you can call some function to sync to the database from time to time.

Then build an event logger that encompasses everything (every interaction at least) that happens on the front end (this is trivial with JavaScript on events.)

then spin up two nodes of it and write some function that merges two of these event trees (sorting by timestamp + pick a bias for when two events happen at the same time.)

It won’t scale to 1000 users, and you’ll find kinks to work out along the way. But this is a good start

slt2021 · on Aug 24, 2022

distributed systems are usually for millions of users, not 12 users.

for your problem you can start by configuring nginx to work as load balancer and spin up 2nd VM with php app

phtrivier · on Aug 24, 2022

"But what if _the_ machine goes down ? What if it goes down _during quarter earnings legally requested reporting consolidation period_ ? We need _redundancy_ !!"

Also, philosophically, I guess, a "distributed" systems starts at "two machines". (And you actually get most of the "fun" of distributed systems with "two processes on the same machine".)

We're taught how to deal with "N=1" in school, and "N=all fans of Taylor Swift in the same seconds" in FAANGS.

Yet I suspect most people will be working on "N=12, 5 hours a day during office hours, except twice a year." And I'm not sure what's the reference techniques for that.

arinlen · on Aug 24, 2022

> Also, philosophically, I guess, a "distributed" systems starts at "two machines".

People opening a page in a browser that sends requests to a server is already a distributed system.

A monolith sending requests to a database instance is already a distributed system.

Having a metrics sidecar running along your monolith is already a distributed system.

phtrivier · on Aug 24, 2022

> A monolith sending requests to a database instance is already a distributed system.

True, of course.

And even a simple set like this brings in "distribution" issues for the app developper:

When do you connect ? When do you reconnect ?

Where do you get your connection credentials from ?

What should happen when those credentials have to change ?

Do you ever decide to connect to a backup db ?

Do you ever switch your application logic to a mode where you know the DB is down, but you still try to work without it anyway ?

Etc..

Those examples are specific to DBS, but in a distributed system any other services brings in the same questions.

With experience you get opinions and intuitions about how to attack each issues ; my question is still : "should you need to point a newcomer to some reference / book about those questions, where would you point to ?"

User23 · on Aug 25, 2022

> Yet I suspect most people will be working on "N=12, 5 hours a day during office hours, except twice a year." And I'm not sure what's the reference techniques for that.

Here you go: http://bofh.bjash.com/

random_coder · on Aug 24, 2022

It's a joke.

HWR_14 · on Aug 25, 2022

> we'll probably get just enough budget to buy a coffee to an AWS salesman.

You have it backwards. Salesmen will usually buy you the coffee. Even if you don't have the budget today, they still have an expense account and will usually buy you coffee.

salawat · on Aug 24, 2022

Easiest starting point is modeliing the problem between you and your co-workers paying painstaking attention to the flow of knowledge.

Seriously. Most of the difficulty of distributed systems is because you're actually having to manage the flow of information between distinct members of a networked composite. Every time someone is out of the loop, what do you do?

Can you tell if someone is out of the loop? What happens if your detector breaks?

Try it with your coworkers. You have to be super serious on running down the "but how did you know" parts.

Once you have a handle of the way you trip, go hit the books, and learn all the names to the SNAFUs you just acted out.

srkiranraj · on Aug 25, 2022

An introduction to scaling systems - https://youtu.be/a2rcgzludDU

qntty · on Aug 24, 2022

Sounds like you could be looking for something like VMware vSphere if primary-backup replication is what you want

dinosaurdynasty · on Aug 25, 2022

For a couple dozen thousand USD in licensing (vSAN licenses are expensive)

arinlen · on Aug 24, 2022

> As in, "we have a PHP monolith used by all of 12 people in the accounting department, and for some reason we've been tasked with making it run on multiple machines ("for redundancy" or something) by next month.

I find this comment highly ignorant. The need to deploy a distributed system is not always tied to performance or scalability or reliability.

Sometimes all it takes is having to reuse a system developed by a third party, or consume an API.

Do you believe you'll always have the luxury of having a single process working on a single machine that does zero communication over a network?

Hell, even a SPA calling your backend is a distributed system. Is this not a terribly common usecase?

Enough about these ignorant comments. They add nothing to the discussion and are completely detach from reality

phtrivier · on Aug 24, 2022

I failed to make the requester sound more obnoxious than the request.

My point is precisely that transitioning from a single app on a machine is a natural and necessary part of a system's life, but that I can't find satisfying resources on how to handle thise phase, as opposed to handling much higher load.

Sorry for the missed joke.

cwillu · on Aug 25, 2022

Step 1: Make sure your backups are good enough to boot the second machine from, and test that that works once a month.

Step 2: There is no step 2.

polskibus · on Aug 25, 2022

in my opinion, then first step should be thorough profiling and measurement under real load to decide which layer needs to be scaled out first. That should comprise a baseline for future comparisons, so you know whether the app is actually doing better than before and to prioritize your efforts.

qazpot · on Aug 25, 2022

Designing Data-Intensive Applications is the best starting point.

phtrivier · on Aug 25, 2022

I'm in the middle of it, and it's clearly a great piece of work, don't get me wrong ; however, I'm precisely wondering if there is a "Designing Data Non-Intensive Applications" book out there :)

rajeshp1986 · on Aug 24, 2022

First step would be to move database to a separate server(3rd server) and use the 2 servers to run your application. This way you have some resiliency when one of the server goes down.

dang · on Aug 25, 2022

infogulch · on Aug 24, 2022

15 pages of just TOC. 400+ pages of content

> These are notes for the Fall 2022 semester version of the Yale course CPSC 465/565 Theory of Distributed Systems

There are a lot of algorithms, but I don't see CRDTs mentioned by name. Perhaps it's most closely related to "19.3 Faster snapshots using lattice agreement"?

dragontamer · on Aug 24, 2022

> CRDTs

Wrong level of abstraction. This is clearly a lower level course than that and discusses more fundamental ideas.

A quickie look through chapter 6 reminds me of CRDTs, at least the vector clock concept. Other bits from other parts of this course probably need to be combined into what would be called a CRDT.

yewenjie · on Aug 25, 2022

Is there an overview of distributed systems that can be finished in one evening? Preferably in video format.

xuancanh · on Aug 25, 2022

Martin Kleppmann's lecture series is the most concise one https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_H.... The total length is about 8 hours. You can finish it in one evening, depending on whether you sleep or not.

tychota · on Aug 24, 2022

Why teach Paxos and not raft. I thought raft was easier to grasp, and is used a lot nowadays?.

dinosaurdynasty · on Aug 24, 2022

Raft is an opinionated version of Paxos, they aren't fundamentally super different

quibono · on Aug 27, 2022

Would you say one is used more than the other in the industry or is it more of a fair split?