Moving Past the Scaling Myth

ktRolster · on May 1, 2016

Fred Brooks pointed out that with a small team of competent programmers, any organizational methodology will work.

If you pay attention, the best Agile systems focus on improving the skills of developers instead of forcing people to follow the 'steps' or a formula.

bsdpython · on May 1, 2016

People over process every time. It's really hard to build a team of good people though, so instead companies try to tweak their process. Process can matter, e.g. a good team can be wrecked by a bad process, but a bad team is bad no matter the process.

muratk · on May 6, 2016

Depends on how you define a “bad” team. I'm in the Philippines, and there are lots of BPO companies here (call centers, dev shops, …). They have to deal with a high rate of ppl leaving, and train their employees from scratch. Still, those companies are successful and ppl want those relatively well paid jobs, even if they'll jump ship for any reason at all. In this kind of environment, process is everything.

exabrial · on May 1, 2016

I really wish the scaling myth would die. Everyone pretends they have a scalability problem because it's sexy and as such, travesties have been committed (you know, in case we scale). How about instead designing software where the architecture and code are clear, consistent, and readable by the next guy?

gaius · on May 1, 2016

My philosophy: plan to scale 10x. If you ever approach it - which in most cases is unlikely - you can justify completely rearchitecting, but only to scale another 10x. If not, well your design is probably not too top-heavy anyway. It's people who want a single design to take them from 1x to 1000000x who get into trouble.

hobofan · on May 1, 2016

I disagree with that statement.

Of course a design that works at 1x doesn't always work at 10000x, but why shouldn't a design that works at 10000x also work at 1x?

At least for technical scalability the answer almost always lies in overhead. Yes, the system may scale perfectly linearly at O(n), but the hidden constant is often neglected, which means at small values of n the solution becomes infeasible. So while Cassandra might be the perfect candidate for one of my use cases due to its feature set and linear scalability, it becomes infeasible at a small scale because I have to run at least 3 nodes with high overhead (because Java, sigh) + an additional 3 nodes with Zookeeper with high overhead (because Exhibitor + Curator, all in Java).

/rant, I don't like bashing on Java but overhead is definitely one of its weak spots, and here is where it comes into play.
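
To make the hidden-constant point concrete, here is a rough sketch of the trade-off as a cost model. Every number in it is made up for illustration (node prices, per-unit costs, the function names), so treat it as a way to think about the break-even point, not as a measurement of Cassandra or anything else.

```python
# Toy cost model. All figures are hypothetical, chosen only to show the shape
# of the trade-off between fixed overhead and per-unit cost.

def monthly_cost(load, fixed_nodes, node_cost, cost_per_unit):
    """Fixed cluster overhead plus a cost that grows linearly with load."""
    return fixed_nodes * node_cost + cost_per_unit * load


def single_box(load):
    # One server: cheap to start, but a steeper per-unit cost (assumed).
    return monthly_cost(load, fixed_nodes=1, node_cost=50, cost_per_unit=2)


def cluster(load):
    # Six always-on nodes (3 database + 3 coordination): expensive to start,
    # but a flatter per-unit cost (assumed).
    return monthly_cost(load, fixed_nodes=6, node_cost=50, cost_per_unit=1)


def break_even_load():
    """Smallest integer load at which the cluster is no more expensive."""
    load = 0
    while cluster(load) > single_box(load):
        load += 1
    return load


if __name__ == "__main__":
    print("break-even load:", break_even_load())
```

Both curves are O(n); below the break-even load the linear-scaling design simply costs more, which is the whole complaint.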

tango12 · on May 1, 2016

I understand your point. Out of curiosity, can you give examples of software that works at 10000x and also at 1x equally well?

ssmoot · on May 1, 2016

That's not really a scaling problem though. It's an availability one. Cassandra just doesn't compromise on it.

You have alternatives, and it doesn't really have to mean a whole lot of extra work until you get to "crazy" levels. No one wants a StrategyFactoryFactoryVisitorImpl, but on the other end of the spectrum you can get a ton of mileage out of a few Facades in the right place.

Scalability is an optimization topic. But as with the saying about premature optimization that everyone knows, the follow-up that is less acknowledged is maybe the more important part (IMO).

gaius · on May 1, 2016

Scaling down is always easier than scaling up, because you are going back over known territory. Scaling up is hard because it takes you into unknowns. Rumsfeld's Theory applies.

hobofan · on May 1, 2016

Yes, it is easier but is also very rarely done, since if you have scaled another 10x you don't care about the previous stage of scaling and have little incentive to keep the constant factor at that stage low.

I can understand why such systems are built the way they are, but I still find it sad, since it breaks the expectations of open source software, and creates a chasm between startups at the beginning and end of a big scaling phase. It more or less forces you as an early stage startup to use an inferior design which you know won't scale up beyond a certain point, even though you know about a superior design that theoretically scales down to your current requirements.

gaius · on May 1, 2016

If you design a system to support 10x your current workload, you are likely to get a well engineered solution that delivers the performance you need at a price you can afford. If and when you reach 10x, you will have a good body of knowledge to help you do the next 10x. That takes you to 100x. Systems that grow that much are exceedingly rare. The odds of your startup being the next Facebook are incredibly low. But this strategy will get you there, one step at a time.

If you sit down and try to design a system to support 100000x your current workload, you are likely to end up with an over-engineered and very expensive mess, then by the time you reach 10x you realize you should have done it differently because in practice the bottlenecks aren't where you thought they'd be. This strategy will bankrupt you before you get to even 10x.

This is how you tell who has really scaled systems, from those who only wish they had that problem.
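
The arithmetic behind "one step at a time" is small enough to sketch. The function name and the idea of counting one design generation per 10x are just a way of restating the argument above, not a formal model.

```python
def design_generations(target_multiple, factor=10):
    """Count how many designs you go through if each one is built to
    handle `factor` times the workload the previous one started at."""
    generations = 0
    capacity = 1  # 1x = today's workload
    while capacity < target_multiple:
        capacity *= factor
        generations += 1
    return generations


if __name__ == "__main__":
    for target in (10, 100, 1_000_000):
        print(f"{target}x -> {design_generations(target)} design generation(s)")
```

Getting to 100x is two designs, each informed by real bottlenecks from the previous one. Even 1000000x is only six, and almost nobody gets past the second.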

wpietri · on May 1, 2016

This makes a lot of sense to me. At the very least, I think that most American organizations undergo a substantial cultural transition as they scale, so it makes sense we'd also need a process transition.

The problem I see is that people aren't really willing to be honest about the cultural transition that happens, so we also can't be honest about the process transition.

I think Agile-ish approaches work very well in startups, because the structure is pretty flat, and the goals are shared. But as companies grow, they tend to become what I think of as business feudalism: hierarchical, control-oriented, territorial. For that, it makes sense you need different processes. And I think large company Agile is in effect Waterfall with a faster cadence, so you get that different process. But nobody will admit it. "We're doing Agile," they say, with too-bright eyes and gritted teeth.

What I wonder is: what if instead of killing the peer culture and the human-centered process as we scaled, we kept them?

michaelfeathers · on May 1, 2016

At scale many organizations create substructures that contain strong peer culture and human-centered process, so it isn't lost, it's localized.

I think that the big challenge is to recognize that hierarchical structure isn't good or bad, it's something that most natural systems do in response to scaling. If we understand it as a design parameter we can make it humane.

wpietri · on May 1, 2016

The first part makes sense to me, although I don't think I've ever personally seen it happen. What percentage of companies would you say operate in the fashion you describe?

When you talk about hierarchy as natural, I'm not sure we're talking about the same thing. I mean hierarchy as a system of power/control relationships, with each unit having an actor in charge and totally containing sub-units. E.g., the classic army chain of command. I can't think of anything natural that has that. Flocks and troops have leaders, but I think humanity is unique in its extension of social dominance to many-layered nested structures controlling millions of individuals. So I'd call it essentially artificial.

I agree that we can apply a hierarchical analytical model to many natural systems, but I think there are plenty of other models we could apply, just ones that aren't as easy for current human brains. E.g., we could talk about fractal, self-similar, or recursive processes; those have no connotation of primate dominance. Or note the shift away from "great chain of being" analyses of nature toward ecological and non-teleological analytical models. When all you have is social primates, everything looks like a hierarchy.

I would certainly like to believe that we can make hierarchies humane. But given that a fundamental characteristic is individual power over others that increases as you go up the chain, and given human fallibility, cognitive limits, and human nature, it seems to me they have to become essentially inhumane at some size. And in practice I certainly see a correlation between hierarchy size and inhumanity.

Correlation isn't proof of impossibility, of course. Just a natural tendency. Are there large hierarchies you think violate that correlation, where they have gotten more humane with scale?

michaelfeathers · on May 1, 2016

> The first part makes sense to me, although I don't think I've ever personally seen it happen. What percentage of companies would you say operate in the fashion you describe?

Ever worked in a chain restaurant? Camaraderie among coworkers at a site under a manager (so hierarchy) and not as much peer connection across sites.

I'd argue that you are using a narrow definition of hierarchy that is centered on ideas about power rather than dynamics in emergent systems. The arterial structure of a leaf is hierarchical. So are airport connection patterns. Whenever you have preferential attachment or costs associated with material or information transfer there's a force toward hierarchy, designed or not. Systemic effect. No primates needed.
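
A minimal sketch of the preferential-attachment claim, assuming the simplest possible growth rule: each new node links to one existing node, picked with probability proportional to how many links that node already has. The function name, node count, and seed are arbitrary choices for illustration.

```python
import random


def grow_network(node_count, seed=0):
    """Grow a tree by preferential attachment and return each node's degree."""
    rng = random.Random(seed)
    degrees = [1, 1]    # start with two nodes joined by one link
    endpoints = [0, 1]  # every node is listed once per link it has
    for new_node in range(2, node_count):
        # Picking uniformly from `endpoints` is a degree-proportional pick.
        target = rng.choice(endpoints)
        degrees[target] += 1
        degrees.append(1)
        endpoints.extend([target, new_node])
    return degrees


if __name__ == "__main__":
    degrees = grow_network(200)
    print("best-connected node has", max(degrees), "links")
    print("nodes with a single link:", degrees.count(1))
```

Nobody in that loop is in charge of anything, yet a run typically ends with a few heavily connected hubs and a large majority of single-link leaves.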

The interesting bit to me is that if we considered something non-sentient, like a network router that controls flow over a set of nodes, it would be easy to anthropomorphize it purely based on its behavior and see it as exerting dominance when it makes its routing decisions if they don't align with what we want at that moment. Although there can be abusive hierarchy, there's something deep to be considered in that scenario. We might be inclined to see all hierarchy as bad because that sort of conflict of interest can happen, willed or not. That's why, where there are power relationships in employment or even representative democracy, systemic integrity and human values depend upon gaining and maintaining consent.

wpietri · on May 1, 2016

Thanks for the reply, Michael!

I have worked in a chain restaurant. And that's one of the experiences that convinces me of the essential inhumanity of deep hierarchy. There was some coworker camaraderie, sure. But I don't think there was anything human-centered about the place. Workers were replaceable, disposable units. Much of the worker bonding was despite (or against) the company and the managers, not because of it. And restaurants in some ways have it easier, in that the purpose of the work is present and visible. For many organizations, that's much less true.

As to definition of hierarchy, the last half of the word literally means "rule" or "government", like monarchy or oligarchy, so I'm comfortable with my definition. I agree one can use it as a metaphor to understand natural systems, but it's only a metaphor. And as with the Great Chain of Being, I think it's a dangerously easy one to over-apply.

There are surely phenomena in emergent systems that relate and are worth studying, but conflating them with primate dominance structures obscures more for me than it reveals. To the extent that nature tells us things, though, it can't tell us what ought to exist. That's our job.

I totally agree that consent is vital for humane organizations. But I think that's in direct opposition to hierarchy, which is about top-down control. American democracy, for example, is basically non-hierarchical. I'm a participant in and subject to at least 9 quasi-independent governments (city, county, state, federal, regional transit district, school district, community college district, air quality management district, municipal utility district), plus various splits (executive, legislative, judicial). None of these controls any of the others. And none of them controls me, either, except in certain deeply constrained circumstances.

That stands in contrast to me to the essentially feudal structures of scaling American businesses. The main rule there is "my way or the highway" for all superiors. In theory, employees can always switch jobs to another company. But that's always much easier on the employer than the employee, so I think that does very little to act as a practical check. Employees and teams can try to carve out niches, but I've always seen that happen at the sufferance of (or with the neglect of) management, never because the teams had much real authority. And the more levels they have above them, the larger the set of people who can ruin things for them without consequence or even awareness.

The only exception I personally see much is at very mission-driven organizations. There the stated purpose really seems to have some teeth; people are willing to leave if they aren't doing good. In fact, right now, one team at Code for America has just demanded the resources to improve their code quality so that they can be more effective at serving their audience. And I just saw them excitedly share your 2002 paper. So I believe it's possible!

I guess what I'm arguing for here is to see what happens when we put human-centered systems first and scaling second. It may be that there are fundamental limits, in which case we'll face some hard choices between economy of scale and treating humans like humans. But I've never even seen a place try. It's always, "Oh, we're getting big, better hire an earl of mobile development and put him under the duke of engineering." Maybe the bankers overthrow the CEO boy-king, maybe they don't. But I've never seen anybody stop to question the dominant paradigm.

michaelfeathers · on May 1, 2016

Bill, it's interesting to read your reply and see your worldview. It looks like you've had way worse experiences with employers than with governmental entities, and you see living with the former way worse than living with the latter. I know people who see the world in exactly the opposite way. They'll rail all day about how the government affects their lives and constrains their choices (and they have very concrete examples) yet they are happy working within constraints at work and see more opportunity there than they do in their governmental dealings.

Realistically, they have options in both spheres, but they only feel one oppressively and they tend to assume that everyone else does too. Where power is exercised at work they are okay with it - they recognize it as part of the system and don't feel like cogs.

I wish you the best in finding new glasses to see the world of work through, just as I wish the same for people I know who are fighting governmental oversteps. The key thing is - it's not all bad. There is badness out there but structure is not determinative. Look for happy people in any system. You'll find them. What we look for in life does affect what we see, and what we create. Best of luck to you.

skybrian · on May 1, 2016

I don't think it's that easy. Humans only have a limited ability to handle variation. Treating everyone as a special case (as you do for people you actually know) breaks down as you scale up.

When administering computers we talk about treating them as "cattle, not pets" to increase our ability to manage large systems. Using the same metaphor with people would be offensive but the policy-making impulse is similar.

To some extent it even arises for valuing fairness. We make rules to try to treat people consistently and fairly, forgetting that people are a mess of special cases.

Small companies value consistency too (using the same tools and procedures) but it's just easier and less likely to backfire with fewer people.

wpietri · on May 1, 2016

Oh, I don't think it's easy. Indeed, I wonder if it's possible.

In which case, I'd like to see what happened if we favored humanity over scale.

nickpsecurity · on May 1, 2016

Hmmm. Maybe. What I've found is that IT often re-invents past designs and methods to solve modern problems. Matter of fact, the way the cloud providers scaled was to re-implement mainframes on x86 boxes. The next level is applying lessons from older research in modern ways to improve efficiency of various parts of HW and SW. If anything, quite a few people came up with timeless lessons that teach you how to do efficient, reliable computation. They keep getting reused.

So, I don't think we can take it as far as the author is suggesting, where it's like classical vs quantum physics. Maybe at the ASIC vs software level. Even those were partly joined with synthesis & coprocessing tools. I just don't see it, as every high-level description I read about these things is based on similar principles at each layer of the stack. Given similar constraints or goals, you would use a similar strategy. There's certainly divergence or outliers but more repetition of patterns than anything.

The only myths are that technologies/fads X, Y, and Z should've been widely adopted over the ones (or enhancements of them) that consistently worked. The author is seeing the results of people building on stuff that came with assumptions that don't match the new problem. Or people straight-up ignoring the root cause in their solutions or methods. Common problems.

tango12 · on May 1, 2016

Reg. the architecture question: People generally start carving out microservices once they hit some scale.

What about starting off with microservices (because tools like k8s make the otherwise insane management overhead more tractable)? Could that be a possible solution?

The problem of scaling state is one that databases need to handle anyway. Ideally the same database just keeps on working as you go up. Or at least the protocol is the same, and you switch out the implementation from a single instance to a clustered version.
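
Roughly what "same protocol, different implementation" could look like from the application's side. This is a toy sketch: `KeyValueStore`, `SingleNodeStore`, `ShardedStore`, and `record_signup` are hypothetical names, the "cluster" is just in-memory shards, and a real clustered database also has to deal with replication and failure, which this ignores.

```python
import zlib
from typing import Dict, List, Optional, Protocol


class KeyValueStore(Protocol):
    """The only contract the application code depends on."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class SingleNodeStore:
    """Stand-in for a single database instance."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class ShardedStore:
    """Stand-in for a clustered version: same methods, keys spread over shards."""

    def __init__(self, shard_count: int) -> None:
        self._shards: List[SingleNodeStore] = [
            SingleNodeStore() for _ in range(shard_count)
        ]

    def _shard_for(self, key: str) -> SingleNodeStore:
        # crc32 gives a hash that is stable across processes, unlike hash().
        return self._shards[zlib.crc32(key.encode("utf-8")) % len(self._shards)]

    def get(self, key: str) -> Optional[str]:
        return self._shard_for(key).get(key)

    def put(self, key: str, value: str) -> None:
        self._shard_for(key).put(key, value)


def record_signup(store: KeyValueStore, user: str) -> None:
    """Application code: unaware of which implementation it was handed."""
    store.put(f"user:{user}", "active")


if __name__ == "__main__":
    for store in (SingleNodeStore(), ShardedStore(shard_count=3)):
        record_signup(store, "ada")
        print(type(store).__name__, store.get("user:ada"))
```

The application function never changes; only the object passed in does. That is the cheap part. The hard part is everything the clustered side has to do that the sketch leaves out.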

cateye · on May 1, 2016

One of the underlying problems is that there isn't a formula (= process) for innovation. Most software development tries to create a competitive advantage by creating something unique that isn't easy to replicate.

So every process that promises to deliver that just by applying the process creates false hope. Or it is not meant for this purpose but the "buyers" aren't aware of it and have other expectations.

DanielBMarkham · on May 1, 2016

As somebody who loves startups and small teams, and who has a job working with both small and big organizations, I have been living this scaling thing for a long time.

I'd like to give you a simplistic answer, like "All you need, kid, is a small team! For anything!"

Slogans like that are true, yet they are terribly misleading, because 1) many organizations are already terribly overstaffed, and 2) they don't really help teams answer "What do I do right now?"

So here's as simple as I can make it:

Good organizations will do whatever is necessary to make things that people want, even if that means instead of programming, the programmers sit on the phone and do some manual chore for people as they call. Before you code anything, you have to be hip-to-hip with some real person that you're providing value to.

But as soon as you have those five folks sitting on the phones doing something useful? You gotta immediately automate. Everything. This means you're going to have all freaking kinds of problems as you move from helping ten people a day to helping a million. You have to automate solutions, access, coordination, resource allocation, failovers, and so on -- the list is almost endless (but not quite).

As they grow, poor organizations take a scaling problem and assign it to a person. Somebody does it once, then they're stuck with it for life. Good organizations continue to do it manually, then immediately automate. Somebody does it once, then the team immediately starts "plugging all the holes" and fixing the edge cases so that nobody ever has to manually be responsible for that again.

Growing "sloppy" means you end up helping those million people a day -- but you have hundreds of people on staff. Meetings take time. Coordination takes time. There are a ton of ways to screw up. People tend to blame one another. Growing good means you can be like WhatsApp, servicing a billion users with a tiny team of folks.

If you're already an existing BigCorp and have been around for a while -- odds are that you are living with this sloppy growth pattern. That means you need to start, right now, with identifying all the feedback loops, like release management, and automating them, such as putting in a CI/CD pipeline. Not only that, but there are scores of things just like that. You have a lot of work to do. It might be easier just to start over. In fact, in most cases the wisest thing to do is start over.

Now picture this: you're an Agile team at BigCorp and you've got the fever. Woohoo! Let's release often, make things people want, and help make the world a better place. But looking around, all you see is ten thousand other developers in a huge, creaky machine that's leaking like a sieve. You go to a conference with a thousand other BigCorps, just like yours. Are you going to want to hear about how it's better just to trash things and start over, about the 40 things you need to have automated right now but don't, or how to make your section of 150 programmers work together; how to "scale agile"?

Scaling Agile is an issue because the market says it is an issue.