
Mantras like "methods should be shorter than 15 lines of code" or "classes should be small" turned out to be somewhat wrong.

So much this.

The whole point of functions and classes was to make code reusable. If the entire contents of a 100 line method are only ever used in that method and it's not recursive or using continuations or anything else weird, why the hell would it be "easier to read" if I had to jump up and down the file to 7 different submethods when the function's entire flow is always sequential?




> The whole point of functions and classes was to make code reusable.

I’m amazed that here we are >40 years on from C++, and still this argument is made. Classes never encapsulated a module of reusability, except in toy or academic examples. To try and use them in this way either leads to gigantic “god” classes, or so many tiny classes with scaffolding classes between them that the “communication overhead” dwarfs the actual business logic.

Code base after code base proves this again and again. I have never seen a “class” be useful as a component of re-use. So what is? Libraries. A public interface/api wrapping a “I don’t care what you did inside”. Bunch of classes, one class, methods? So long as the interface is small and well defined, who cares how it’s structured inside.

Modular programming can be done in any paradigm, just think about the api and the internal as separate things. Build some tests at the interface layer, and you’ve got documentation for free too! Re-use happens at the dll or cluster of dll boundaries. Software has a physical aspect to it as well as code.
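
For instance, a minimal Python sketch of that idea (all names hypothetical): one small public function, internals free to change, and a test at the interface that doubles as documentation.

    def _normalize(record):
        # Internal detail: callers never see or depend on this.
        return {k.lower(): v for k, v in record.items()}

    def summarize(records):
        # The public interface: count records per 'status' field.
        counts = {}
        for r in records:
            status = _normalize(r).get("status", "unknown")
            counts[status] = counts.get(status, 0) + 1
        return counts

    # Test at the interface layer: documents the contract, ignores internals.
    assert summarize([{"Status": "open"}, {"STATUS": "open"}]) == {"open": 2}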


This is not my experience. Multiple inheritance within a code base, for certain sub-functionalities and states, is a perfectly good example of reuse. You do not need to go all the way out to the library level. In fact, it is the abstract bases that reduce the reusable parts to a minimal core that I find most useful.

I'm not saying you have to use classes to do this, but they certainly get the job done.


We are talking about different things. If you want to use inheritance inside your module, behind a reasonable API, in order to re-use common logic, I won’t bat an eye. I won’t know, I’m working with the public part of your module.

If you structure your code so that people in my team can inherit from your base class (because you didn’t make an interface and left everything public), and later you change some of this common logic, then I will curse your name and the manner of your conception.


Since learning functional programming well, I feel the need to use inheritance in C++ in maybe a handful of places.

The problem with inheritance-based reuse is that if you need to do something slightly different, you are out of luck. With functions, by contrast, you call what you need, and you can break apart functionality without changing the other uses.


I know that a lot of people advocate for composition over inheritance. Inheritance can add a lot of complexity, especially if it is deep or involves a lot of overrides. It can be difficult to find out where a method came from in the inheritance chain, whether it has been overridden, and consequently how it will behave.

Composition at least makes it more obvious where methods are getting their functionality. It also has other benefits, such as making objects easier to mock.
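
A minimal sketch of the contrast, with made-up names: with composition the delegation is spelled out, and the collaborator can be swapped for a mock.

    class JsonStore:
        def save(self, data):
            print("saving", data)

    class Report:
        def __init__(self, store):
            self.store = store  # explicit dependency: easy to see, easy to mock

        def publish(self):
            # No inheritance chain to search: the method's origin is right here.
            self.store.save("report body")

    Report(JsonStore()).publish()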


Surely this is use case dependent? I’ve worked on projects where modular programming works well and others where not so much.


Specifically here I am talking about the concept of “re-use”. That is, the ability to write a bunch of code that does a “thing” and use that more than once, without significant modification.

Modularity is a much bigger concept, related to the engineering of large software systems. These days, “micro-services” is one way that people achieve modularity, but in the old days it was needed for many of the same reasons, but inside the monolith. The overall solution is composed of blocks living at different layers.

Re-use also exists inside modules, of course, by using functions or composition or — shudder — inheritance of code.

Modular programming has value as soon as more than one team needs to work on something. As it’s impossible to predict the future, my opinion is that it always has value to structure a code-base in this way.


>why the hell would it be "easier to read" if I had to jump up and down the file to 7 different submethods when the function's entire flow is always sequential?

Because you don't jump up and down the file to read it.

Each method that you create has a name, and the name is an opportunity to explain the process - naturally, in-line, without comments.

I write code like this all the time - e.g. from my current project: https://github.com/zahlman/bbbb/blob/master/src/bbbb.py . If I wanted to follow the flow of execution, I would be hammering the % key in Vim. But I don't do that, because I don't need or want to. The flow of the function is already there in the function. It calls out to other functions that encapsulate details that would be a distraction if I want to understand the function. The functions have names that explain their purpose. I put effort into names, and I trust myself and my names. I only look at the code I'm currently interested in. To look at other parts of the code, I would first need a reason to be interested in it.
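
In the abstract (hypothetical names, not the linked project), that style reads something like:

    def _collect_files(src):
        return [src + "/a.txt"]          # stub body, for illustration only

    def _build_manifest(files):
        return {"count": len(files)}

    def _write_archive(dst, files, manifest):
        print("writing", dst, manifest)

    def make_archive(src, dst):
        # The flow reads top to bottom; each name explains its step, and you
        # descend into a helper only when that step is what interests you.
        files = _collect_files(src)
        manifest = _build_manifest(files)
        _write_archive(dst, files, manifest)

    make_archive("srcdir", "out.zip")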

When you look at yourself in the mirror, and notice your hand, do you feel compelled to examine your hand in detail before you can consider anything about the rest of your reflection? Would you prefer to conceive of that image as a grid of countless points of light? Or do you not find it useful that your mind's eye automatically folds what it sees into abstractions like "hand"?

35 years into my journey as a programmer, the idea of a 100-line function frightens me (although I have had to face this fear countless times when dealing with others' code). For me, that's half of a reasonable length (though certainly not a hard limit) for the entire file.


This is how I work as well, and the reason I tend to write many small functions rather than few large ones is precisely because it reduces cognitive load. You don't have to understand what the canSubmit function does, unless you are interested in knowing what the conditions to submit this form are.
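
E.g., rendered in Python with hypothetical form fields:

    def can_submit(form):
        # The name is the summary; the conditions live here, read on demand.
        return bool(form.get("email")) and form.get("accepted_terms", False)

    print(can_submit({"email": "a@b.example", "accepted_terms": True}))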

Ironically, the author of the post claims it has the opposite effect.


    # Can't import at the start, because of the need to bootstrap the
    # environment via `get_requires_for_build_*`.
  
This comment is a great example of what information you lose when you split linear code into small interrelated methods. You lose ordering and dependencies.

Sometimes it's worth it. Sometimes it isn't. In my opinion it's almost never worth it to get down to Uncle Bob's approved method length.

10-30 lines is OK. 3 is counterproductive except for a small subset of wrappers, getters etc. Occasionally it's good to leave a method that is 300 lines long.

If your code always does 9 things in that exact order - it's counterproductive to split them artificially into 3 sets of 3 things to meet an arbitrary limit.


>This comment is a great example of what information you lose when you split linear code into small interrelated methods.

Inlining `_read_toml` or `_read_config` would change nothing about the reasoning. The purpose was to make sure the import isn't tried until the library providing it is installed in the environment. This has nothing to do with the call graph within my code. It's not caused by "splitting the code into interrelated methods" and is not a consequence of the dependencies of those functions on each other. It's a consequence of the greater context in which the entire module runs.

The way that the system (which is not under my control) works (I don't have a really good top-down reference handy for this - I may have to write one), a "build frontend" will invoke my code - as a subprocess - multiple times, possibly looking for and calling different hooks each time. The public `get_requires_for_build_wheel` and `get_requires_for_build_sdist` are optional hooks in that specification (https://peps.python.org/pep-0517/#optional-hooks).

However, this approach is left behind from an earlier iteration - I don't need to use these hooks to ask the build frontend to install `tomli`, because the necessary conditions can be (and currently are) provided declaratively in `pyproject.toml` (and thus `tomli` will be installed, if necessary, before any attempts to run my backend code). I'll rework this when I get back to it (I should just be able to do the import normally now, but of course this requires testing).


To paraphrase a recentish comment from jerf, “sometimes you just have a long list of tasks to do”. That stuck with me. Now I’m a bit quicker to realize when I’m in that situation and don’t bother trying to find a natural place to break up the function.


For me it depends. Sometimes I find value in making a function for a block of work I can give its own name to, because that can make the flow more obvious when looking at what the function does at a high level. But arbitrarily breaking up a function just because is silly and pointless.


Plus, laying the list of tasks out in order sometimes makes it obvious how to split it up eventually. If you try to split it up the first time you write it, you get a bunch of meaningless splits, but if you write a 300 line function, and let it simmer for a few weeks, usually you can spot commonalities later.


That's also true, though in this case I'm not necessarily worried about commonalities, just changing the way it reads to focus on the higher level ideas making up the large function.

But revisiting code after a time, either just because you slept on it or you've written more adjacent code, is almost always worth some time to try and improve the readability of the code (so long as you don't sacrifice performance unnecessarily).


Define that function directly in the place where it is used (e.g. as a lambda, if nesting of function definitions is not allowed). Keeps the locality and makes it obvious that you could just have put a comment instead.
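
In Python, say, a nested function keeps both the name and the locality (hypothetical example):

    def process(order):
        def with_tax(amount):
            # Visible only inside process(); obviously used nowhere else.
            return round(amount * 1.2, 2)

        return with_tax(order["subtotal"])

    print(process({"subtotal": 10.0}))  # 12.0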


A useful trick is to then at least visually structure those 150 lines with comments that separate some blocks of functionality. Keeps the linear flow but makes it still easier to digest.
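
Something like this (a made-up example):

    def import_users(lines):
        # --- parse -------------------------------------------------
        rows = [line.split(",") for line in lines]

        # --- validate ----------------------------------------------
        rows = [r for r in rows if len(r) == 2]

        # --- load (stubbed out here) ---------------------------------
        for name, email in rows:
            print("inserting", name, email)

    import_users(["ada,ada@example.com", "malformed line"])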


Why not just do something like this then? This:

    myfunction(data) {
        # do one thing to the data
        ...

        # now do another
        ...
    }
becomes that:

    myfunction(data) {
        do_one_thing_to_the_data(data)
        now_do_another(data)
    }

    do_one_thing_to_the_data(data) {
        ...
    }

    now_do_another(data) {
        ...
    }
Still linear, easier to get an overview, and you can write more modular tests.


Because now you have to jump around in order to see the sequence of events, which can be very frustrating if you have to constantly switch between two of these functions.

Plus, if we're dealing with a "long list of tasks" that can't be broken up in reusable chunks, it probably means that you need to share some context, which is way easier to do if you're in the same scope.

One thing I find useful is to structure it in blocks instead, so you can share things but also contain what you don't want shared. So e.g. in rust you could do this:

    let shared_computation = do_shared_computation();
    
    let result_one = {
        let result = do_useful_things();
        other_things(&shared_computation);
        result
    };
    
    ...
I think it's a nice middle ground. You still can't write modular tests, but maybe you don't have to, because again, this is just a long list of tasks that conceptually can't be broken down, so maybe it's better to test the whole thing as a unit.


Instead of, say, 10 functions in a file that are all individually meaningful, you now have maybe 50 functions that are mostly tiny steps that don't make much sense on their own. Good luck finding the "real" 10 functions buried amongst them. It's certainly higher cognitive load in my (painful) experience.


If the arguments required by the function are few, then breaking such a block down makes sense. Otherwise, it usually feels like an unnatural function to me.


We have different ideas about what "linear" means.


It comes down to the quality of the abstractions. If they are well made and well named, you'd rather read this:

  axios.get('https://api.example.com', {
      headers: { 'Authorization': 'Bearer token' },
      params: { key: 'value' }
  })
  .then(response => console.log(response.data))
  .catch(error => console.error(error));
than to read the entire implementations of get(), then() and catch() inlined.


I agree, except I think 100 lines is definitely worth a method, whereas 15 lines is obviously not worth it in most cases, and yet we do that a lot.

My principle has always been: "is this part an isolated and intuitive subroutine that I can clearly name, such that when other people see it they'll get it at first glance without pausing to think about what it does (not to mention reading through the implementation)?" I'm surprised this has not become common wisdom.


In recent years my general principle has been to introduce an abstraction (in this case split up a function) if it lowers local concepts to ~4 (presumably based on similar principles to the original post). I’ve taken to saying something along the lines of “abstractions motivated by reducing repetition or lines of code are often bad, whilst ones motivated by reducing cognitive load tend to be better”.

Good abstractions often reduce LOC, but I prefer to think of that as a happy byproduct rather than the goal.


>My principle has always been: "is this part an isolated and intuitive subroutine that I can clearly name, such that when other people see it they'll get it at first glance without pausing to think about what it does (not to mention reading through the implementation)?"

I hold this principle as well.

And I commonly produce one-liner subroutines following it. For me, 15 lines has become disturbingly long.


I tend toward John Carmack's view. He seemed annoyed that he was being pressed to provide a maximum at all, and specified 7000 lines. I don't think I have ever gone that high. But really it's just a matter of what you are doing. We expect to reuse things way more often than we actually do. If you wrote out everything you need to do in order, and then applied the rule of three to make a function out of everything you did three times, it is very possible you wouldn't extract anything. In which case I think it should just be the one function.


> We expect to reuse things way more often than we actually do.

This is about readability (which includes comprehensibility), not reuse. When I read code from others who take my view, I understand. When I read code from those who do not, I do not, until I refactor. I extract a piece that seems coherent, and guess its purpose, and then see what its surroundings look like, with that purpose written in place of the implementation. I repeat, and refine, and rename.

It is the same even if I never press a key in my editor. Understanding code within my mind is the same process, but relying on my memory to store the unwritten names. This is the nature of "cognitive load".


Yeah, I find extracting code into methods very useful for naming things that are 1) a digression from the core logic, and 2) enough code to make the core logic harder to comprehend. It’s basically like, “here’s this thing, you can dig into it if you want, but you don’t have to.” Or, the core logic is the top level summary and the methods it calls out to are sections or footnotes.


I find "is_enabled(x)" to be easier to reason about than

    if (x.foo || x.bar.baz || (x.quux && x.bar.foo))
Even if it's only ever used once. Functions and methods provide abstraction which is useful for more than just removing repetition.


If you're literally using it just once, why not stick it in a local variable instead? You're still getting the advantage of naming the concept that it represents, without eroding code locality.
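
i.e., something along these lines (x mocked up as a dict so the sketch runs):

    x = {"foo": False, "bar": {"baz": True, "foo": False}, "quux": True}

    # Name the concept once, right where it is used; no jump required.
    is_enabled = x["foo"] or x["bar"]["baz"] or (x["quux"] and x["bar"]["foo"])
    if is_enabled:
        print("enabled")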

However, the example is a slightly tricky basis to form an opinion on best practice: you're proposing that the clearly named example function name is_enabled is better than an expression based on symbols with gibberish names. Had those names (x, foo, bar, baz, etc) instead been well chosen meaningful names, then perhaps the inline expression would have been just as clear, especially if the body of the if makes it obvious what's being checked here.

It all sounds great to introduce well named functions in isolated examples, but examples like that are intrinsically so small that the costs of extra indirection are irrelevant. Furthermore, in these hypothetical examples, we're kind of assuming that there _is_ a clearly correct and unique definition for is_enabled, but in reality, many ifs like this have more nuance. The if may well not represent if-enabled, it might be more something like was-enabled-last-app-startup-assuming-authorization-already-checked-unless-io-error. And the danger of leaving out implicit context like that is precisely that it sounds simple, is_enabled, but that simplicity hides corner cases and unchecked assumptions that may be invalidated by later code evolution - especially if the person changing the code is _not_ changing is_enabled and therefore at risk of assuming it really means whether something is enabled regardless of context.

A poor abstraction is worse than no abstraction. We need abstractions, but there's a risk of doing so recklessly. It's possible to abstract too little, especially if that's a sign of just not thinking enough about semantics, but also to abstract too much, especially if that's a sign of thinking superficially, e.g. to reduce syntactic duplication regardless of meaning.


Pretty sure every compiler can manage optimizing out that method call, so do whichever makes you and your code reviewer happy.


A local variable is often worse: Now I suffer both the noise of the unabstracted thing, and an extra assignment. While part of the goal is to give a reasonable logical name to the complex business logic, the other value is to hide the business logic for readers who truly don't care (which is most of them).

The names could be better and more expressive, sure, but they could also be function calls themselves or long and difficult to read names, as an example:

    if (
        x.is_enabled ||
        x.new_is_enabled ||
        (x.in_us_timezone && is_daytime()) ||
        x.experimental_feature_mode_for_testing 
        )...
That's somewhat realistic for cases where the abstraction is covering for business logic. Now if you're lucky you can abstract that away entirely to something like an injected feature or binary flag (but then you're actually doing what I'm suggesting, just with extra ceremony), but sometimes you can't for various reasons, and the same concept applies.

In fact I'd actually strongly disagree with you and say that doing what I'm suggesting is even more important if the example is larger and more complicated. That's not an excuse to not have tests or not maintain your code well, but if your argument is functionally "we cannot write abstractions because I can't trust that functions do what they say they do", that's not a problem with abstractions, that's a problem with the codebase.

I'm arguing that keeping the complexity of any given stanza of code low is important to long-term maintainability, and I think this is true because it invites a bunch of really good questions and naturally pushes back on some increases in complexity: if `is_enabled(x)` is the current state of things, there's a natural question asked, and inherent pushback to changing that to `is_enabled(x, y)`. That's good. Whereas its much easier for natural development of the god-function to result in 17 local variables with complex interrelations that are difficult to parse out and track.

My experience says that identifying, removing, and naming assumptions is vastly easier when any given function is small and tightly scoped and the abstractions you use to do so also naturally discourage other folks who develop on the same codebase from adding unnecessary complexity.

And I'll reiterate: my goal, at least, when dealing with abstraction isn't to focus on duplication, but on clarity. It's worthwhile to introduce an abstraction even for code used once if it improves clarity. It may not be worthwhile to introduce an abstraction for something used many times if those things aren't inherently related. That creates unnecessary coupling that you either undo or hack around later.


> Now I suffer both the noise of the unabstracted thing, and an extra assignment.

Depends on your goals / constraints. From a performance standpoint, the attribute lookups can often dwarf the overhead of an extra assignment.


I'm speaking solely from a developer experience perspective.

We're talking about cases where the expression is only used once, so the assignment is free/can be trivially inlined, and the attribute lookups are also only used once so there is nothing saved by creating a temporary for them.


Wouldn't you jump to is_enabled to see what it does?

That's what I always do in new code, and probably why I dislike functions that are only used once or twice. The overhead of the jump is not worth it. is_enabled could be a comment above the block (up to a point, not if it's too long).


> Wouldn't you jump to is_enabled to see what it does?

That depends on a lot of things. But the answer is (usually) no. I might do it if I think the error is specifically in that section of code. But especially if you want to provide any kind of documentation or history on why that code is the way it is, it's easier to abstract that away into the function.

Furthermore, most of the time code is being read isn't the first time, and I emphatically don't want to reread some visual noise every time I am looking at a larger piece of code.


That makes sense. To me it's not about the function having bad code, but about different opinions on what exactly "enabled" means.

If I'm not interested I just jump past the block when reading (given that it's short and tidy)


> Wouldn't you jump to is_enabled to see what it does?

It determines whether the thing is enabled. Or else some other dev has some 'splainin' to do. I already understand "what it does"; I am not interested in seeing the code until I have a reason to suspect a problem in that code.

If the corresponding logic were inline, I would have to think about it (or maybe read a comment) in order to understand its purpose. The function name tells me the purpose directly, and hides the implementation that doesn't help me understand the bigger picture of the calling function.

Inline code does the opposite.

When the calculation is neatly representable as a single, short, self-evident expression, then yes, I just use a local assignment instead. If I find myself wanting to comment it - if I need to say something about the implementation that the implementation doesn't say directly - using a separate function is beneficial, because a comment in that function then clearly refers to that calculation specifically, and I can consider that separately from the overall process.


> It determines whether the thing is enabled.

Ah, but what exactly does "enabled" mean in this context? Might seem nitpicky, but I might very well have a different opinion than the person who wrote the code. I mean, if it was just `if foo.enabled ..` no one would put it in a new function.. right? :)

I would say a comment does the same, and better, because it can be multi-line, and you can read it without having to click or move to the function to see the docs.

And you can jump past the implementation, iff it's short and "tidy" enough.

Yes, at some point it should be moved out anyway. I'm just weary from reading code with dozens of small functions, having to jump back and forth again and again and again


>Ah, but what exactly does "enabled" mean in this context?

If the code is working, it means what it needs to mean.

> I mean, if it was just `if foo.enabled ..` no one would put it in a new function. right?

Sure. This is missing the point, however.

> I'm just weary from reading code with dozens of small functions, having to jump back and forth again and again and again

Why do you jump to look at the other parts of the code? Did it fail a test?


> If the code is working, it means what it needs to mean.

No. Working code says nothing about the meaning of a label, which is purely to inform humans. The computer throws it away, the code will work no matter what you name it, even if the name is entirely wrong.

> Why do you jump to look at the other parts of the code? Did it fail a test?

Because people pick bad names for methods, and I've been hurt before. I'm not reading the code just to fix a problem, I'm reading the code to understand what it does (what it ACTUALLY does, not what the programmer who wrote it THOUGHT it does), so I can fix the problem properly.


>Because people pick bad names for methods, and I've been hurt before.

So you write long functions because other people are bad at writing short ones?


I have absolutely done this myself in the past and confused myself with bad names. Any criticism I apply to other people also applies to myself: I am not a special case.

Naming things is hard! Even if you're really good at naming things, adding more names and labels and file separation to a system adds to the complexity of the system. A long function may be complex, but it doesn't leak the complexity into the rest of the system. Creating a function and splitting it out is not a zero cost action.

I write long functions when long functions make sense. I write plenty of short functions too, when that makes sense. I'm not religiously attached to function or file size, I'm attached to preserving the overall system structure and avoiding stuff that makes easy bugs.


So my claim is that you do this less often than you claim to. There is some cutoff where you trust the code enough to not investigate it further. I'm of the opinion that this trust should generally be pretty close to the actual thing you're working on or investigating, and if it isn't that's a cultural issue that won't be solved by just "prefer to inline".


>why the hell would it be "easier to read" if I had to jump up and down the file to 7 different submethods when the function's entire flow is always sequential?

If the submethods were clearly named then you'd only need to read the seven submethod names to understand what the function did, which is easier than reading 100 lines of code.


Why is that any easier than having comments in the code that describe each part? In languages that don't allow closures, there's no good way to pass state between the seven functions unless you pass all the state you need, either by passing all the variables directly, or by creating an instance of a class/struct/whatever to hold those same variables and passing that. If you're lucky it might only be a couple of variables, but one can imagine that it could be a lot.


If all the functions need state from all the other functions, that is the problem a class or a struct solves - e.g. a place to store shared state.
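
Sketched in Python with hypothetical steps:

    class Pipeline:
        def __init__(self, data):
            self.data = data    # shared state lives here,
            self.errors = []    # not in seven parameter lists

        def validate(self):
            self.errors = [d for d in self.data if d < 0]

        def transform(self):
            self.data = [d * 2 for d in self.data if d >= 0]

    p = Pipeline([1, -2, 3])
    p.validate()
    p.transform()
    print(p.data, p.errors)  # [2, 6] [-2]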

If the 7 things are directly related to one another and are _really_ not atomic things (e.g. "Find first user email", "Filter unknown hostnames", etc), then they can be in a big pile in their own place, but that is typically pretty rare.

In general, you really want to let the code be crisp enough and your function names be intuitive enough that you don't need comments. If you have comments above little blocks of code like "Get user name and reorder list", that should probably just go into its own function.

Typically I build my code in "layers" or "levels". The lowest level is a gigantic pile of utility functions. The top level is the highest level abstractions of whatever framework or interface I'm building. In the middle are all the abstractions I needed to build to bridge the two, typically programs are between 2-4 layers deep. Each layer should have all the same semantics of everything else at that layer, and lower layers should be less abstract than higher layers.


My problem with the class/struct approach is it doesn't work if you don't need everything everywhere.

    foo(...):
        f1(a,b,c,d,e,f)
        f2(a,c,d,f)
        f3(b,c,d,e)
        ...
        f7(d,e)
But with long descriptive variable names that you'd actually use so the function calls don't fit on one line. Better imo to have a big long function instead of a class and passing around extra variables.

Though, ideally there isn't this problem in the first place/it's refactored away (if possible).


A function that needs that many parameters is already a no-go.

If it doesn't return anything, then it's either a method on a class, or it's a thing that performs some tricky side effect and would be better removed entirely in favor of a sounder design.


Creating a class around the too many arguments you want to pass to your function may be a good idea if the concept happens to be coherent and hopefully a bit more long-lived than just the function call.

Otherwise, you're just hiding the fact that your function requires too many arguments by calling them properties.


Well, if there is no class that seems to make sense to group them, that's an additional flag that the design needs more thought, or a discussion with a fellow developer.

Of course, in some very exceptional cases, 7 arguments might be relevant after all. If it's the single such case in the code base, and after thorough discussion with everyone implicated in the maintenance of the code it was agreed to be an exceptionally acceptable trade-off for some reason, with care taken that it does not leak through the whole code base by being called almost everywhere, then let it be.

But if it's a generalized style through the whole codebase, there is an obvious lack of care for maintainability, and the team is going to pay for that sooner rather than later.


> A function that needs so many parameters is already a no go.

This rule is the same as lines-of-code type rules. The number itself is not the issue: it could be few parameters and still a problem, or many parameters and not an issue at all.


You access the shared data via the struct / class reference, not as method parameters. That's the benefit.

e.g.

    foo(...):
        # Fields
        a
        b
        c
        d 
        e
        
        # Methods
        f1(f)
        f2(f)
        f3()
        ...
        f7()


Moving them to a higher scope makes it harder to change anything in foo. Now anytime you want to read or write a-e you have to build the context to understand their complete lifecycles. If all the logic were smooshed together, or if it were factored into the original functions with lots of parameters, as ugly as either of them might be, you still have much more assurance about when they are initialized and changed, and the possible scopes for those events are much more obviously constrained in the code.


If all those functions need all those variables, then you're either going to put them in a class, or put all those variables in something like a dict and just pass that in.

Seeing 10 variables passed in to a function is a code smell.

Whether you put in in a common class / struct or aggregate them in a dict depends on whether or not all those functions are related.

In general, your functions should not be super duper long or super duper indented. Those are also code smells that indicate you have the wrong abstractions.


It works fine. Not all the methods need to use all the struct members.


Language syntax defines functional boundaries. A strong functional boundary means you don't have to reason about how other code can potentially influence your code, these boundaries are clearly defined and enforced by the compiler. If you just have one function with blocks of code with comments, you still must engage with the potential for non-obvious code interactions. That's much higher cognitive load than managing the extra function with its defined parameters.


In the ideal case, sure, but if assuming this can't be refactored, then the code

    foo(...):
       // init
       f1(a,b,c,d,e,f)
       f2(a,b,c,d,e,f)
       ...
       f7(a,b,c,d,e,f)
or the same just with a,b,c,d,e,f stuffed into a class/struct and passed around, isn't any easier to reason about than if those functions are inline.


There's at least one reason that something like this is going to be exceedingly rare in practice, which is that (usually) functions return things.

In certain cases in C++ or C you might use in/out params, but those are less necessary these days, and in most other languages you can just return stuff from your functions.

So in almost every case, f1 will have computed some intermediate value useful to f2, and so on and so forth. And these intermediate values will be arguments to the later functions. I've basically never encountered a situation where I can't do that.

Edit: and as psychoslave mentions, the arguments themselves can be hidden with fluent syntax or by abstracting a-f out to a struct and a fluent api or `self`/`this` reference.

Cases where you only use some of the parameters in each sub-function are the most challenging to cleanly abstract, but are also the most useful because they help to make complex spaghetti control-flow easier to follow.
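
i.e., the earlier f1..f7 sketch usually collapses into a value pipeline (hypothetical names):

    def parse(raw):
        return raw.split(",")

    def validate(parts):
        return [p for p in parts if p]

    def render(parts):
        return " | ".join(parts)

    def foo(raw):
        # Each step returns exactly the intermediate the next step needs,
        # so nothing has to be shared through wide parameter lists.
        parsed = parse(raw)
        checked = validate(parsed)
        return render(checked)

    print(foo("a,,b"))  # a | b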


I disagree. Your example tells me the structure of the code at a glance. If it was all inlined I would have to comprehend the code to recover this simple structure. Assuming the f's are well-named, that's code I don't have to read to comprehend its function. That's always a win.


This typically can be coded with something like

def foo(...) = Something.new(...).f1.f2.f7

Note that the ellipses here are actual syntax in something like Ruby; other languages might not be as terse and convenient, but the fluent pattern can be implemented basically everywhere (OK, maybe not COBOL).
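
A rough Python rendering of the same fluent pattern (hypothetical class; each step returns self to allow chaining):

    class Something:
        def __init__(self, value):
            self.value = value

        def f1(self):
            self.value += 1
            return self   # returning self is what enables the chaining

        def f2(self):
            self.value *= 2
            return self

        def f7(self):
            print(self.value)
            return self

    def foo(value):
        return Something(value).f1().f2().f7()

    foo(3)  # prints 8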


> there's no good way to pass state between the seven functions unless you pass all the state you need,

That’s why it’s better than comments: because it gives you clarity on what part of the state each function reads or writes. If you have a big complex state and a 100 line operation that is entirely “set attribute c to d, set attribute x to off” then no, you don’t need to extract functions, but it’s possible that e.g this method belongs inside the state object.


>Why is that any easier than having comments in the code that describe each part?

Because you only read the submethod names, and then you already understand what the code does, at the level you're currently interested in.


>Why is that any easier than having comments in the code that describe each part?

Because 7<<100


> Because 7<<100

But then, 7 << 100 << (7 but each access blanks out your short-term memory), which is how jumping to all those tiny functions and back plays out in practice.


>which is how jumping to all those tiny functions and back plays out in practice.

Why would you jump into those functions and back?


Because I need to know what they actually do? The most interesting details are almost always absent from the function name.

EDIT:

For even the simplest helper, there are many ways to implement it: half of them stupid, some subtly incorrect, some handling errors the wrong way, or just the wrong way for the needs of the specific call site I'm working on. Stupidity often manifests as unnecessary copying, and/or looping over a copy, and/or copying on every step of the loop, all of which gets trivially hidden by the extra indirection of a small function calling another small function. That's how you often get accidental O(n^2) in random places.
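
The classic shape of that accident, sketched:

    names = ["u%d" % i for i in range(2000)]

    def is_known(name):
        return name in names          # innocent-looking helper: O(n) list scan

    # The caller loops over n items, so the hidden scan makes this O(n^2).
    known = [n for n in names if is_known(n)]

    # The fix is invisible at the call site: same signature, O(1) lookups.
    _names_set = set(names)
    def is_known_fast(name):
        return name in _names_set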

Many such things are OK or not depending on the context of the caller, and none of this is readily apparent in function signatures or the type system. If the helper fn is otherwise abstracting a small idiom, I'd argue it's only obscuring it and providing ample opportunities to screw up.

I know many devs don't care, they prefer to instead submit slow and buggy code and fix it later when it breaks. I'm more of a "don't do stupid shit, you'll have less bugs to fix and less performance issues for customers to curse you for" kind of person, so cognitive load actually matters for me, and wishing it away isn't an acceptable solution.


>Because I need to know what they actually do?

Strange. The longer I've been programming, the less I agree with this.

>For even a simplest helper, there's many ways to implement it.

Sure. But by definition, the interface is what matters at the call site.

> That's how you often get accidental O(n^2) in random places.

Both loops still have to be written. If they're in separate places, then instead of a combined function which is needlessly O(n^2) where it should be O(n), you have two functions, one of which is needlessly O(n) where it should be O(1).

When you pinpoint a bottleneck function with a profiler, you want it to be obvious as possible what's wrong: is it called too often, or does it take too long each time?

> If the helper fn is otherwise abstracting a small idiom, I'd argue it's only obscuring it and providing ample opportunities to screw up.

Abstractions explain the purpose in context.

> I'm more of a "don't do stupid shit, you'll have less bugs to fix and less performance issues for customers to curse you for" kind of person

The shorter the function is, the less opportunity I have to introduce a stupidity.


Why does pressing "go to defn" blank your short term memory in a way that code scrolling beyond the top of the screen doesn't?


Because jumping is disorienting, because each defn has 1-3 lines of overhead (header, delimiters, whitespace) and lives among other defns, which may not be related to the task at hand, and are arranged in arbitrary order?

Does this really need explaining? My screen can show 35-50 lines of code; that can be 35-50 lines of relevant code in a "fat" function, or 10-20 lines of actual code, out of order, mixed with syntactic noise. The latter does not lower cognitive load.


I wouldn't have asked if I didn't have a real curiosity!

To use a real world example where this comes up a lot, lots and lots of code can be structured as something like:

    accum = []
    for x in something():
        for y in something_else():
            accum.append(operate_on(x, y))
I find structuring it like this much easier than fully expanding all of these out, which at best ends up being something like

    accum = []
    req = my_service.RpcRequest(foo="hello", bar=12)
    rpc = my_service.new_rpc()
    resp = my_service.call(rpc, req)
    
    req = my_service.OtherRpcRequest(foo="goodbye", bar=12)
    rpc = my_service.new_rpc()
    resp2 = my_service.call(rpc, req)

    for x in resp.something:
        for y in resp2.something_else:
            my_frobnicator = foo_frobnicator.new()
            accum.append(my_frobnicator.frob(x).nicate(y))
and that's sort of the best case where there isn't some associated error handling that needs to be done for the rpc requests/responses etc.

I find it much easier to understand what's happening in the first case than the second, since the overall structure of the operations on the data is readily apparent at a glance, and I don't need to scan through error handling and boilerplate.

Like, looking at real-life examples I have handy, there's a bunch of cases where I have 6-10 lines of nonsense fiddling (with additional lines of documentation that would be even more costly to put inline!), and that's in Python. In C++, Go, and Java, which I use at work and which are generally more verbose and have more rpc and other boilerplate, this is usually even higher.

So the difference is that my approach means that when you jump to a function, you can be confident that the actual structure and logic of that function will be present and apparent to you on your screen without scrolling or puzzling. Whereas your approach gives you that, say, 50% of the time, maybe less, because the entire function doesn't usually fit on the screen, and the structure may contain multiple logical subroutines, but they aren't clearly delineated.


If the variables were clearly named, I wouldn't have to read much at all, unless I was interested in the details. I reiterate: why does the length of a single function with no reuse matter?


It does not matter if function foo is reused, only if the code inside foo that is to be pulled into new function bar is.


For unit testing those sub-sections in a clear and concise manner (i.e., low cognitive load). As long as the method names are descriptive no jumping to and fro is needed usually.

That doesn't mean every little unit needs to be split out, but it can make sense to do so if it helps write and debug those parts.


Then you need to make those functions public, when the goal is to keep them private and unusable outside of the parent function.

Sometimes it's easy to write multiple named functions, but I've found debugging functions can be more difficult when the interactions of the sub functions contribute to a bug.

Why jump back and forth between sections of a module when I could've read the 10 lines in context together?


> Then you need to make those functions public, […]

That depends on the language, but often there will be a way to expose them to unit tests while keeping them limited in exposure. Java has package private for this, with Rust the unit test sits in the same file and can access private function just fine. Other languages have comparable idioms.
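
Python is similarly permissive, if that's the language at hand: the leading underscore is convention only, so a test in another file can still import the helper (hypothetical module):

    # mymodule.py (hypothetical)
    def _tokenize(text):
        # "Private" by convention only; nothing enforces it.
        return text.split()

    # test_mymodule.py
    from mymodule import _tokenize  # tests can still reach it

    def test_tokenize():
        assert _tokenize("a b") == ["a", "b"]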


Javascript doesn't, AFAIK. I work in Elixir, which doesn't.

I'm for it if it's possible but it can still make it harder to follow.


Because a function clearly defines the scope of the state within it, whereas a section of code within a long function does not. Therefore a function can be reasoned about in isolation, which lowers cognitive load.


I don't agree. If there are side effects happening which may be relevant, the section of code within a long function is executing in a clearly defined state (the stuff above it has happened, the stuff below it won't happen until it finishes) while the same code in a separate function could be called from anywhere. Even without side effects, if it's called from more than one place, you have to think about all of its callers before you change its semantics, and before you look, you don't know if there is more than one caller. Therefore the section of code can be reasoned about with much lower cognitive load. This may be why larger subroutines correlate with lower bug rates, at least in the small number of published empirical studies.

The advantage of small subroutines is not that they're more logically tractable. They're less logically tractable! The advantage is that they are more flexible, because the set of previously defined subroutines forms a language you can use to write new code.

Factoring into subroutines is not completely without its advantages for intellectual tractability. You can write tests for a subroutine which give you some assurance of what it does and how it can be broken. And (in the absence of global state, which is a huge caveat) you know that the subroutine only depends on its arguments, while a block in the middle of a long subroutine may have a lot of local variables in scope that it doesn't use. And often the caller of the new subroutine is more readable when you can see the code before the call to it and the code after it on the same screen: code written in the language extended with the new subroutine can be higher level.


You can write long functions in a bad way, don't get me wrong. I'm just saying the rule that the length itself is an anti-pattern has no inherent validity.


There's just no way I buy that I could safely make a change in a 100 LOC function and know that there won't be an impact 30 lines down, whereas with a few additional functions you can define the shape of the interactions and know that if that shape/interface/type is maintained there won't be unexpected interactions. It's a balance, though, as indirection can also readily hide and obscure interactions, or add unnecessary glue code that takes up mental bandwidth and requires additional testing to confirm.


As a non-English speaker, what does "so much this" mean?

Does it essentially just mean "I agree"?


Yep, basically “I agree with this statement a lot.” It’s very much an “online Americanism.”


In the superlative, yes. It's a fairly new phrase, and hardly in my parlance, but it's growing on me when I'm in informal typed chat contexts.


It's a call for others to take note of the important or profound message being highlighted. So more than just "I agree".


When someone says "this" they are basically pointing at a comment and saying "this is what I think too".

"So much" is applied to intensify that.

So, yes, it's a strong assertion of agreement with the comment they're replying to.


Indeed, and breaking out logic into more global scopes has serious downsides if that logic needs to be modified in the future and your system still needs to support innovation and improvements: downsides not totally unlike those of using a lot of global variables instead of local ones.

Prematurely abstracting and breaking code out into small high level chunks is bad. I try to lay it out from an information theoretic, mathematical perspective here:

https://benoitessiambre.com/entropy.html

with some implications for testing:

https://benoitessiambre.com/integration.html

It all comes down to managing code entropy.



