
Why is that any easier than having comments in the code that describe each part? In languages that don't allow closures, there's no good way to pass state between the seven functions unless you pass all the state you need, either by passing all the variables directly, or by creating an instance of a class/struct/whatever to hold those same variables and passing that. If you're lucky it might only be a couple of variables, but one can imagine that it could be a lot.



If all the functions need state from all the other functions, that is exactly the problem a class or a struct solves: a shared place to store state.

If the 7 things are directly related to one another and are _really_ not atomic things (e.g. "Find first user email", "Filter unknown hostnames", etc), then they can be in a big pile in their own place, but that is typically pretty rare.

In general, you really want to let the code be crisp enough and your function names be intuitive enough that you don't need comments. If you have comments above little blocks of code like "Get user name and reorder list", that should probably just go into its own function.

Typically I build my code in "layers" or "levels". The lowest level is a gigantic pile of utility functions. The top level is the highest-level abstraction of whatever framework or interface I'm building. In the middle are all the abstractions I needed to build to bridge the two; typically programs are 2-4 layers deep. Each layer should have the same semantics as everything else at that layer, and lower layers should be less abstract than higher layers.
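A minimal sketch of what such layering can look like, using a hypothetical text-processing task (all names invented for illustration):

```python
# Layer 1: utility functions (least abstract)
def normalize(s: str) -> str:
    return s.strip().lower()

def is_blank(s: str) -> bool:
    return not s.strip()

# Layer 2: a mid-level abstraction built only from the utilities
def clean_lines(lines):
    return [normalize(line) for line in lines if not is_blank(line)]

# Layer 3: the top-level interface; reads like a description of the task
def unique_terms(text: str) -> list:
    return sorted(set(clean_lines(text.splitlines())))
```

Each function only talks to the layer directly beneath it, so the top layer stays readable without comments.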


My problem with the class/struct approach is it doesn't work if you don't need everything everywhere.

    foo(...):
        f1(a,b,c,d,e,f)
        f2(a,c,d,f)
        f3(b,c,d,e)
        ...
        f7(d,e)
But with the long, descriptive variable names you'd actually use, the function calls don't fit on one line. Better, IMO, to have one big long function than a class plus passing around extra variables.

Though, ideally there isn't this problem in the first place/it's refactored away (if possible).


A function that needs so many parameters is already a no-go.

If it doesn't return anything, then it's either a method on a class, or it's a thing that performs some tricky side effect and would be better removed entirely with a sounder design.


Creating a class around the too many arguments you want to pass to your function may be a good idea if the concept happens to be coherent and hopefully a bit more long-lived than just the function call.

Otherwise, you're just hiding the fact that your function requires too many arguments by calling them properties.
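A sketch of the coherent case in Python (all names hypothetical): the grouping is only worthwhile because the settings object is a real concept that outlives any single call.

```python
from dataclasses import dataclass

@dataclass
class RenderSettings:
    width: int
    height: int
    dpi: int = 96
    grayscale: bool = False

def render(doc: str, settings: RenderSettings) -> str:
    mode = "gray" if settings.grayscale else "color"
    return f"{doc} @ {settings.width}x{settings.height}, {settings.dpi}dpi, {mode}"

# The same settings object is reused across many calls, which is
# what justifies its existence beyond merely hiding arguments.
s = RenderSettings(width=800, height=600)
page1 = render("page1", s)
page2 = render("page2", s)
```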


Well, if there is no class that seems to make sense as a grouping, that's an additional flag pointing to the need for more thought on the design, or for a discussion with fellow developers about it.

Of course, in some very exceptional cases, 7 arguments might be relevant after all. If it's the single such case in the code base, and after thorough discussion with everyone involved in maintaining the code it was agreed to be an exceptionally acceptable trade-off for some reason, with care taken that it doesn't leak into the whole code base by being called almost everywhere, then let it be.

But if it's a generalized style throughout the whole codebase, there is an obvious lack of care for the maintainability of the work, and the team is going to pay for that sooner rather than later.


> A function that needs so many parameters is already a no-go.

This rule is like lines-of-code rules: the number itself is not the issue. A function could have few parameters and be a problem, or many parameters and not be an issue at all.


You access the shared data via the struct / class reference, not as method parameters. That's the benefit.

e.g.

    class Foo:
        # Fields
        a
        b
        c
        d
        e

        # Methods
        f1(f)
        f2(f)
        f3()
        ...
        f7()


Moving them to a higher scope makes it harder to change anything in foo. Now anytime you want to read or write a-e you have to build the context to understand their complete lifecycles. If all the logic were smooshed together, or if it were factored into the original functions with lots of parameters, as ugly as either of them might be, you still have much more assurance about when they are initialized and changed, and the possible scopes for those events are much more obviously constrained in the code.


If all those functions need all those variables, then you're either going to put them in a class, or put all those variables in something like a dict and just pass that in.

Seeing 10 variables passed in to a function is a code smell.

Whether you put them in a common class / struct or aggregate them in a dict depends on whether or not all those functions are related.

In general, your functions should not be super duper long or super duper indented. Those are also code smells that indicate you have the wrong abstractions.
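The dict variant can be sketched like this (hypothetical names); a class would be the better fit if the functions formed a coherent unit:

```python
def f1(ctx):
    # Reads and writes shared state through the single aggregate.
    ctx["total"] = ctx["a"] + ctx["b"]

def f2(ctx):
    ctx["scaled"] = ctx["total"] * ctx["factor"]

def foo(a, b, factor):
    # One aggregate argument instead of threading a, b, factor everywhere.
    ctx = {"a": a, "b": b, "factor": factor}
    f1(ctx)
    f2(ctx)
    return ctx["scaled"]
```

The trade-off is the one raised upthread: the dict hides which keys each function actually touches.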


It works fine. Not all the methods need to use all the struct members.


Language syntax defines functional boundaries. A strong functional boundary means you don't have to reason about how other code can potentially influence your code, these boundaries are clearly defined and enforced by the compiler. If you just have one function with blocks of code with comments, you still must engage with the potential for non-obvious code interactions. That's much higher cognitive load than managing the extra function with its defined parameters.


In the ideal case, sure, but assuming this can't be refactored, then the code

    foo(...):
       // init
       f1(a,b,c,d,e,f)
       f2(a,b,c,d,e,f)
       ...
       f7(a,b,c,d,e,f)
or the same just with a,b,c,d,e,f stuffed into a class/struct and passed around, isn't any easier to reason about than if those functions are inline.


There's at least one reason that something like this is going to be exceedingly rare in practice, which is that (usually) functions return things.

In certain cases in C++ or C you might use in/out params, but those are less necessary these days, and in most other languages you can just return stuff from your functions.

So in almost every case, f1 will have computed some intermediate value useful to f2, and so on and so forth. And these intermediate values will be arguments to the later functions. I've basically never encountered a situation where I can't do that.
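In other words, the data flow itself threads the steps together, so nothing needs to be shared beyond each step's return value (step names hypothetical):

```python
def parse(raw):
    return [int(x) for x in raw.split(",")]

def keep_positive(values):
    return [v for v in values if v > 0]

def total(values):
    return sum(values)

def foo(raw):
    # Each step consumes the previous step's return value;
    # no shared mutable state is passed around.
    values = parse(raw)
    positives = keep_positive(values)
    return total(positives)
```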

Edit: and as psychoslave mentions, the arguments themselves can be hidden with fluent syntax or by abstracting a-f out to a struct and a fluent api or `self`/`this` reference.

Cases where you only use some of the parameters in each sub-function are the most challenging to cleanly abstract, but are also the most useful because they help to make complex spaghetti control-flow easier to follow.


I disagree. Your example tells me the structure of the code at a glance. If it were all inlined, I would have to comprehend the code to recover this simple structure. Assuming the f's are well-named, that's code I don't have to read to comprehend its function. That's always a win.


This typically can be coded with something like

    def foo(...) = Something.new(...).f1.f2.f7

Note that the ellipsis here is actual syntax in something like Ruby; other languages might not be as terse and convenient, but the fluent pattern can be implemented basically everywhere (OK, maybe not COBOL).
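The same fluent pattern sketched in Python (hypothetical names and operations): each method mutates state and returns `self`, which is what makes the calls chain.

```python
class Something:
    def __init__(self, value):
        self.value = value

    def f1(self):
        self.value += 1
        return self  # returning self is what enables chaining

    def f2(self):
        self.value *= 2
        return self

    def f7(self):
        self.value -= 3
        return self

def foo(value):
    return Something(value).f1().f2().f7().value
```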


> there's no good way to pass state between the seven functions unless you pass all the state you need,

That’s why it’s better than comments: because it gives you clarity on what part of the state each function reads or writes. If you have a big complex state and a 100 line operation that is entirely “set attribute c to d, set attribute x to off” then no, you don’t need to extract functions, but it’s possible that e.g this method belongs inside the state object.


>Why is that any easier than having comments in the code that describe each part?

Because you only read the submethod names, and then you already understand what the code does, at the level you're currently interested in.


>Why is that any easier than having comments in the code that describe each part?

Because 7<<100


> Because 7<<100

But then, 7 << 100 << (7 but each access blanks out your short-term memory), which is how jumping to all those tiny functions and back plays out in practice.


>which is how jumping to all those tiny functions and back plays out in practice.

Why would you jump into those functions and back?


Because I need to know what they actually do? The most interesting details are almost always absent from the function name.

EDIT:

For even the simplest helper, there are many ways to implement it: half of them stupid, some subtly incorrect, some handling errors the wrong way, or just the wrong way for the needs of the specific call site I'm working on. Stupidity often manifests as unnecessary copying, and/or looping over a copy, and/or copying on every step of the loop - all of which gets trivially hidden by the extra indirection of a small function calling another small function. That's how you often get accidental O(n^2) in random places.
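A concrete way this happens (hypothetical helpers): each function looks innocently linear on its own, and the quadratic behavior only appears across the call boundary.

```python
def find_user(users, name):
    # Linear scan: fine in isolation, invisible at the call site.
    for u in users:
        if u["name"] == name:
            return u
    return None

def enrich_all(users, names):
    # Calls the linear helper once per name: O(len(names) * len(users)).
    return [find_user(users, n) for n in names]

# A dict index keyed by name would make each lookup O(1), but nothing
# in find_user's signature warns the caller about the hidden scan.
```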

Many such things are OK or not depending on the context of the caller, and none of this is readily apparent in function signatures or the type system. If the helper fn is otherwise abstracting a small idiom, I'd argue it's only obscuring it and providing ample opportunities to screw up.

I know many devs don't care; they prefer to submit slow and buggy code and fix it later when it breaks. I'm more of a "don't do stupid shit, you'll have fewer bugs to fix and fewer performance issues for customers to curse you for" kind of person, so cognitive load actually matters for me, and wishing it away isn't an acceptable solution.


>Because I need to know what they actually do?

Strange. The longer I've been programming, the less I agree with this.

>For even a simplest helper, there's many ways to implement it.

Sure. But by definition, the interface is what matters at the call site.

> That's how you often get accidental O(n^2) in random places.

Both loops still have to be written. If they're in separate places, then instead of a combined function which is needlessly O(n^2) where it should be O(n), you have two functions, one of which is needlessly O(n) where it should be O(1).

When you pinpoint a bottleneck function with a profiler, you want it to be obvious as possible what's wrong: is it called too often, or does it take too long each time?

> If the helper fn is otherwise abstracting a small idiom, I'd argue it's only obscuring it and providing ample opportunities to screw up.

Abstractions explain the purpose in context.

> I'm more of a "don't do stupid shit, you'll have less bugs to fix and less performance issues for customers to curse you for" kind of person

The shorter the function is, the less opportunity I have to introduce a stupidity.


Why does pressing "go to defn" blank your short term memory in a way that code scrolling beyond the top of the screen doesn't?


Because jumping is disorienting, because each defn has 1-3 lines of overhead (header, delimiters, whitespace) and lives among other defns, which may not be related to the task at hand, and are arranged in arbitrary order?

Does this really need explaining? My screen can show 35-50 lines of code; that can be 35-50 lines of relevant code in a "fat" function, or 10-20 lines of actual code, out of order, mixed with syntactic noise. The latter does not lower cognitive load.


I wouldn't have asked if I didn't have a real curiosity!

To use a real world example where this comes up a lot, lots and lots of code can be structured as something like:

    accum = []
    for x in something():
        for y in something_else():
            accum.append(operate_on(x, y))
I find structuring it like this much easier than fully expanding all of these out, which at best ends up being something like

    accum = []
    req = my_service.RpcRequest(foo="hello", bar=12)
    rpc = my_service.new_rpc()
    resp = my_service.call(rpc, req)
    
    req = my_service.OtherRpcRequest(foo="goodbye", bar=12)
    rpc = my_service.new_rpc()
    resp2 = my_service.call(rpc, req)

    for x in resp.something:
        for y in resp2.something_else:
            my_frobnicator = foo_frobnicator.new()
            accum.append(my_frobnicator.frob(x).nicate(y))
and that's sort of the best case where there isn't some associated error handling that needs to be done for the rpc requests/responses etc.

I find it much easier to understand what's happening in the first case than the second, since the overall structure of the operations on the data is readily apparent at a glance, and I don't need to scan through error handling and boilerplate.

Like, looking at real-life examples I have handy, there's a bunch of cases where I have 6-10 lines of nonsense fiddling (with additional lines of documentation that would be even more costly to put inline!), and that's in python. In cpp, go, and java which I use at work and are generally more verbose, and have more rpc and other boilerplate, this is usually even higher.

So the difference is that my approach means that when you jump to a function, you can be confident that the actual structure and logic of that function will be present and apparent to you on your screen without scrolling or puzzling. Whereas your approach gives you that, say, 50% of the time, maybe less, because the entire function doesn't usually fit on the screen, and the structure may contain multiple logical subroutines, but they aren't clearly delineated.



