I, for one, don't blindly copy and paste code. Code collects entropy over time; if it's been around for a while, it almost invariably can be improved. Copying and modifying is cargo-cult coding in the small and one of the fastest ways to create a codebase where you have no idea what the code is doing.
Adding process and standardization around this seems like a patch on a dysfunctional coding culture. Standardization frequently ends up on something that isn't the best way, and with web development in particular, is likely to be outdated in a few months. Requiring meetings to change standards to improve code has got to be a toxic working culture.
Yes, but blind to what? The coder is blindly manipulating symbols[0]. What if the code produces the artifact you desire? I'm not sure anyone really blindly copies and pastes. On the flip side, if you forget what you wrote in a month or two, how is that outcome different than copying and pasting in the first place?
// this code accomplishes x
foo();
bar();
frob(42);
They want to do x, and they copy the whole block, and perhaps modify the argument. They have only a surface understanding of what foo, bar and frob do, if any. They don't know if there is an order dependency; they don't know if the behaviour is specific to the situation where the code was copied from, or whether it applies (or not) to the place the code is copied to.
This is blind copy and paste. It is blind because it is not seeing through the symbols to the workings underneath. Functions, classes etc. as abstractions are useful, but they're leaky, and to write good code, you need to understand things at a lower level. At the very least, you need to know about implicit dependencies, time and memory complexity, and have a vague idea of the heft - the constant factor - of what you're putting into motion.
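To spell out what "implicit dependencies" might mean here - suppose the innocuous-looking block above hides something like this (hypothetical, of course):

#include <cstdio>

// Hypothetical definitions behind the copied block. foo() sets up global
// state that bar() and frob() silently depend on; none of that is visible
// at the call site being copied.
static std::FILE *g_log = nullptr;

void foo()       { g_log = std::fopen("run.log", "w"); }                    // acquires the resource
void bar()       { std::fprintf(g_log, "started\n"); }                      // assumes foo() ran first
void frob(int n) { std::fprintf(g_log, "n=%d\n", n); std::fclose(g_log); }  // uses it, then frees it

// Copy bar(); frob(42); somewhere without foo() and you're writing through a
// null FILE* - the order dependency was never part of the copied text.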
If it's a codebase undergoing continuous development, and foo / bar / frob are grouped together in more than one place, it's a smell. The code probably needs refactoring and restructuring, and all the calling points need adjusting.
This is just a more concrete example to get across what I'm talking about, since it seemed to be unclear. I don't have time to watch a video before replying, sorry.
Also the only way I forget what I wrote a month or two ago is if I've changed jobs, and then it's a willing forgetfulness. It takes many years for me to forget my code, for better or worse.
Pair programming, code review, and actively refactoring code to use existing abstractions or build new abstractions when it appears that the same problem is cropping up in a situation that might warrant copy / paste / modify.
It might be possible that a couple of different abstractions / patterns crop up that solve the same underlying problem. It usually isn't a problem for them to coexist for a while until one proves better. Sometimes they evolve in different directions; sometimes the people most responsible for the respective abstractions agree between them what synthesis should win going forward.
I'm just worried about giving too much ammo to a certain type of code lawyer, who likes to quote chapter and verse and gets a kick out of being technically correct. You find these kinds of people everywhere, on Wikipedia, increasingly on StackOverflow, etc., and I think they can create a bit of a poisonous culture if they have too much law to enforce. There is a place for some low-level standards, don't get me wrong, but it can get out of hand.
I agree with you in the following scenario: small team (<20 devs), manageable codebase size (~200K lines), and enough budget to keep the project running without too much specialization.
If you cannot keep the project within those parameters, being a code nazi is unfortunately the way to go.
For most people on HN that's just another day at work; for enterprise back-office system developers like me, it's a rare sight.
And in any case it all breaks down without great communication between team members and a compatible code culture (i.e. some way for the whole team to agree on something and apply it without childish guerrilla resistance).
Copy and paste code out of the same repository, modifying it slightly? Surely the more appropriate thing to do in most cases is to make that code generic, available as a function, and then call it from the two places that now need it?
It sounds like it is the code review of future commits that needs to be fixed here.
OK, let me give you an example where my instinct would be to copy and then modify:
We have a library that requires initialization, action, and then freeing of resources. Initialization involves creating and filling in a struct and/or passing parameter values to an initialization function. After that it is necessary to check for success/failure. That's for initializing the library. Then something similar is done to initialize an object we want to manipulate.
I would begin with the copied code, replace the appropriate values in the initialization, replace the error handling, replace the action, and keep the freeing of resources as is.
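To make it concrete - take SQLite as a stand-in, since it has exactly this init / check / act / free shape - the block I'd be copying and lightly editing looks something like this:

#include <cstdio>
#include <sqlite3.h>

int main() {
    // Initialize the library object (a connection) and check for failure.
    sqlite3 *db = nullptr;
    if (sqlite3_open("example.db", &db) != SQLITE_OK) {
        std::fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return 1;
    }

    // Initialize the object we actually want to manipulate (a statement).
    sqlite3_stmt *stmt = nullptr;
    if (sqlite3_prepare_v2(db, "SELECT 1", -1, &stmt, nullptr) != SQLITE_OK) {
        std::fprintf(stderr, "prepare failed: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return 1;
    }

    // The action - the part that changes between copies.
    while (sqlite3_step(stmt) == SQLITE_ROW)
        std::printf("%d\n", sqlite3_column_int(stmt, 0));

    // Freeing of resources - usually kept verbatim when the block is copied.
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}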
If you find yourself repeating a pattern in many places, with slight variations in the values used, that's a good candidate for refactoring that pattern into an abstraction for that resource. In your example, I can easily imagine the uses of that library being abstracted into a class or a set of functions so that interaction with the library only happens in one place.
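Sticking with the SQLite stand-in, a rough sketch of that refactoring - the library interaction lives in one class, and each call site only supplies the part that varies:

#include <stdexcept>
#include <string>
#include <sqlite3.h>

class Database {
public:
    explicit Database(const std::string &path) {
        if (sqlite3_open(path.c_str(), &db_) != SQLITE_OK) {
            std::string msg = sqlite3_errmsg(db_);
            sqlite3_close(db_);
            throw std::runtime_error(msg);
        }
    }
    ~Database() { sqlite3_close(db_); }
    Database(const Database &) = delete;
    Database &operator=(const Database &) = delete;

    // Run a statement, calling onRow for each result row.
    // (Exception safety of stmt is left out of this sketch.)
    template <typename RowFn>
    void query(const std::string &sql, RowFn onRow) {
        sqlite3_stmt *stmt = nullptr;
        if (sqlite3_prepare_v2(db_, sql.c_str(), -1, &stmt, nullptr) != SQLITE_OK)
            throw std::runtime_error(sqlite3_errmsg(db_));
        while (sqlite3_step(stmt) == SQLITE_ROW)
            onRow(stmt);
        sqlite3_finalize(stmt);
    }

private:
    sqlite3 *db_ = nullptr;
};

// Usage - the formerly copied block shrinks to the part that differs:
//   Database db("example.db");
//   db.query("SELECT 1", [](sqlite3_stmt *s) { /* read columns */ });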
An example of something that superficially looks like "cut and paste" programming is implementing a new derived class. In the C++ case, you could:
1. Type out the new .h and .cpp files from scratch.
2. Copy/rename the base class .h file, remove all non-virtual functions and other text not needed for a derived class, copy/rename this new header to create a .cpp file, convert method declarations to definitions, and add method implementations.
3. Copy/rename the .h and .cpp files of an existing derived class and replace the existing method implementations (sketched below).
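Roughly what option 3 produces, with illustrative names - the copied file keeps the shape of an existing derived class, and only the overrides change:

#include <string>

class Shape {                       // the existing base class
public:
    virtual ~Shape() = default;
    virtual double area() const = 0;
    virtual std::string name() const = 0;
};

class Circle : public Shape {       // the existing derived class that gets copied...
public:
    explicit Circle(double r) : r_(r) {}
    double area() const override { return 3.14159265358979 * r_ * r_; }
    std::string name() const override { return "circle"; }
private:
    double r_;
};

class Square : public Shape {       // ...and renamed/edited into the new one
public:
    explicit Square(double s) : s_(s) {}
    double area() const override { return s_ * s_; }
    std::string name() const override { return "square"; }
private:
    double s_;
};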
Option 3 is a reasonable choice and is an example of what the article discusses.
Option 3 is a reasonable choice, but I don't think it's an example of what the article discusses.
The example you provided is a good one, because it's an instance where you can't write abstractions to help you in the language you're using. Defining new classes, their interfaces and members is part of the necessary scaffolding when writing in C++. (Let's pretend it's not possible to do this with the C preprocessor.)
But the first example in the article does not sound like your example. Rather, it sounds closer to im3w1l's example, where the programmer copied code for some particular functionality, not for scaffolding.
I don't see im3w1l's example as copying code for some particular functionality, it's copying boilerplate code needed to use a particular library. First the library is initialised, passing in a number of parameters to an initialisation function. If the values of these parameters are the same in multiple instances in the same codebase then that is violating DRY, but if the parameters differ it's just the scaffolding that you are reusing.
The difference, to me, is that it's scaffolding code that can be abstracted. You can't abstract writing a class - it's a primary activity in the language. Only macros can help you there - either the broken ones provided by the C preprocessor, or real ones provided by languages like Racket or Rust.
But boilerplate code that interacts with an external library is not a primary activity in the language. You can hide it behind a class or a family of functions.
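For instance, using SQLite as an arbitrary stand-in for such a library, the acquire and free halves of the boilerplate can be hidden behind a couple of small helpers:

#include <memory>
#include <sqlite3.h>

// std::unique_ptr with a custom deleter takes care of the "freeing of
// resources" half; the helpers take care of the "initialize" half.
using DbHandle   = std::unique_ptr<sqlite3, decltype(&sqlite3_close)>;
using StmtHandle = std::unique_ptr<sqlite3_stmt, decltype(&sqlite3_finalize)>;

DbHandle open_db(const char *path) {
    sqlite3 *db = nullptr;
    sqlite3_open(path, &db);                      // error handling elided in this sketch
    return DbHandle(db, &sqlite3_close);
}

StmtHandle prepare(sqlite3 *db, const char *sql) {
    sqlite3_stmt *stmt = nullptr;
    sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr);
    return StmtHandle(stmt, &sqlite3_finalize);
}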
Well, consider the following C code. It demonstrates how to allocate memory on the GPU, transfer data in and out, operate on the memory, and then free it. I could see myself copy-pasting that to several places, modified to perform different functions.
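Something along these lines - a minimal sketch against the CUDA runtime API, with an illustrative kernel:

#include <stdio.h>
#include <cuda_runtime.h>

// Illustrative kernel: double every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void) {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    // Allocate memory on the GPU and transfer data in.
    float *dev = NULL;
    if (cudaMalloc((void **)&dev, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // Operate on the memory.
    scale<<<(n + 255) / 256, 256>>>(dev, n);

    // Transfer data out and free it.
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[3] = %f\n", host[3]);
    return 0;
}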
It can be quite a bit more than just two lines, particularly considering error handling. But when you start refactoring common patterns, you can often end up with higher level abstractions. In your example, Thrust is exactly that: https://developer.nvidia.com/Thrust
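For comparison, the same allocate / copy in / operate / copy out sequence looks roughly like this in Thrust (a minimal sketch):

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/copy.h>
#include <thrust/iterator/constant_iterator.h>
#include <iostream>

int main() {
    thrust::host_vector<float> h(1024, 1.0f);

    // Device memory allocated and filled in one call; no explicit
    // cudaMalloc/cudaMemcpy and no explicit library initialization.
    thrust::device_vector<float> d = h;

    // Operate on the device memory: multiply every element by 2.
    thrust::transform(d.begin(), d.end(),
                      thrust::make_constant_iterator(2.0f),
                      d.begin(), thrust::multiplies<float>());

    // Copy back out; device memory is freed when d goes out of scope.
    thrust::copy(d.begin(), d.end(), h.begin());
    std::cout << h[3] << std::endl;   // prints 2
    return 0;
}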
I'll proceed as if that were implemented on top of CUDA; I have no idea if that is actually the case, but let's assume it is for the sake of argument.
That code does not initialize the library; instead the handle is stored as a global variable that is lazily initialized when using Thrust methods. This makes it easier to use but has some drawbacks too. It cannot use more than one card at once. It means every call has to check whether the library is initialized - safer, but slower. Resources are not freed as soon as possible. A good trade-off in the majority of cases.
A device vector is created and assigned a value in one call. Size is automatically tracked. Nice.
Errors are not handled. All code looks cleaner if you don't handle errors.
I copy and paste code all the time, but almost only for tests. I don't think tests should be abstracted too much. I don't appreciate having to work to understand what a test is doing, and when the test gets too long it's a sign you're testing at the wrong level.
Additionally, one of the biggest problems with copying code is that it will break when the two versions get out of sync. Test code inherently protects you against this, by virtue of itself having 100% test coverage.
> place a very high value on what ends up checked-in to a source code repository. The reason for this is very simple: once code gets checked-in, it takes on a life of its own.
I would modify this slightly and say code that's added to important public branches designated as master, qa, integration, etc. should have a very high bar for quality. The other "experimental", "sandbox", etc. branches would be understood by the team to have suboptimal code. Yes, it doesn't prevent people from Ctrl+C+Ctrl+V'ing the bad code, but at least the developer's work-in-progress code has been backed up to a wider repository than just his laptop. (Yes, the programmer should be backing up files to USB flash or the corporate LAN, yada yada, but disciplined backups outside of repository commits often don't happen.)
That said, cargo-culting bad code is definitely a problem. It doesn't matter if the source code has any of these comments:
// The following is experimental. DON'T COPY IT!
//!!!Do not reuse the following loop. It is O(N) instead of O(log N) and I haven't optimized it yet!!!
It doesn't matter how many scary exclamation points and uppercase characters are used in the warning. It will be ignored.
Even if you use language-level guards instead of comments, such as #pragmas or metaprogramming (if this_function_name == "JohnExperiment3"), people will just ignore the guards and Ctrl+C+Ctrl+V the "bad" source code lines they need. The infamous example of checking "if windows os.version begins with '9'" is copied all over the place.
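The pattern in question, paraphrased (illustrative, not from any specific codebase):

#include <string>

// The copied check matches "Windows 95" and "Windows 98"... and would also
// have matched a hypothetical "Windows 9".
bool isLegacyWindows(const std::string &osName) {
    return osName.find("Windows 9") == 0;
}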
> Even if you use language-level guards instead of comments, such as #pragmas or metaprogramming (if this_function_name == "JohnExperiment3"), people will just ignore the guards and Ctrl+C+Ctrl+V the "bad" source code lines they need. The infamous example of checking "if windows os.version begins with '9'" is copied all over the place.
Hmm, could this be the reason for Windows 10 coming after 8?
This is a great article, something I have noticed but failed to place into a good metaphor like this.
The other place this happens is in online code snippets for blogs and the like. If you need a way of doing things, you often search. If you find it, you'll probably copy it. If you find the same thing twice, you'll think everyone does it this way.
So the lesson there is to always be careful about code you put into blog posts and Stack Overflow answers and the like. If you cut a corner, get sloppy with a naming standard, or fail to put in basic checks, you're spreading landmines across the world and there's no way of recalling them, because nobody comes back to review a solution they found online.
"The original engineer knew that they hadn’t properly solved the problem and left it as a landmine for someone else. It just so happened that I stepped on it and now couldn’t remove my foot until it was properly disarmed."
The problem with these warnings is that their authors assume there is a better, economically viable way of doing a particular thing. If it wasn't economically viable for the original author to find and apply the ideal solution, why would I or my project manager spend the time to find the "real" solution (which may or may not have any effect on the end user)?
There's not even a guarantee that the "real" solution would be more maintainable or less complex than the quick fix (for example, what is the "real" solution to fixing a confirmed bug in some Swing base class?). Most of the really dirty fixes I've encountered fall into this category.
This is a good article, I'm sure I'll find myself saying "no bad bunny code" soon.
I've encountered this phenomenon and there are a few situations where it can feel natural or correct to copy and paste code. While I can't say these are great reasons, they do merit being considered.
0. QA concerns. You can't just modify/change logic without triggering a need for QA to re-check. This is, in fact, one of the biggest stumbling blocks for code quality in a company. You just can't refactor as needed; it has to be planned for and paid for. There are times when you're basically stuck using bad logic and just extending it.
1. You're a guest in the code. You're just working on this feature for one iteration, and the other guy who will likely be on the project for the rest of the year wrote the bunny code. I wouldn't change how he does things; I'd instead talk to him and let him know there's a better way. I wouldn't implement a second type of fix.
2. It's unclear what direction the primary authors want the codebase to grow in. For instance, do you use library-level functions to accomplish tasks the language syntax can handle (for loops vs. an iterator function)? Some prefer one over the other. I would just follow the style I see.
3. Styleguide logic. E.g. the code should look as much as possible like it's written by one person. While I know this rule applies to indentation and visual appearance, I think it's fair-ish to apply it to logic as well.
These are a few things to think of, but I agree with the author of the post, be careful, this code will multiply.
> In my current role at Box, I’m famous for repeating the phrase, “no accidental standards.” We don’t accept that things are “the way” just because they pop up in a couple of places. When we see this happening, we stop, discuss it, and either codify it as “the way” or disallow it. We then update code appropriately before it gets too far. Through automation, code reviews, and code workshops[1], we are able to keep an eye on the code and make sure we’re all on the same page.
no.
There's a faulty assumption here: that code in a repo should have a pattern. That there's a Way of writing code that should be followed within a company. That it's acceptable to believe code already checked in is the Way.
no.
Having worked in a team that was very serious about this approach advocated by the OP, what happens is that opinions about the Way change, so the fad from two years ago is no longer the Way, but it still ghosts on through the code base.
You will encounter many ways of writing code in many repositories through your life.... the ideal Way of the code is to be able to write the code needed, how it needs to be written, without bothering too much about the Way Of the Mandate (but don't copy and paste).
It's slightly Tao, I think. The Way that can be described is not the Way...
Eh... sometimes? But, honestly, that's not an assumption that should be made... you need to be fluid enough to handle it all. Even in the Theoretically Standardized codebases, different approaches and styles come and go according to the fad du jour.
Right now I have to deal with 3 codebases on a regular basis, in three-four languages, written partially by people not even here, over the past 5 years. On a bad week, I have to crack open 3-4 more codebases in 2-3 more languages... and don't even get me started on reading open source code.
If I am looking for predictability, I'm going to be disappointed, and if I need it, I'm going to fail. Grok it is what I say.