> There are functional tests that exercise these classes working together. Fixing a bug in one class breaks many functional tests ..
Huh? If fixing a bug breaks a functional test, then either the test was correct and you changed your expected system behavior and the test saved your bacon, or the test was a change detector test not a functional test.
Saying "remove functional years" is just blinding yourself. A unit test will never tell you if the class you are changing has any relationship to the actual bug reported in production.
It seems like OP has never worked on a large system, or doesn't know about fakes (not mentioned in the article) to run end-to-end functional tests quickly in memory.
Yep, I've been a professional programmer for about twelve years now working in all kinds of large scale systems at a certain internet company. I can say definitively that I've been saved by functional and end to end tests far more often than I have been saved by unit tests. Not that you shouldn't do both, but err on the side of writing tests that will find bugs in the entire system.
> err on the side of writing tests that will find bugs in the entire system
While I agree these tests are going to be closer to your real production system, they also usually take longer to write, and take way longer to run. I can run thousands of unit tests and still have them be an order of magnitude quicker than a full e2e test which requires spinning up a k8s cluster, for example.
I agree that both have their place, but I'd err on the side of writing tests that will run quicker where possible so that my build pipeline doesn't take forever (and eventually cost a fortune in compute power).
So, I'm in the privileged position of having an extremely large cluster of machines dedicated to running my team's tests. Even with that, e2e tests do take longer to run than unit tests. All our product's thousands of presubmit tests (millions of test cases plus random testing) might take fifteen to twenty minutes to run. The post-submits take even longer.
On the other hand, fixing a bug that makes it to production and is not caught by tests takes longer still, and additionally hurts user trust. On balance, I can't recommend forgoing thorough end to end testing just to decrease test run time. Spend more money on resources to run your tests if you must, but don't skimp on end to end testing. There is no substitute.
I agree with you overall, in an ideal world we all have near unlimited compute power and even these "longer" e2e tests run in a few minutes, not hours. Unfortunately it's also not terribly uncommon to see these e2e suites take many hours, or sometimes even days to run.
I think the happy medium might be using a dependency-aware testing pipeline that only runs tests for areas affected by the current commit's changes. I've seen Bazel used in this kind of setup with some success, for example.
Ideally no bugs make it to production, but that's almost never the reality. Obviously it'll depend on your software's use-case and user demographics, but you could erode user-trust just as bad by having a trivial user-facing bug sit visible on your app/site for a few days or weeks because your CI/CD pipeline takes so long to run that it prevents you from pushing out timely releases.
Yeah, another aspect here is optimizing tests properly. Of course, this also takes engineering time, but if you get your performance culture in the right place, hopefully test run costs can be commensurate to the amount of value you're delivering. That will allow you to never have a situation where your tests take inordinate amounts of time because you cannot afford resources to run them.
Everything is tradeoffs, of course. You have to decide what's right for your business. But I think it's rare that there is too much or too thorough of testing. I'm guessing most orgs err in the other direction.
What’s the point of thousands of unit tests running quickly if they don’t give you any confidence that the system continues to meet acceptance criteria?
If my unit tests are testing all of my public interfaces with "Given input X, expect output Y" and they're all passing, that gives me pretty good confidence that the system is in good shape. Most of the time when you do uncover and fix a bug, there is a unit test you can write that would have detected it, and that test then runs on every future release to prevent regressions.
I'm not advocating for exclusively unit tests, e2e tests are absolutely still necessary to make sure everything plays nicely when you put it all together. I was simply disagreeing with the parent comment that you should lean more on e2e tests than unit tests. I prefer a ratio of unit-integration-e2e tests which is closer to the "Practical test pyramid" [0]
> If my unit tests are testing all of my public interfaces with "Given input X, expect output Y" and they're all passing, that gives me pretty good confidence that the system is in good shape.
Unfortunately, in my experience, most production bugs look more like this:
Module A: returns x + 1

Implementation:

    int A(int x) {
        if (x == 11) crash();
        return x + 1;
    }

Unit test:

    assert(A(1) == 2);
    assert(A(5) == 6);

Production:

    A(11);
The problem compounds as the parameter space of your system grows. Unless your unit tests exercise the full cross product of parameter values and results for each module, there is a likelihood of missing bugs like this.
This is exactly my point, though: you wouldn't catch this in an e2e test either unless you try it with an "11". And if you're explicitly trying the "11" in an e2e test, why not just do it in a unit test instead? Once you've hit this bug, you can add an "assert(A(11) == 12);" and move on with confidence. If you want to test this specific scenario in an extra e2e test, you could potentially be adding another 2+ minutes to every CI/CD run that ever happens on the project.
Any good unit tests should at a minimum test the min/max/expected cases, as well as any known special cases. If there are unknown special cases, you're probably not any more likely to find them in e2e tests than you are in unit tests.
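In Python terms, a minimal sketch of that minimum bar, with a hypothetical increment() standing in for module A (stdlib unittest only):

    import unittest

    def increment(x):
        """Hypothetical stand-in for module A: returns x + 1."""
        return x + 1

    class IncrementTest(unittest.TestCase):
        def test_typical_values(self):
            self.assertEqual(increment(1), 2)
            self.assertEqual(increment(5), 6)

        def test_boundary_values(self):
            # min/max of the expected input range
            self.assertEqual(increment(0), 1)
            self.assertEqual(increment(2**31 - 2), 2**31 - 1)

        def test_known_special_case(self):
            # Regression test added after the production crash on 11
            self.assertEqual(increment(11), 12)

    if __name__ == "__main__":
        unittest.main()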
To be clear, I was trying to indicate that there would be an interaction between two modules that is not tested via mocks, but is common in production. That is the most common cause of bugs, and also the type that e2e tests tend to catch quite well.
My problem with this is that now every single production release is either pushed back by at least a day (multiple days if issues are found), OR there's code running in production that hasn't run through e2e tests yet.
Plus it adds a layer of complexity to your version control. I'm assuming you're running these e2e tests against a branch that's up to date with the latest master. What happens if I have 4 PRs I need to merge - they all might pass tests when run against the current master but as soon as I merge one, the other 3 haven't technically been tested with master anymore, so I'll have to re-run e2e, which pushes everything back by another day (per PR)?
> is either pushed back by at least a day (multiple days if issues are found),
Which is better than issues making it to production -- presumably what would have happened if you didn't have the tests.
> OR there's code running in production that hasn't run through e2e tests yet.
This situation should never occur except if there's a production emergency causing immediate loss of revenue. Even then, the proper answer is usually to roll back to a known-good version.
> I'm assuming you're running these e2e tests against a branch that's up to date with the latest master. What happens if I have 4 PRs I need to merge - they all might pass tests when run against the current master but as soon as I merge one, the other 3 haven't technically been tested with master anymore, so I'll have to re-run e2e, which pushes everything back by another day (per PR)?
The way releases should work is that periodically you cut a release in a new release branch. After that, further changes to master don't affect the release. All the e2e tests need to run against the release branch before it can be deployed. It should be deployed immediately to your test realm and given time to soak. If bugs are found, minimal fixes should be made against master and then cherry-picked into the release branch, followed by a rerun of tests and redeployment to the test realm. You should aim for a zero-cherry-pick situation. Do this by enabling features and changes via configuration rather than via binary changes.
Once the release branch reaches the appropriate level of stability, it should be gradually rolled out, zone by zone, to production. Zones should have some soak time before further zones are touched. The response to production anomalies should be a roll back to the previous version.
This applies to big, complex services that are mission-critical. For small, less critical services, a daily/push-on-green type of approach might be fine. But for such services testing should be much easier and it shouldn't be a huge imposition to run all tests before each release.
My problem with the "release-branch" approach is mostly that your prod releases end up being big-bang multi-feature releases. Any bug in one feature can delay the entire release, and more changes usually mean more room for things to go wrong. It's also harder to debug issues when they do happen, and they will happen eventually, regardless of how thorough your test cycle is. Your developers may also have lost context if the time between writing the code and deploying it is so long (e.g. if this process forces you into monthly or quarterly releases).
> Once the release branch reaches the appropriate level of stability, it should be gradually rolled out, zone by zone, to production. Zones should have some soak time before further zones are touched. The response to production anomalies should be a roll back to the previous version.
I 100% agree with you on this. Canary releases, followed by gradual rollouts with the option to roll back are ideal.
> Do this by enabling features and changes via configuration rather than via binary changes.
If your CI/CD pipeline is fast enough that you can deploy on each commit to master (multiple times daily), then you can roll forward in the rare situations that rolling back isn't an option. Feature flag frameworks certainly have some useful features, but they add a layer of complexity to your deployments which feels like a band-aid to help get around slow deployments. Totally valid for eg an iOS app that takes 2 weeks for Apple to review, but if you're deploying to the web where you can push new releases whenever you want, I don't think the added complexity is worth it.
I think your thorough, managed release approach is absolutely appropriate when the consequences of failure are life-threatening, heavy revenue losses, or brand damage. But for most of the software most of us work on, the worst-case scenario is closer to "1% of users could potentially see an error message during a 5-60s window before we detect the increased error rate and roll back the canary release."
Came here to say exactly that. The author is advocating for mockist style of unit testing, which is quite terrible for many, many reasons. What you are describing is classic unit testing, and this is the way to go.
> The author is advocating for mockist style of unit testing
IMO: "mockist" style of unit tests are best for corner cases that can only be tested in isolation. For example, error conditions that are important to verify, but so difficult or impossible to test that the only way to test them is to test a single class in isolation.
My initial reaction was to downvote you for your last sentence - I think OP has obviously worked on medium-large systems, and is just strongly opinionated.
However, I agree with your first sentence.
> If fixing a bug breaks a functional test, then either the test was correct and you changed your expected system behavior and the test saved your bacon, or the test was a change detector test not a functional test.
And I think this is _hugely_ valuable for medium-to-large systems, which typically have many, many layers of abstraction.
I used to work on one such system that had a lot of what this author would refer to as functional tests. Sometimes, changing a couple of lines in one of the core modules would cause 50 tests to fail. While this is just a "change detector", it's still really valuable: of those 50 behavior changes, say about 20 are unexpected but acceptable, 5 are actually intended, and 25 are actually regressions in ideal behavior. That means the layer at which you made the change is just wrong, or you didn't put sufficient nuance into your change.
It's a big time-sink for sure and made many new developers angry, but honestly the gain was that the overall product had fewer regressions.
I'm somewhat unsure what I would replace these "functional tests" with.
>> There are functional tests that exercise these classes working together. Fixing a bug in one class breaks many functional tests ..
> Huh? If fixing a bug breaks a functional test, then either the test was correct and you changed your expected system behavior and the test saved your bacon, or the test was a change detector test not a functional test.
I think it's more a symptom of fragile mocking, or just making a unit too small.
IMO, when I was in that situation it was because the mocking framework was too fragile. In hindsight, it may have been useful to just have a library of hand-written mock classes instead of relying on a framework to do it for us.
> or the test was a change detector test not a functional test.
Or the test was flat out broken and was testing implementation detail minutiae that don't really matter. As in, in a strange limbo between unit and functional tests. I've seen this many times.
> It seems like OP has never worked on a large system
That's an extreme logic leap + ad hominem in one. You got that from just one article?
If you define a functional test as perfect, that it will never break unless the actual functionality changes, then sure, functional tests are purely good. But that's not what most people end up writing when they try to write functional tests. They don't get them right, they write code-change tests. I've seen some large systems crippled by code-change tests disguised as other types -- functional/unit tests that rely on implementation details that aren't related to the actual behavior being tested, and that make it harder to refactor code, not easier.
My criticism with the author isn't that I think they're wrong about the dangers of functional tests when poorly written, it's that the author seems to be over-eager to recommend unit tests as the solution. Unit tests can be just as problematic if they're poorly written; in fact they can be even more problematic if they're relying on a lot of fragile mocks.
One of the biggest issues I see in large codebases maintained by large teams is people building single-purpose mocks for each test that rely on implementation details. This is especially common when team members are mocking components that they haven't built or that don't have clearly defined public interfaces. The end result is that people end up building tests that are too 'specific' -- they rely on behaviors that are subject to change that we never intended to test in the first place (like translation strings, styling, results from 3rd-party data sources, etc). This slows down development, but more importantly, it actually masks bugs because after a while people start assuming that all code changes will cause tests to fail.
You can say that those aren't real functional/unit tests if you want, you can say that they're poorly written so they don't count. It doesn't matter -- that they're poorly written is the point. People write bad tests when they aren't trained how to write good tests, and an organizational policy that only focuses on increasing test coverage without focusing on the quality of tests being written is bound to fail in the long term.
Some orgs spend a lot of energy getting people to write tests without training them in how to write good tests, or without talking conceptually about what a unit/functional test is supposed to accomplish. We don't have good education in this area, especially on platforms like the web.
In the worst case, I've even seen managers argue that testing static constant files and translation strings was good policy because, "forcing devs to update a constant in two or three files instead of one will mean that they think harder about the change." Needless to say, that policy did not end up reducing bugs.
> If you define a functional test as perfect, that it will never break unless the actual functionality changes, then sure, functional tests are purely good. But that's not what most people end up writing when they try to write functional tests. They don't get them right, they write code-change tests.
What is your experience with fixing this issue as it arises? Like you see the test breaking and say "Oh the output changed but it still satisfies <important property>, let's change the test to be more general!". Or more drastically "this test keeps having false alarms, let's just remove it".
Unit testing, in the way the author describes, is a great way to test your mocks; however, it makes code harder to refactor and often adds little value. Functional tests, done right, tend to be much more useful because they can match business functionality: if I ask for the weather in New York, I get back a temperature. That code will call many classes and will not be a unit test, but it is more useful than mocking the getTemperature method.
From my experience, the downside of functional testing is that it's slower and, most importantly, when a functional test fails it's not obvious what is broken.
The counterpoint I've heard is, "But we run the tests on every commit! If they fail it must be the latest change."
That's a good point and obviously running the tests on commit is a good practice but I still think unit tests have value because:
1) Even if I know the lines of code that caused a test failure I'm still not sure what invariant I've violated. A unit test failing called "EnsureOutputIsEven" immediately tells you what's wrong in a way a functional test "ValidatePdfEmbeddedVideoGeneration" doesn't.
2) Unit tests help isolate problems from miscompilation, platform bugs, and undefined behavior in a way that a functional test can't. A unit test tells me, "There's something wrong with this specific function," and I can look at the source for undefined behavior and at the generated code for problems.
A functional test just says, "Something is wrong somewhere! good luck!"
> "But we run the tests on every commit! If they fail it must be the latest change."
I really want to share a great failure here I've seen:
This was in a Django web app, nearly a decade ago, on the first team I was on professionally, and it took around a week to figure out (no one else cared because it was so rare, so I was the one looking for it): memcached would occasionally return corrupted data that Python couldn't un-pickle into an object, because the class attributes didn't match.
It turned out to be because of how our Django settings were organized: a large collection of shared settings that no one ever looked in because they were set up once and then ignored, a set of overrides for live, a set for beta, and each dev got their own. The trick was that, to make it easy to set up, the memcached host/port were in the shared, live, and beta files - but not overridden in dev, so all the devs and the CI suite shared the same cache.
This meant that when one dev was changing the structure of one of these cached objects, all tests relating to it could fail for all devs and in CI before the change was even committed.
Ok so you're saying once in a blue moon when a test fails, having small tests will save you an hour. And that's worth the extra time you spend every day writing and rewriting them?
> the downside of functional testing is that it's slower
This is certainly true in some situations, but not all, and I think it's important to make decisions based on the tech stack and tools available. For example, I've found that React + Enzyme + jsdom is a way to write frontend functional tests that are both fast and reliable, so in a codebase like that, I strongly prefer functional tests because they provide much more confidence. I've also found this to be true for software running against low-complexity data stores that are easy to fake, and for things like compilers that are running fully in-memory anyway.
> when a functional test fails it not obvious what is broken
FWIW, I've found that with a good debugger, it's usually pretty quick to start with a functional test failure and trace the issue to the level of specificity that I'd have if it was a unit test failure. But like the other point, I'm sure it depends a lot on context.
> For example, I've found that React + Enzyme + jsdom is a way to write frontend functional tests that are both fast and reliable
I found that tests using Enzyme/testing-library are actually more than 4x slower than unit tests, and give you far worse debugging & increased flakiness.
You can still assert one small thing even with larger tests. What's the relationship between a single field of input and a single field of output through multiple objects?
That's the invariant you test. It's also the relationship the business is actually interested in.
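A small sketch of what I mean (Order/PricingService/InvoiceBuilder are invented): several real objects collaborate, and the test asserts exactly one business-level relationship.

    import unittest

    # Hypothetical collaborators -- the point is that the test wires up the real
    # objects and asserts only the one input->output relationship that matters.
    class Order:
        def __init__(self, quantity, unit_price):
            self.quantity = quantity
            self.unit_price = unit_price

    class PricingService:
        def subtotal(self, order):
            return order.quantity * order.unit_price

    class InvoiceBuilder:
        def __init__(self, pricing):
            self.pricing = pricing

        def build(self, order):
            # 20% tax applied to the subtotal
            return {"total": round(self.pricing.subtotal(order) * 1.2, 2)}

    class InvoiceTotalInvariantTest(unittest.TestCase):
        def test_quantity_drives_total(self):
            invoice = InvoiceBuilder(PricingService()).build(Order(quantity=3, unit_price=10.0))
            # One small assertion through several objects: the business-level invariant.
            self.assertEqual(invoice["total"], 36.0)

    if __name__ == "__main__":
        unittest.main()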
I totally agree. The only thing that's worth testing is the API that's being exported and that can usually be transcribed almost directly from some specification document. Everything else is just implementation details.
I haven't experienced that problem. I've experienced the problem everyone else seems to have, too: Too many unit tests makes things brittle over time. One small change breaks a bunch of stuff in some mechanical way. Perhaps a required parameter got added to a function, or perhaps a class got too big and needed to be split up. Either way, you sometimes end up with a slew of tests that don't compile.
Testing at a higher level should help, because the tests are more coupled to the behavior and less coupled to the implementation details. To the extent that you've achieved that, tests breaking due to a change shouldn't be spurious, because the breakage is due to a legitimate change in behavior that needs to be accounted for in the tests' expectations.
Wild guess: The difference in experience here can be explained by the first three words of the article. A contractor who isn't sticking around on any one project for long isn't going to see consequences of their design decisions when they take many years to incubate. And a contractor who's seen this at "no less than six different companies" probably hasn't been a long-timer at many of them. Unit testing has only been de rigueur for 20 years, tops, so, even assuming those were the only six companies, I think it's safe to assume a maximum of 3 years per company. Subtract a year or two from the incubation time because it'll take a while to comprehensively revamp a test suite. A lot of Shakespeare's plays seem to end on a happier note, too, if you always leave the theatre right after the third or fourth act.
The best ROI I get when writing tests is from thorough unit testing of the most fundamental, most heavily used low-level components. That's because components like "StreamReader" or "Lock" or "Response" tend not to change much over time.
In contrast, higher-level tests often break because the high-level components they test change frequently in response to business needs.
I don't get this meme that unit tests are "brittle". Are the interfaces to your low-level components changing all the time? Perhaps they are not written with "loose coupling" in mind and are exposing needlessly fragile and complex APIs?
The earliest proponents I know of - the ones writing XP books back in the '90s - had a very flexible definition of "unit" and assumed that you would vary the scale according to your needs. So some units might be low level components like data structures, but others might be large modules that included a lot of moving parts, up to and including actual databases.
The world has moved on now, and "agile" no longer means "use your own best judgment." It's become a carefully catalogued list of rules and prescriptions and definitions and dictums.
Which, come to think of it, is an excellent example of memetic natural selection. "Use your own best judgment" is a meme that is uniquely poorly adapted to survival in an ecosystem that consists largely of people jabbering back and forth on the Internet like we are right now. You can't base a rollicking good fun argument or a strongly worded blog post on something like that.
> The earliest proponents I know of - the ones writing XP books back in the '90s - had a very flexible definition of "unit" and assumed that you would vary the scale according to your needs. So some units might be low level components like data structures, but others might be large modules that included a lot of moving parts, up to and including actual databases.
It's actually a fairly strict definition, it's just not from the angle that people like to go with (as shown even by your description here): It's a semantic/conceptual unit, not a syntactic unit.
One of my co-workers eventually got it when I told him to completely forget about all the code he's written, then pretend there was a library he could include that did exactly what he wanted. He came up with a single function with a simple set of arguments, that would be used twice, and I stopped him there with something along the lines of: "Okay, that's your API. When you write the code that actually does this, put this function in one separate file and only import this one function in the code and the tests. Now it doesn't matter how you break out the internals into other functions or classes - it's a conceptual unit, not necessarily a single function."
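A rough Python rendition of that idea (module and function names invented): one exported function is the conceptual unit, and the test imports nothing else, so the internals can be reshuffled freely.

    # rates.py -- the conceptual unit: one public function, internals are private.
    def effective_rate(principal, payments):
        """The API designed up front; callers and tests import only this."""
        return _total(payments) / principal

    def _total(payments):
        # Internal helper; can be split, renamed, or inlined without breaking tests.
        return sum(payments)

    # test_rates.py -- imports only the public function, never the internals.
    import unittest
    from rates import effective_rate

    class EffectiveRateTest(unittest.TestCase):
        def test_simple_case(self):
            self.assertAlmostEqual(effective_rate(100.0, [55.0, 55.0]), 1.1)

    if __name__ == "__main__":
        unittest.main()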
> The world has moved on now and most programmers are duct taping parts together so unit testing no longer has a place in this world.
Even on business-domain level components there are often simple routines which can and should be tested. Larger routines should be built up from smaller ones. Divide-and-conquer is a fundamental engineering principle.
If there are no such small-enough-to-be-tested routines, then the code is not being written with testing in mind. If all your functions are hundreds of lines long, then yeah it's hard to write unit tests!
I agree. Your unit tests should focus on testing individual pieces - the fundamental building blocks (classes and functions) in your system. Changes should be isolated, so that when a class's API changes, only that class's unit tests need to change.
I find the key thing here to be separation of concerns. Aim for testable classes and functions, and that will also make you think about the dependencies.
For example, I have a class implementing the NNTP protocol. Initially one might think that this class needs to deal with IO, or even worse with socket-based data transfer. But if you think about it, it doesn't. The API can be based on the idea of taking a buffer full of data in and producing a buffer full of data out. A class coupled with IO/socket functionality is hard to test; without that coupling it's trivial to unit test. (The reduced-functionality class only maintains the protocol state and doesn't concern itself with where the data comes from or goes to.)
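Something in that spirit, as a toy sketch (this is not the actual NNTP class, just the buffer-in/buffer-out shape): no sockets anywhere, so the test is trivial.

    import unittest

    class LineProtocol:
        """Toy buffer-in/buffer-out protocol: no sockets, no files, just state."""
        def __init__(self):
            self._buffer = b""

        def feed(self, data):
            """Accept raw bytes, return the complete CRLF-terminated lines so far."""
            self._buffer += data
            parts = self._buffer.split(b"\r\n")
            self._buffer = parts.pop()  # last piece is an incomplete line (maybe empty)
            return parts

    class LineProtocolTest(unittest.TestCase):
        def test_reassembles_lines_across_reads(self):
            proto = LineProtocol()
            self.assertEqual(proto.feed(b"200 rea"), [])
            self.assertEqual(proto.feed(b"dy\r\nLIST\r\n"), [b"200 ready", b"LIST"])

    if __name__ == "__main__":
        unittest.main()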
It's frequently the domain model objects themselves changing that causes all the spurious breakage.
This is not nearly as much a problem in dynamic languages, especially if you tend to do data-level programming instead of defining a lot of custom types for your domain modeling. And I suspect that this is exactly why classical TDD was first popularized among Java developers, while the London school first gained traction in the Ruby and Python communities.
"The smallest amount of testable code. Often a single method/function,"
I hate (HAAAAATE) that definition of unit test. That's how some IDEs implemented unit tests, but it's not what was originally meant with the term 'unit'
The unit tests the author means have no value outside of testing while you are writing the code. For real value, unit tests shouldn't be bound to the implementation of your code.
Unit tests should test a "unit of functionality", not a technical part of the code.
Actually, the origin of the term "unit test" is that the test can be executed as a unit: it has no dependencies on other tests. It had nothing to do with testing code in isolation.
It might start out as a function, then become a factory, then an object and if you bind your test to the implementation details of this code, you will have to rewrite your test every time you refactor your code. Not ideal.
Most discussion and arbitration about testing comes down to the anemic way programmers talk about interfaces.
We say all the right stuff, but then skimp on the details. "Program to the interface, not the implementation!" Oh, but, what then is the interface?
An interface should be understood as the minimal set of things one may expect to be true about some collection of related abstract types.
- The methods for creating/introducing and destroying/eliminating values at these abstract types
- The expectations around identity (memory model as relevant)
- Universal properties that must hold (for *all* situations like this, the following will hold)
Test specification is a component of naming your interfaces. In that sense, it's wild that test specifications don't live in the same place as interfaces most of the time.
Note that test specifications are not necessarily executable. I can say "for all integers n and m, n x m = m x n" as a test specification, knowing that this is probably not feasible to actually test exhaustively. A test implementation may be partial, limited.
The point of this is that it translates the question of "what test should I write?" into "how do I design interfaces, the expectations I allow of users?". The latter is a dramatically better thing to be spending your time thinking about. In my opinion, it subsumes something like 80% of the discussions people have around testing best practices.
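One way to make such a specification partially executable, sketched with the stdlib only (a property-based library like Hypothesis would do this more systematically):

    import random
    import unittest

    class MultiplicationCommutesTest(unittest.TestCase):
        def test_commutativity_on_a_sample(self):
            # The spec is "for all n, m: n * m == m * n"; the test is a partial,
            # sampled check of that universal property.
            rng = random.Random(42)  # fixed seed keeps the test deterministic
            for _ in range(1000):
                n, m = rng.randint(-10**9, 10**9), rng.randint(-10**9, 10**9)
                self.assertEqual(n * m, m * n)

    if __name__ == "__main__":
        unittest.main()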
This originally meant relying upon the public declaration of functions and methods rather than implementation details. Over time this has been warped to mean the "interface" keyword in OO languages.
I prefer to code to the non-keyword-interface. Turning everything into a keyword-interface has zero value add and causes a proliferation of shitty keyword-interfaces that are not actually points of extension and polymorphism, which is what keyword-interfaces should be. So I would actually say they add negative value to a codebase.
Agree and disagree. As often is the case, once you make an abstract concept concrete in the codebase it gets distorted and exploited. "Keyword interfaces" as you helpfully define here are a prime example. They are not 1-to-1 with genuine "interfaces" as one would like to discuss.
More than just the declaration of functions and methods, though, is the way that those functions are known to interact! Consequentially, an interface may infect another interface, at least abstractly.
In contrast to what I just said above, I think OCaml has a fantastic "keyword" implementation of interfaces. The whole language is built around it. It's not perfect (there's no way to specify the "laws" between interface elements) but it treats composition of interfaces as a very first class idea. I find myself recommending it all the time.
I'm dealing with this a lot lately. Namely having to work within a spaghetti of procedural code...told to use bizarre global scope variables disguised as static classes (Java).
Wading through it all and making proper tests is so much more difficult than it needs to be. The messiness of even determining what's going on seems to reflect the inability to think about interfaces and construct objects accordingly.
This is simple stuff. Java's got tons of standards, and none of them are used.
The best testing strategy is situational, contextual. Categorical statements are rarely universal.
Is the abstraction stack deep? Integration tests cover a lot more, while unit tests won't give you good confidence that the composition works. Unit tests are useful close to the control flow leaves, and where data flow is bottle-necked. Unit tests are often busywork in the middle of the stack where code up, down, left and right needs to be mocked or stubbed, and your test is just a restatement of the implementation logic written inside out. Hopefully there's enough of a modular architecture that whole submodules can be replaced with a simple API for higher level testing.
Is the abstraction stack shallow and the data model simple? The difference between an integration test and a unit test is small, and if you replace the expensive bits (database, RPC generally) with mocks, fakes, stubs, then they're almost the same.
Is the data model complex? Unit test setup is going to be expensive for every combination, while integration tests can leverage the code itself to generate the data model. Unit testing will likely only check one dimension of data complexity, but complex data models have multiple dimensions and for good coverage, you need cross-product tests which are expensive to run even as unit tests because of combinatorial explosion. I don't think there's a cheap simple answer. You need a mix across the spectrum and try and not run tests when you don't need to (smarter CI).
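To make "cross-product tests" concrete, a sketch (the dimensions here are invented); the case count is the product of the option counts, which is exactly what explodes:

    import itertools
    import unittest

    # Hypothetical configuration dimensions; 3 * 2 * 2 = 12 cases already.
    REGIONS = ["us", "eu", "apac"]
    TIERS = ["free", "paid"]
    CURRENCIES = ["USD", "EUR"]

    def quote(region, tier, currency):
        """Stand-in for the code under test."""
        return {"region": region, "tier": tier, "currency": currency, "ok": True}

    class QuoteCrossProductTest(unittest.TestCase):
        def test_every_combination_produces_a_quote(self):
            for region, tier, currency in itertools.product(REGIONS, TIERS, CURRENCIES):
                with self.subTest(region=region, tier=tier, currency=currency):
                    self.assertTrue(quote(region, tier, currency)["ok"])

    if __name__ == "__main__":
        unittest.main()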
Is the design changing a lot? Unit tests will be very brittle and you'll spend a lot of time rewriting them, as abstraction layers merge and split both horizontally and vertically. Integration tests are much more valuable. Having a lot of unit tests can inhibit refactoring, and refactoring debt will ironically create more tests to guard against regression from the bugs that come from complexity.
Do you need to know that customers are going to be able to perform their basic routines right after a release? Unit tests won't give you this confidence; you need the highest level integration test driven from the UI, a smoke test.
I don't think telling people to concentrate on one thing vs another is useful without context.
I disagree - I always test a real database, never mock it. This is because in my embedded domain we often ship with an empty database, so it is no big deal to add just the tables I need with whatever data I want, and then test it. Sqlite makes this easy for us.
Likewise, filesystems on modern operating systems are fast (there is a cache): I cannot measure the time I save by mocking the filesystem calls out, so it is easier and safer to just work with files I create in the tests. It takes some effort to ensure my tests put files in places where they won't collide with other tests, but this isn't hard once you have the scheme in place. Since our scheme was put into the code early on, most people don't even consider bypassing it for some hard-coded location, so it is easy to point it at test files in a different location.
Note that the above is specific to my domain. I understand that some of you deal with one database so there is no abstraction that lets you substitute a test database you create on the fly, something not worth fixing since you still cannot afford a license for every developer and CI machine. I'm sure there is a similar argument against using a file system.
The point: before saying "don't use something expensive in tests", do a benchmark and prove it actually is expensive. You may be lucky like me and discover that database and filesystem access in tests are not expensive!
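For what it's worth, the "just use the real thing" approach can be as small as this sketch with Python's built-in sqlite3 (the table is made up); benchmark it before assuming it's too slow:

    import sqlite3
    import unittest

    class RealDatabaseTest(unittest.TestCase):
        def setUp(self):
            # A brand-new, empty database per test: cheap enough that mocking
            # it away buys nothing measurable.
            self.db = sqlite3.connect(":memory:")
            self.db.execute("CREATE TABLE readings (sensor TEXT, value REAL)")

        def tearDown(self):
            self.db.close()

        def test_insert_and_query_round_trip(self):
            self.db.execute("INSERT INTO readings VALUES (?, ?)", ("temp", 21.5))
            row = self.db.execute(
                "SELECT value FROM readings WHERE sensor = ?", ("temp",)
            ).fetchone()
            self.assertEqual(row[0], 21.5)

    if __name__ == "__main__":
        unittest.main()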
The reason to mock out IO is not so much performance as isolation. It's annoying to have to deal with setting up database instances, users, permissions and whatnot just to run an otherwise standalone test. It's also annoying when two test suites running in parallel conflict (though that can be avoided). If you use sqlite, it's much easier, since it's all just working on a standalone file.
> I disagree - I always test a real database, never mock it.
I always put an interface between the IO part and the rest of the code, so I can either mock it or not. Ideally I would have separate tests making sure that the real-IO implementation of the interface works, and not use it in my normal tests that exercise logic. But that is indeed a lot of additional work, so I cut corners and let the tests run against a real db implementation, only wrapping it in things like fault injection.
(In general I don't like HN's concern with tagging posts from previous years, but in this case I think it is warranted. It also raises the question of why so many people are upvoting an old article on unit testing.)
This must be a very old article; in the last 5-8 years the best ideas in unit testing have changed.
Highly mockist code bases have ended up with their own problems when it comes to change and refactoring.
Not reading/writing files, DBs, etc. is fine. Mocking out every object causes a lot of problems. Refactoring a single method signature? Now the tests for the object, and for each of its consumers (where it is mocked), break.
Specify a public API (testing is a good opportunity to design it), and test through that. Making everything public and testing every method? You're going to screw up your ability to change.
At least in the context of typical CRUD APIs, a set of functional tests that run against an in-memory database like H2 is far more realistic and practical at catching issues while still being very fast. My typical approach for such scenarios is to define, for every test: a) the base data, b) the inputs to the service wrapper method that the REST handler invokes (e.g. the body DTO in the case of a POST request), and c) the post-change expected state - all as JSON test assets. Then I can load the base data at the start of each test, call the method with the payload DTO, extract the final database state and compare it with the expected result state (changes are rolled back at the end of each test so they are isolated). I don't need to care about what the nested services/methods are doing as long as the expected state matches.
This approach has helped me catch issues time and again and I would pick this over unit tests for typical CRUD APIs. One scenario where I would prefer unit tests is computation focused methods that would benefit from thorough testing using a wide variety of input values.
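Roughly the shape of that setup, sketched in Python with sqlite standing in for H2; the table, the service method, and the JSON assets are all invented for illustration:

    import json
    import sqlite3
    import unittest

    def load_rows(db, rows):          # a) base data from a JSON asset
        db.executemany("INSERT INTO users(id, email) VALUES (:id, :email)", rows)

    def dump_rows(db):                # c) final state, extracted for comparison
        return [{"id": i, "email": e}
                for i, e in db.execute("SELECT id, email FROM users ORDER BY id")]

    def update_email(db, payload):    # b) stand-in for the service wrapper method
        db.execute("UPDATE users SET email = :email WHERE id = :id", payload)

    class UpdateEmailTest(unittest.TestCase):
        def test_state_after_update_matches_expected_asset(self):
            db = sqlite3.connect(":memory:")
            db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
            load_rows(db, json.loads('[{"id": 1, "email": "old@example.com"}]'))
            update_email(db, json.loads('{"id": 1, "email": "new@example.com"}'))
            expected = json.loads('[{"id": 1, "email": "new@example.com"}]')
            self.assertEqual(dump_rows(db), expected)

    if __name__ == "__main__":
        unittest.main()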
I think the types of testing that are useful are critically dependent on the language we're talking about.
If you're working with a dynamic language like Ruby, Python, Javascript, then it's more valuable to have micro-tests on every little thing, because there's no compile step to find typos or type conversion errors.
If you're working with something more static, like C#, Java, Go, Rust, etc, then it usually doesn't make sense to write tests like that. Better to stick to higher-level tests that ensure your classes are working together correctly. Except for the case of particularly tricky algorithms of course.
I also say, don't be scared of hundreds of tests breaking. Very likely there is only one actual bug. Pick one failing test, run that one alone, fix whatever is broken, and then run the suite again. Usually you'll have fixed most of them.
They're also dependent on what it is that you're building.
If you're writing an application, the value of unit tests is limited because the APIs internal to the application aren't being consumed by another developer but instead a user who's going to click/tap through an interface. So, in that case, it makes more sense to test the behavior expected to result from user interaction rather than test the return values of a given function, for instance, because it doesn't matter if a function does the right thing when the app is rendering a blank screen. You can test way more functionality at once by testing at the integration level or by writing "application" tests. Unit tests are best suited for two situations; either you've got a function where it's simply faster/easier to make sure it does the right thing by using a unit test, or you're writing an API that other developers will be consuming in their own applications.
> If you're working with something more static, like C#, Java, Go, Rust, etc, then it usually doesn't make sense to write tests like that.
I wouldn't trust myself to write code without unit testing it, nor would I trust you, nor would I trust anybody else. It is all too easy to make mistakes like off-by-one errors or other edge case misbehavior.
I'm not saying to not write any micro-tests at all, just that most of them are not needed - use your judgement of course. Don't bother writing tests for things that your type system and compiler already protect you from.
I do find that off-by-one errors are much harder to make, in both static and dynamic languages, when you use the collection looping and manipulation constructs that almost every language has now, instead of C-style triple-statement for loops.
I'm not the person you're replying to, but I don't think that's what they're saying.
When I write Python, I'm paranoid that someone will pass a string instead of an int to my function. Or that when I for-loop over my tuple of strings, I'll accidentally iterate over the characters of a single bare string instead of the many strings in an n-tuple (with n>1). So I write unit tests to cover these specific cases.
When I'm writing Java or Go, I write all the other tests, but not those ones. Because the type checker would catch those problems for me.
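For instance, tests of this flavor (the function is invented) are ones I only bother with in Python, because a compiler would have made them redundant:

    import unittest

    def join_names(names):
        """Hypothetical: expects a tuple of strings, not a bare string."""
        if isinstance(names, str):
            raise TypeError("expected a tuple of strings, got a single string")
        return ", ".join(names)

    class JoinNamesTypeTest(unittest.TestCase):
        def test_one_element_tuple_is_not_iterated_as_characters(self):
            self.assertEqual(join_names(("Alice",)), "Alice")

        def test_bare_string_is_rejected(self):
            # In Go or Java the compiler would make this test pointless.
            with self.assertRaises(TypeError):
                join_names("Alice")

    if __name__ == "__main__":
        unittest.main()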
I've seen the opinion "static typing obviates the need for unit tests" expressed pretty frequently, and I can only gape in utter bafflement. I like static typing, but it's not magic pixie dust. Start with the fact that unit testing was widely popularized with Java.
You have literally no idea whether an untested line in a dynamic language could ever work. A line in Java that compiles but doesn't have test coverage still probably does what it says, and how badly you need that test coverage depends on the complexity of what it says.
I partially agree with the author: I think unit tests are great for ensuring functions and methods work correctly, and they have saved me a few times! However, here is my perspective from a hardware point of view:
I think teams can be too quick to "mock" away real hardware from functional tests, and this can lead to hard-to-track communication bugs. For example, on a project I worked on there was a Bluetooth device and a phone app. The team created a simulated device inside the app which mocked all communication. Things worked OK... until they didn't! What they should have done from the beginning is create the mocks on a "real" device and keep the Bluetooth communication in integration and functional tests. Bluetooth is a notorious pain in the neck, but this is true for any external device or service.
A mature project should have tests at multiple levels, from unit up to system level integration tests. The more integration you can provide developers the better, because then the surface area of untested hardware and software is smaller.
I indeed see people calling pretty much any test except e2e tests a "unit test".
However, as for which test types to use: the right answer is (as with everything) "it depends".
When doing testing, the overarching problem is whether the testing has been well thought out. Different types of testing behave differently at different scales and on different problems.
Instead, developers oftentimes think in terms of WHAT, not WHY. You can't just do the same thing for completely different projects and expect the same results.
Most of the time the best testing is opportunistic - the kind that does what's easy and fast to achieve.
In some apps functional test will test mostly separate and shallow code paths, which will not lead to a lot of coupling.
With some languages, a good type system already gives you 90% of a unit test suite.
A big part of testing is also designing your production code to be testable. (TDD is good here because it forces you to think about it upfront). With a right architecture even e2e test can be great.
Easily. Sometimes more, depending on the complexity. I had a configuration file generator with 5x as many lines of test code as production code. The consequences of getting even one field wrong would have been severe if it had made it to production undetected. There was some hairy legacy logic that I had to emulate.
The thing is, you need to test all your corner cases. These are more important in unit tests.
If the code is simple then you will be able to get away with much less. Every branch you add increases the complexity a lot.
I've found the same. Unit tests easily have 3-5x the code that is being tested. But most of that code is very, very straightforward, with little complexity - i.e. call this method with N different arguments covering edge cases, border values, illegal values, and legal values.
I agree that unit tests become the be-all and end-all for many developers. For web development, I have gotten into a good pattern where we have some reasonable unit tests. Then on the API, we have Postman tests that run on CI using the Newman npm package. Then for e2e tests, we use GhostInspector, which can also run in your CI pipeline. We do two sets of tests in Ghost: a set that is pretty happy-path, and then some really complex ones that run between different sites etc. The first set runs during CI, and if they break, the build is broken. The others don't break the build, as they are a bit more brittle.
I only write black box tests that call the outer most entry point (often a request to an endpoint) and then check if the database contains the expected results or if I get the correct response.
Unit tests (true unit tests) are good for corner cases that are very hard, or impossible, to test in such situations.
I once led a file system driver project. Most of the tests used the real file system. (They tested the driver and some glue code as a "unit.")
But, there were some error conditions (corner cases) that we just couldn't test via a real file system, so those were true unit tests that ran a particular piece of code in isolation.
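A skeleton of that black-box style in Python (everything here is hypothetical, and a plain function stands in for what would really be an HTTP request to the outermost endpoint):

    import sqlite3
    import unittest

    # Hypothetical application entry point: in a real suite this would be an HTTP
    # call (e.g. a test client POSTing to /articles); here a plain function stands in.
    def create_article(db, title):
        cur = db.execute("INSERT INTO articles(title) VALUES (?)", (title,))
        db.commit()
        return cur.lastrowid

    class CreateArticleBlackBoxTest(unittest.TestCase):
        def test_posting_an_article_persists_it(self):
            db = sqlite3.connect(":memory:")
            db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
            article_id = create_article(db, "Hello")   # call the outermost entry point
            row = db.execute("SELECT title FROM articles WHERE id = ?",
                             (article_id,)).fetchone()
            self.assertEqual(row, ("Hello",))          # then check the database state

    if __name__ == "__main__":
        unittest.main()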
> I only write black box tests that call the outer most entry point (often a request to an endpoint) and then check if the database contains the expected results or if I get the correct response.
This is the approach I used when I wrote some personal blog software a few months ago. (It was a learning project.) I was extremely happy with the results, but in this case, it was useful because this was mostly a server-side rendered web application: https://github.com/GWBasic/z3/tree/master/test
I find the combination of unit tests, component tests and contracts the best. Contracts, when enabled in component and e2e tests, let you test with tons of data you would never have generated manually.
The contracts often obviate the need to write extensive unit tests. Since they are part of the interface, we can avoid rewriting many unit tests when the interfaces change during a refactoring.
Although I agree with his definitions, I meet (and work with) many people who vehemently disagree - especially on things like "unit tests should never use a database". I wish there was some authority who could establish useful working definitions once and for all.
I think it's a generally accepted rule rather than a hard and fast one. That's because unit tests are usually expected to run faster than other test types and to be the most deterministic. Using an actual database for a unit test not only can slow things down because of setup, but it can cause state leakage (which may not be a problem for the app itself) between tests and introduces more possibilities for race conditions and random test failures, since it relies on a separate process.
But I see no reason why a unit test has to be databaseless. Of course you can write a unit test that uses a database! If it makes sense to do it, then do it. There's no computer god dictating rules to us that we are not allowed to break. However, it probably doesn't make sense to do so. ;)
If a term is popular, people will use it in different ways. I don't see much wrong with that, so long as someone is using the term usefully.
e.g. with unit test, there's surely more than one useful way to emphasise it:
I suppose if a test is small and doesn't talk to a database, etc., then it will be quick. -- I think that's one thing people mean when they say "unit tests": "tests that I can run quickly".
I think another sense of the word "unit" is more sophisticated, meaning "the system under test should be a coherent unit" or something, in the same way that some function/class/etc. should have a single responsibility.
By definition it's not a unit test if it uses a database. Unit test means to test an individual method or function, and specifically without testing anything else.
> I wish there was some authority
It's an interesting idea since we do have authorities for things like language specifications. In this case though I do think the definitions of different types of tests are pretty well established in our industry. [1]. You can find the same definitions at that source in literally thousands of other places around the web, and I would be very surprised to find someone with experience claiming that unit tests should hit the database. That's not to say tests should not use a database. But that's a different type of test.
Reality is fuzzier than you're making it out to be.
There's no such thing as not testing anything else -- even if you're testing a pure function in a functional language, you're still going to be pulling in behavior from the OS, from your runtime environment, etc... We arbitrarily draw lines around certain interactions and say that a single function can't call into a database or rest endpoint, but can interact with complicated systems built on multiple abstractions and frameworks that are baked into the OS/language.
Units are built out of units! Almost every useful unit test will encompass multiple smaller units of code.
I think a lot of harm in testing comes from people thinking too hard about the distinction between integration and unit tests instead of asking, "what is the actual behavior I want to test, and how can I test just that and nothing else?"
For some codebases, it might make sense to have a test (whether you want to call it integration or unit) that hits the database but mocks a large portion of the rest of the code. For example, if you're writing an app that's being translated into multiple languages, you probably don't want your full-stack integration tests to look at translated strings in each language, since your translation team is going to be changing those strings all the time and it's wrong behavior for a test to fail because your translation team fixed a typo.
That's a situation where it makes sense to have some kind of mock or special translation that sidesteps the issue and doesn't try to test the entire system exactly as a client would see it.
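A sketch of one way to do that (the translator interface and view code are invented): substitute a translation lookup that echoes keys, so the test pins the message identifier rather than copy owned by the translation team.

    import unittest

    class EchoTranslator:
        """Test double for the (hypothetical) translation service: returns the key
        itself, so copy edits and typo fixes never break the test."""
        def gettext(self, key):
            return key

    def render_banner(translator):
        # Stand-in for view code that would normally pull a localized string.
        return "<h1>%s</h1>" % translator.gettext("home.welcome_banner")

    class BannerTest(unittest.TestCase):
        def test_banner_uses_the_welcome_string(self):
            # Assert on the message key, not on the English (or French, or ...) text.
            self.assertEqual(render_banner(EchoTranslator()), "<h1>home.welcome_banner</h1>")

    if __name__ == "__main__":
        unittest.main()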
The unit/integration test distinction takes something that is fundamentally a continuum and turns it into a binary yes/no question. This leads to dogmatism, where people make bad testing choices purely because, "by definition, my integration test can't mock this 3rd-party service that obviously should be mocked."
In contrast, if you spend all your time thinking about tests purely in terms of refactorability, reliability, and coverage, you will often end up with good tests that catch a lot of bugs regardless of whether anyone else calls them integration tests, functional tests, or unit tests.
You are making things way more complicated than they need to be. If your test needs to hit the database just call it a functional or integration or end-to-end test. What's the problem with that? What's the need to call it a unit test? You even seem to make this point yourself:
> For some codebases, it might make sense to have a test (whether you want to call it integration or unit) that hits the database...
> There's no such thing as not testing anything else -- even if you're testing a pure function in a functional language, you're still going to be pulling in behavior from the OS, from your runtime environment, etc.
I don't see how this point adds to the conversation around automated testing and different types of testing. Sometimes we mock the database to make tests faster. And sometimes we even mock the OS. I just recently wrote a test that mocks the filesystem. And I didn't have to do any work to do that. There's already a package available that does it for me.
> Units are built out of units! Almost every useful unit test will encompass multiple smaller units of code.
This seems like another point that doesn't add to the conversation around how to write useful tests. It sounds like pedantry around definitions to me, but maybe I'm missing some nuanced point that you're trying to make.
> The unit/integration test distinction takes something that is fundamentally a continuum and turns it into a binary yes/no question.
That's a fair observation. Binary definitions are also extremely useful in some cases. Sometimes you need to know whether something is black or white and there is simply a cutoff point between the two. That's useful in all sorts of situations in life, including talking about tests.
> This leads to dogmatism, where people make bad testing choices purely because, "by definition, my integration test can't mock this 3rd-party service that obviously should be mocked."
That's not what leads to dogmatism. If the best test for the purpose needs to hit the database, just call it an integration test. What exactly is the problem?
> In contrast, if you spend all your time thinking about tests purely in terms of refactorability, reliability, and coverage...
Those are good factors to consider. So is the time it takes to run tests. If I can run my unit tests in a few seconds, I will use them often. My current end-to-end tests take well over 30 minutes to run; they obviously do not get run hundreds of times a day. So there is very useful conversation around what we want to include in our end-to-end tests, and more importantly what we want to exclude. And without definitions like "unit test" and "end-to-end test", those conversations would be needlessly awkward and take longer. Hopefully no one is suggesting we drop "unit tests" in favor of a "tests that run super fast and before code merge" category?
When someone on my team says "let's leave that out of the end-to-end tests and just write some unit tests to cover this lesser used feature" everyone on the team knows exactly what to do. It's not really that hard to develop a shared and useful testing vocabulary on a team, and that shared vocabulary can and should come from the widely used definitions already out there. If you're getting dogmatic about definitions on your team, the problem is not the definitions themselves.
I've seen people define a unit test as something that could use a database. I've seen others vehemently disagree with this.
"Unit" is not a particularly well defined thing.
I think this lack of agreement is part of the problem, because it makes it impossible to have a conversation with these terms.
For this reason I usually avoid talking about unit tests and try to use something a bit more specific (e.g. xUnit framework test) to highlight what I mean.
> "Unit" is not a particularly well defined thing.
Fair point. I think in this case I would stop focusing on definitions and focus on goals. Say that one goal of our unit tests is for them to be lightning fast, so disk access, including hitting the database, is not allowed. Feel free to write a "unit test" that hits the database, but we are going to run that "unit test" along with our slower-running tests that we call "functional tests".
I really don't get the position that a test that is potentially harder to write (because mocks) and catches fewer bugs (because mocks) but takes 0.1 seconds to run instead of 3 seconds is intrinsically "better".
It's not intrinsically better. It's just a different type of test. And which type of test you want to use and when is going to be based on your goals as an organization. 5000 unit tests times 3 seconds is 4 hours, isn't it? Compared to 9 minutes in the former case. That matters in some organizations.
Unit testing is a waste of time IMO, only functional tests are worth the effort. My entire experience with unit tests these days is fixing something because the implementation changed and the behaviour didn't. They're a waste of time.
There's no reason to be embarrassed. You might be a very learned individual of deep and wide experience, but the truth is still that you have not heard of most things. None of us has heard of most things.
But it's good to have a name for shunt because otherwise people mistakenly think using a pure mock is a good idea. (It isn't. Use a fake instead -- real code but with a fast in-memory data store instead of a real database server)
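For instance, a minimal fake in that sense might look like this (all names invented): real repository behavior, just backed by a dict instead of a database server.

    import unittest

    class FakeUserRepository:
        """Fake, not mock: it actually behaves like a repository, so tests exercise
        real interaction logic -- it just keeps everything in memory."""
        def __init__(self):
            self._users = {}

        def save(self, user_id, name):
            self._users[user_id] = name

        def find(self, user_id):
            return self._users.get(user_id)

    def rename_user(repo, user_id, new_name):
        # Hypothetical production code under test; it only sees the repository interface.
        if repo.find(user_id) is None:
            return False
        repo.save(user_id, new_name)
        return True

    class RenameUserTest(unittest.TestCase):
        def test_rename_existing_user(self):
            repo = FakeUserRepository()
            repo.save("u1", "Ada")
            self.assertTrue(rename_user(repo, "u1", "Ada Lovelace"))
            self.assertEqual(repo.find("u1"), "Ada Lovelace")

    if __name__ == "__main__":
        unittest.main()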