
Yep, I've been a professional programmer for about twelve years now, working on all kinds of large-scale systems at a certain internet company. I can say definitively that I've been saved by functional and end-to-end tests far more often than by unit tests. Not that you shouldn't do both, but err on the side of writing tests that will find bugs in the entire system.



> err on the side of writing tests that will find bugs in the entire system

While I agree these tests are going to be closer to your real production system, they also usually take longer to write and far longer to run. I can run thousands of unit tests and still have them finish an order of magnitude quicker than a full e2e test that requires spinning up a k8s cluster, for example.

I agree that both have their place, but I'd err on the side of writing tests that will run quicker where possible so that my build pipeline doesn't take forever (and eventually cost a fortune in compute power).


So, I'm in the privileged position of having an extremely large cluster of machines dedicated to running my team's tests. Even with that, e2e tests do take longer to run than unit tests. All our product's thousands of presubmit tests (millions of test cases plus random testing) might take fifteen to twenty minutes to run. The post-submits take even longer.

On the other hand, fixing a bug that makes it to production and is not caught by tests takes longer still, and additionally hurts user trust. On balance, I can't recommend forgoing thorough end-to-end testing just to decrease test run time. Spend more money on resources to run your tests if you must, but don't skimp on end-to-end testing. There is no substitute.


I agree with you overall: in an ideal world we all have near-unlimited compute power, and even these "longer" e2e tests run in a few minutes, not hours. Unfortunately, it's also not terribly uncommon to see these e2e suites take many hours, or sometimes even days, to run.

I think the happy medium might be using a dependency-aware testing pipeline that only runs tests for areas affected by the current commit's changes. I've seen Bazel used in this kind of setup with some success, for example.

Ideally no bugs make it to production, but that's almost never the reality. Obviously it depends on your software's use case and user demographics, but you could erode user trust just as badly by having a trivial user-facing bug sit visible on your app/site for days or weeks because your CI/CD pipeline takes so long to run that it prevents you from pushing out timely releases.


Yeah, another aspect here is optimizing the tests themselves. Of course, this also takes engineering time, but if you get your performance culture in the right place, hopefully test run costs stay commensurate with the amount of value you're delivering. That way you never end up in a situation where your tests take an inordinate amount of time because you cannot afford the resources to run them.

Everything is tradeoffs, of course. You have to decide what's right for your business. But I think it's rare for testing to be too extensive or too thorough. I'm guessing most orgs err in the other direction.


Useful functional tests are IMO almost always easier to write than useful unit tests.

Unit tests are only easy to write for a small subset of units (e.g. it's easier to check if a sequence is sorted than to write an efficient sort)
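
For example, the sortedness check is a few throwaway lines, while an efficient sort worth unit testing is real work (a trivial sketch):

    /* Returns 1 if a[0..n-1] is in non-decreasing order, 0 otherwise. */
    int is_sorted(const int *a, int n) {
      for (int i = 1; i < n; i++)
        if (a[i - 1] > a[i])
          return 0;
      return 1;
    }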

Functional tests though are very intuitive


What’s the point of thousands of unit tests running quickly if they don’t give you any confidence that the system continues to meet acceptance criteria?


If my unit tests are testing all of my public interfaces with "Given input X, expect output Y" and they're all passing, that gives me pretty good confidence that the system is in good shape. Most of the time when you do uncover and fix a bug, there is a unit test you could have written that would have detected it, and that test can then run on every future release to prevent regressions.

I'm not advocating for exclusively unit tests; e2e tests are absolutely still necessary to make sure everything plays nicely when you put it all together. I was simply disagreeing with the parent comment's suggestion to lean more on e2e tests than on unit tests. I prefer a unit/integration/e2e ratio closer to the "Practical test pyramid" [0].

[0] https://martinfowler.com/articles/practical-test-pyramid.htm...


> If my unit tests are testing all of my public interfaces with "Given input X, expect output Y" and they're all passing, that gives me pretty good confidence that the system is in good shape.

Unfortunately, in my experience, most production bugs look more like this:

    #include <assert.h>
    #include <stdlib.h>

    /* Module A: documented to return x + 1 */
    int A(int x) {
      if (x == 11) abort();   /* hidden bug on one specific input */
      return x + 1;
    }

    int main(void) {
      /* Unit tests: pass, because neither input happens to be 11 */
      assert(A(1) == 2);
      assert(A(5) == 6);

      /* Production: */
      A(11);   /* crash */
    }

The problem compounds as the parameter space of your system grows. Unless your unit tests exercise the full combinatorial space of parameters and results for each module, you are likely to miss bugs like this.


This is exactly my point, though: you wouldn't catch this in an e2e test either unless you try it with an "11". If you're explicitly trying the "11" in an e2e test, why not just do it in a unit test instead? Once you hit this bug once, you can add an "assert(A(11) == 12);" and move on with confidence. If you instead cover this specific scenario in an extra e2e test, you could be adding another 2+ minutes to every CI/CD run that ever happens on the project.

A good unit test suite should, at a minimum, cover the min/max/expected cases, as well as any known special cases. If there are unknown special cases, you're probably no more likely to find them in e2e tests than in unit tests.
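
For instance, a minimal sketch of that kind of coverage for the A() above might look like this (the particular boundary values, and the assumption that the x == 11 bug has since been fixed, are just for illustration):

    #include <assert.h>
    #include <limits.h>

    int A(int x);   /* the A() from the example above, after the x == 11 fix */

    void test_A_edges(void) {
      assert(A(0) == 1);                   /* expected case */
      assert(A(-1) == 0);                  /* below the usual range */
      assert(A(11) == 12);                 /* regression test added after the crash */
      assert(A(INT_MAX - 1) == INT_MAX);   /* max boundary; INT_MAX itself would overflow */
    }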


To be clear, I was trying to indicate an interaction between two modules that is never exercised when each module is unit tested against mocks, but is common in production. That is the most common cause of bugs, and also the type that e2e tests tend to catch quite well.
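
A hypothetical sketch of that kind of gap, continuing the A() example above (module B, mock_A, and the function-pointer indirection are invented for illustration, as one common way to inject mocks in C):

    #include <assert.h>

    int A(int x);              /* the real module A from the example above */
    int (*a_impl)(int) = A;    /* B calls A through this pointer so tests can swap it */

    int B(int user_count) {
      return a_impl(user_count + 1);   /* user_count == 10 feeds A the fatal 11 */
    }

    int mock_A(int x) { return x + 1; }   /* always well behaved */

    void test_B(void) {
      a_impl = mock_A;
      assert(B(10) == 12);   /* passes; in production, B(10) -> A(11) -> crash */
    }

The unit tests for both modules pass, and only a test that wires the real A and B together exercises the failing combination.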


This is a great example of a case where property-based testing is a good idea.
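
A minimal hand-rolled sketch of the idea, reusing the A() above (a real property-based testing framework would generate, shrink, and report failing inputs far more cleverly; the input range and iteration count here are arbitrary):

    #include <assert.h>
    #include <stdlib.h>

    int A(int x);   /* the module A from the example above */

    /* Property: for any x in range, A(x) == x + 1. Random sampling of the
     * input space hits x == 11 quickly, even though nobody thought to
     * hand-pick that case. */
    void property_test_A(void) {
      srand(42);                      /* fixed seed for reproducibility */
      for (int i = 0; i < 10000; i++) {
        int x = rand() % 1000;        /* arbitrary input range for the sketch */
        assert(A(x) == x + 1);
      }
    }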


Just run longer e2e tests nightly and have unit/integration tests in the normal CI checks.


My problem with this is that now every single production release is either pushed back by at least a day (multiple days if issues are found), OR there's code running in production that hasn't run through e2e tests yet.

Plus it adds a layer of complexity to your version control. I'm assuming you're running these e2e tests against a branch that's up to date with the latest master. What happens if I have 4 PRs I need to merge - they all might pass tests when run against the current master but as soon as I merge one, the other 3 haven't technically been tested with master anymore, so I'll have to re-run e2e, which pushes everything back by another day (per PR)?


> is either pushed back by at least a day (multiple days if issues are found),

Which is better than issues making it to production -- presumably what would have happened if you didn't have the tests.

> OR there's code running in production that hasn't run through e2e tests yet.

This situation should never occur except if there's a production emergency causing immediate loss of revenue. Even then, the proper answer is usually to roll back to a known-good version.

> I'm assuming you're running these e2e tests against a branch that's up to date with the latest master. What happens if I have 4 PRs I need to merge - they all might pass tests when run against the current master but as soon as I merge one, the other 3 haven't technically been tested with master anymore, so I'll have to re-run e2e, which pushes everything back by another day (per PR)?

The way releases should work is that you periodically cut a release onto a new release branch. After that, further changes to master don't affect the release. All the e2e tests need to pass on the release branch before it can be deployed. It should be deployed immediately to your test realm and given time to soak. If bugs are found, minimal fixes should be made against master and then cherry-picked into the release branch, followed by a rerun of the tests and a redeployment to the test realm. You should aim for a zero-cherry-pick situation. Do this by enabling features and changes via configuration rather than via binary changes.

Once the release branch reaches the appropriate level of stability, it should be gradually rolled out, zone by zone, to production. Zones should have some soak time before further zones are touched. The response to production anomalies should be a roll back to the previous version.

This applies to big, complex services that are mission-critical. For small, less critical services, a daily/push-on-green type of approach might be fine. But for such services testing should be much easier and it shouldn't be a huge imposition to run all tests before each release.


My problem with the "release-branch" approach is mostly that your prod releases end up being big-bang, multi-feature releases. Any bug in one feature can delay the entire release, and more changes usually mean more room for things to go wrong. It's also harder to debug issues when they do happen, and they will happen eventually, regardless of how thorough your test cycle is. Your developers may also have lost context if the time between writing the code and deploying it is that long (e.g. if this process forces you into monthly or quarterly releases).

> Once the release branch reaches the appropriate level of stability, it should be gradually rolled out, zone by zone, to production. Zones should have some soak time before further zones are touched. The response to production anomalies should be a roll back to the previous version.

I 100% agree with you on this. Canary releases, followed by gradual rollouts with the option to roll back, are ideal.

> Do this by enabling features and changes via configuration rather than via binary changes.

If your CI/CD pipeline is fast enough that you can deploy on each commit to master (multiple times daily), then you can roll forward in the rare situations where rolling back isn't an option. Feature flag frameworks certainly have some useful features, but they add a layer of complexity to your deployments that feels like a band-aid for slow deployments. That's totally valid for, e.g., an iOS app that takes 2 weeks for Apple to review, but if you're deploying to the web, where you can push new releases whenever you want, I don't think the added complexity is worth it.

I think your thorough, managed release approach is absolutely appropriate when the consequences of failure are life-threatening, or involve heavy revenue losses or brand damage. But the worst-case scenario for most of the software most of us work on is closer to "1% of users could potentially see an error message during a 5-60s window before we detect the increased error rate and roll back the canary release".



