3 test design principles to get you to continuous integration
If your test case is causing more harm than good, is it truly useful? In the days of legacy software delivery, with long lead times and great difficulty changing the product once shipped, nearly all test cases (automated or not) were good test cases.
In this era of continuous delivery, though, this calculus has shifted. It’s shockingly easy to end up with a test that inadvertently makes your software less stable, whether by building false confidence in the code, by eroding trust in the tests themselves, or by taking so long that the tests aren’t run frequently enough.
Whether you’re doing automated or manual testing, it’s critical that any software checks that you make to validate your assumptions about code follow three key principles to ensure they are fully compatible with a continuous integration and delivery system.
Principle 1: Tests must be reliable
Any test case that you’re going to run with any frequency must be reliable; that is, the test case cannot be flaky. Consider an automated check: In a continuous integration environment, this test case could run dozens or hundreds of times a day for a single team.
If a test is only 99% reliable (one false report in every 100 test executions), and you run it 200 times a day, then your team will be investigating false-positive failures about twice a day on average. Multiply that by a unit test suite that can have tens of thousands of test cases, and the math becomes clear.
Any test case that is not at least 99.9% reliable should be removed until it can be brought above the reliability threshold.
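To make that arithmetic concrete, here is a minimal sketch. It assumes every test execution is independent and equally likely to produce a false report; the function name is hypothetical and used only for illustration.

```python
def expected_false_reports(reliability: float, runs_per_day: int, test_count: int = 1) -> float:
    """Average number of false reports per day.

    Assumes each execution is independent and fails falsely with
    probability (1 - reliability).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return (1.0 - reliability) * runs_per_day * test_count


# One test at 99% reliability, run 200 times a day.
one_test_99 = expected_false_reports(0.99, 200)

# The same test brought up to the 99.9% threshold.
one_test_999 = expected_false_reports(0.999, 200)

print(f"99% reliable:   {one_test_99:.1f} false reports per day")
print(f"99.9% reliable: {one_test_999:.1f} false reports per day")
```

Passing a `test_count` in the tens of thousands shows how quickly the false reports pile up across a whole suite.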
But what does reliability look like? A test case must take every precaution to avoid a false negative or false positive. It must be repeatable without outside human intervention; it must clean up after itself.
In a fully automated system, there generally isn’t time for a human to, for example, drop tables on the SQL database every few test runs. Even a manually run test case must clean up after itself, because it is an unmanageable mental burden on the test executor to have a continuously shifting starting state.
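As one possible illustration of a test that cleans up after itself, the sketch below uses Python’s standard `unittest` and `sqlite3` modules. The `users` table and the test names are assumptions made up for this example; a real suite would target its own database.

```python
import sqlite3
import unittest


class UserTableTest(unittest.TestCase):
    def setUp(self):
        # Every test gets a brand-new in-memory database, so the starting
        # state never depends on what an earlier test left behind.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        # Registered cleanups run even when the test fails.
        self.addCleanup(self.conn.close)

    def _user_count(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    def test_table_starts_empty(self):
        self.assertEqual(self._user_count(), 0)

    def test_insert_adds_exactly_one_row(self):
        self.conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
        self.assertEqual(self._user_count(), 1)


def run_user_table_tests() -> unittest.TestResult:
    """Run the suite programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserTableTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because nothing persists between tests, calling `run_user_table_tests()` any number of times, in any order, should give the same outcome with no human dropping tables in between.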
Why is reliability so important? When developers must routinely waste time investigating false positives or negatives, they quickly lose faith in the automation solution and are likely to ignore real failures alongside the false ones.
Principle 2: Run the most important and fastest tests first
In a continuous integration system, the most precious resource you have to spend is engineers’ time. Engineers have grown to expect results quickly, and they are unwilling to wait for something they perceive as wasting time. So ensure that you’re getting relevant results back as quickly as possible.
For example, there’s no point in attempting to run unit tests on code that doesn’t compile. And there’s no point in running an API-level integration test suite if the unit tests on an underlying package don’t pass. You’re assured that the code under test will have to change, so why waste time on a test run that is guaranteed to be thrown away?
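The fail-fast ordering described above can be sketched as a tiny pipeline runner. This is an illustration only: the stage names and the boolean stand-in checks are assumptions, and a real CI system would express the same idea in its own configuration.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; once one fails, mark the rest as skipped."""
    results: List[Tuple[str, str]] = []
    failed = False
    for name, check in stages:
        if failed:
            # The code must change anyway, so do not spend time here.
            results.append((name, "skipped"))
            continue
        ok = check()
        results.append((name, "passed" if ok else "failed"))
        failed = not ok
    return results


# Hypothetical stages, ordered from fastest to slowest.
stages: List[Stage] = [
    ("compile", lambda: True),
    ("unit tests", lambda: False),  # simulate a unit test failure
    ("integration tests", lambda: True),
    ("specialty tests", lambda: True),
]

report = run_pipeline(stages)
for name, outcome in report:
    print(f"{name}: {outcome}")
```

With the simulated unit test failure, the integration and specialty stages are never executed, which is exactly the time the team gets back.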
Figure 1. In this modified testing pyramid, unit tests make up the foundation of your testing strategy, integration tests validate across boundaries, and specialty tests at the top capture any slow or complex testing. Source: Melissa Benua.
Always run the most important test cases as quickly as possible, and always run your fastest tests first. Those are nearly always your unit tests; a typical unit test executes in microseconds and can generally be run in parallel. In my continuous integration systems, I can usually process through tens of thousands of unit tests in around 90 seconds.
An integration test is a test that crosses boundaries, usually including at least an HTTP or other machine-to-machine boundary. Because of that boundary, these test cases typically execute in milliseconds, several orders of magnitude slower than unit tests.
Finally, a specialty test is anything that’s significantly slower than an integration test (such as an end-to-end automated UI test) or one that requires human intervention or interpretation that slows down the overall reporting of results.
While tests slower than a unit test certainly have value, and absolutely have a place in a continuous integration system, that place is after the faster and more reliable tests have run.
Principle 3: Keep each test case clear and atomic
A good test case should individually do as little as possible to produce a pass/fail outcome as quickly as possible. If you had infinite time in which to execute a test run, overlapping coverage and redundant tests wouldn’t be very important.
But if you only have a budget of five minutes for an entire integration test pass, for example, and every integration test case takes 10 milliseconds, then you have time to run only 30,000 test cases. And if you are doing a UI-based test that is closer to 1 second per test, then you have time for only 300 test cases.
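The budget arithmetic is simple enough to check in a few lines. The sketch below assumes tests run one after another with no parallelism or setup overhead; the function name is hypothetical.

```python
def max_test_cases(budget_ms: int, per_test_ms: int) -> int:
    """How many sequential test cases fit in the time budget."""
    if per_test_ms <= 0:
        raise ValueError("per_test_ms must be positive")
    return budget_ms // per_test_ms


FIVE_MINUTES_MS = 5 * 60 * 1000

# Integration tests at roughly 10 milliseconds each.
integration_capacity = max_test_cases(FIVE_MINUTES_MS, 10)

# UI-based tests at roughly 1 second each.
ui_capacity = max_test_cases(FIVE_MINUTES_MS, 1000)

print(f"Integration tests that fit: {integration_capacity}")
print(f"UI tests that fit: {ui_capacity}")
```

Working in whole milliseconds keeps the division exact and avoids floating-point rounding surprises.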
Once you acknowledge the reality of an upper bound to the number of test cases you may run, you can consider exactly how to spend those resources.
Figure 2. Always name test cases in a clear, descriptive manner. As you can see above, that makes it easier to diagnose failures. Source: Melissa Benua.
Every test case should be a clear answer to a clear question, the combination of which adds up to a test suite that will give a clear answer about the full set of functionalities under test. Clearly named, atomic test cases should make it easy to locate the potential cause of a test failure, and also make it easy to understand at a glance what has and what has not been tested.
When test cases are clear and atomic, it becomes easy to find coverage overlaps, and thus candidates for deletion.
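As a small illustration of clear, atomic test cases, the sketch below tests a made-up `apply_discount` function. Both the function and the test names are hypothetical; the point is that each test asks one question and its name states that question.

```python
import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test: price after a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100


class ApplyDiscountTest(unittest.TestCase):
    # Each test checks exactly one behavior, named in plain words.

    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(1000, 0), 1000)

    def test_fifty_percent_halves_price(self):
        self.assertEqual(apply_discount(1000, 50), 500)

    def test_percent_above_100_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(1000, 101)


def run_discount_tests() -> unittest.TestResult:
    """Run the suite programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

If `test_percent_above_100_is_rejected` fails, the name alone points at the input validation, and two tests with near-identical names would stand out as a likely coverage overlap.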
Now put those principles into practice
Test cases that have many complicated steps and validations are prone to failure (violating Principle 1) and to having a long runtime (violating Principle 2). Consider the following test case:
Ensure Safe Search=Strict Works
1. Create a new user
2. Log that user into the UI
3. Navigate to the user setting page
4. Change Safe Search setting to Strict
5. Navigate to the search page
6. Search for adult content
7. See that no results are returned
By running through everything end to end, you are inadvertently testing far more functionality than is needed to answer the question the test case is actually posing. Rather than running through the entire experience exactly as an end user would, decompose this test case into several different cases, spread across several different suites:
A. User creation suite
B. User settings suite
C. Search suite; our test case goes here
Most likely all of A, B, and C will have a combination of hundreds of unit tests and tens of integration tests, depending on their specific system architecture boundaries. While C may require a logged-in user, the purpose of the suite is not to ensure that you can create a user or update the user’s settings.
Creation and setting changes are incidental functionalities that are likely called during test suite setup; if they fail during setup, do not attempt to test any additional functionality. Given this known order of precedence, you also want to ensure that test suites are run in A, B, C order, since you know that C is dependent on functionality in A and B. It’s useless to try to execute C if either A or B is known to not work.
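One way to encode that order of precedence is to have each suite declare what it depends on. The sketch below is an assumption-heavy illustration: the suite names (including treating suite B as a user settings suite) and the boolean stand-ins for real suite runs are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# (suite name, names of suites it depends on, function that runs the suite)
Suite = Tuple[str, List[str], Callable[[], bool]]


def run_suites_in_order(suites: List[Suite]) -> Dict[str, str]:
    """Run suites in the listed order, skipping any whose dependencies did not pass."""
    status: Dict[str, str] = {}
    for name, depends_on, run in suites:
        if any(status.get(dep) != "passed" for dep in depends_on):
            # A dependency failed, was skipped, or never ran.
            status[name] = "skipped"
        else:
            status[name] = "passed" if run() else "failed"
    return status


suites: List[Suite] = [
    ("A: user creation", [], lambda: True),
    ("B: user settings", ["A: user creation"], lambda: False),  # simulated failure
    ("C: search", ["A: user creation", "B: user settings"], lambda: True),
]

status = run_suites_in_order(suites)
for name, outcome in status.items():
    print(f"{name}: {outcome}")
```

Here the simulated failure in B means C is reported as skipped rather than failed, so nobody wastes time investigating search when the real problem is in settings.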
Now apply the principles to improve stability
If, as you’ve moved to continuous delivery, your automated or manual tests are causing your software to be less stable, the steps above are for you. Follow these three key principles and your tests will be compatible with your organization’s continuous delivery efforts.