Source: rhonabwy.com
Tier your tests
Not all tests are equal, and you don't want to treat them as such. I have found three tiers to be extremely effective, assuming you're working on a system that can have some deep complexity to review.
The first tier: "smoke tests". Lots of projects have this concept, and it is a good one to optimize for. This is where the 10-minute rule that SemaphoreCI promotes is a fine rule of thumb: verify as many of the most common paths of functionality as you can within a time-boxed boundary (you pick the time). This is the same tier I generally encourage for pull request feedback testing, and I try to include all unit tests in this tier, as well as some functional tests and even a few integration tests. If these tests don't pass, I generally assume the pull request isn't acceptable to merge, which makes this tier a quality gate.
I also recommend you have a consistent and clearly documented means of letting a developer run all of these tests themselves, without having to resort to your CI system. The CI system provides a needed "source of truth", but it behooves you to expose all the detail of what this tier does, and how it does it, so that you don't run into blocks where a developer can't reproduce the issue locally to debug and resolve it.
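To make the tiering concrete, here's a minimal sketch using pytest markers. The marker names ("smoke", "regression") and pytest itself are assumptions on my part; any test runner that supports tagging and filtering works the same way.

```python
# test_tiers.py -- a minimal sketch of tier markers with pytest.
# The marker names and tests are illustrative; register the markers in
# pytest.ini (or conftest.py) so --strict-markers stays happy:
#   [pytest]
#   markers =
#       smoke: fast, time-boxed tier-1 tests (pull request gate)
#       regression: slower tier-2 tests (nightly run)
import pytest


@pytest.mark.smoke
def test_login_happy_path():
    # A fast check of a common path; belongs in the pull-request gate.
    assert 1 + 1 == 2


@pytest.mark.regression
def test_large_import_corner_case():
    # A slower corner case; belongs in the nightly tier instead.
    assert sum(range(1_000_000)) == 499_999_500_000
```

A developer can then run the first tier locally with "pytest -m smoke" (and the slower tier with "pytest -m regression"), which is exactly the kind of run-it-yourself escape hatch described above.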
The second tier: "regression tests". Once you acknowledge that timely feedback is critical and pick a time limit, you may start to find some testing scenarios (especially in larger systems) that take longer to validate than the time you've allowed. Some you'll include in the first tier, where you can fit things into the time box you've set, but the rest should live somewhere and get run at some point. These are often the corner cases, the regression tests, integration tests, upgrade tests, and so forth. In my experience, running these consistently (or even continuously) is valuable, so this is often the "nightly build & test" sequence. This is the tier where "after the fact" feedback starts, and as you're building a CI system, you should consider how you want to handle it when something doesn't pass these tests.
If you're doing continuous deployment to a web service, then I recommend having this entire set pass prior to rolling out software from a staging environment to production. You can batch up commits that have been validated by the first tier, pull them all together, and then only promote them to your live service once these tests have passed.
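As a rough sketch of that promotion step, imagine a small gate that only promotes the head of a tier-1-validated batch once the second tier has passed against it. The functions below (tier2_passed, deploy_to_production) are hypothetical placeholders for whatever your CI and deployment tooling actually expose; they are not real APIs.

```python
# promote.py -- hypothetical sketch of gating promotion on the second tier.
from typing import Sequence


def tier2_passed(commit: str) -> bool:
    """Placeholder: ask your CI system whether the full regression
    suite passed for this commit."""
    raise NotImplementedError


def deploy_to_production(commit: str) -> None:
    """Placeholder: roll the staging build for this commit out to production."""
    raise NotImplementedError


def promote_batch(validated_commits: Sequence[str]) -> None:
    """Promote a batch of tier-1-validated commits as a single unit,
    but only once tier 2 has passed against the head of the batch."""
    if not validated_commits:
        return
    head = validated_commits[-1]
    if tier2_passed(head):
        deploy_to_production(head)
    else:
        print(f"holding promotion: tier 2 has not passed for {head}")
```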
If you're developing a service or library that someone else will install and use, then I recommend running these tests continuously on your master or release branch, and if any fail, consider what your process needs to accommodate: Do you want to freeze any additional commits until the tests are fixed? Do you want to revert the offending commits? Or do you open a bug that you consider a "release blocker", one that has to be resolved before your next release?
An aside here on "flakes". The reason I recommend running the second tier of tests on a continuous basis is to keep a close eye on an unfortunate reality of complex systems and testing: flakey tests. Flakes are invaluable for feedback, and often a complete pain to track down. These are the tests that "mostly pass", but don't always return consistently. As you build more asynchronous systems, these become more prevalent, from insufficient resources (such as CPU, disk I/O, or network I/O on your test systems) to race conditions that only appear periodically. I recommend you take the feedback from this tier seriously and collect information that allows you to identify flakey tests over time. Flakes can happen at any tier, and are worst in the first tier. When I find a flakey test in the first tier, I evaluate whether it should "stop the whole train" (freeze the commits until it's resolved) or whether I should move it into the second tier and open a bug. It's up to you and your development process, but think about how you want to handle it and have a plan.
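To show what "collect information that allows you to identify flakey tests" can look like in practice, here's one minimal sketch: given a history of recent pass/fail results per test, flag anything with mixed results. The input shape is invented; in practice it would come from whatever your CI result store really is.

```python
# flake_report.py -- minimal sketch for spotting flakey tests from history.
# The input shape (test name -> list of recent pass/fail booleans) is an
# assumption; feed it from your CI system's stored results.
from typing import Dict, List


def flake_rates(history: Dict[str, List[bool]]) -> Dict[str, float]:
    """Return the failure rate for tests with mixed results, i.e. tests
    that are neither reliably green nor reliably red."""
    rates = {}
    for test, results in history.items():
        if not results:
            continue
        failures = results.count(False)
        if 0 < failures < len(results):  # mixed results == flake candidate
            rates[test] = failures / len(results)
    return rates


if __name__ == "__main__":
    sample = {
        "test_login_happy_path": [True] * 20,            # solid
        "test_async_upload": [True] * 17 + [False] * 3,  # flakey
        "test_broken_feature": [False] * 20,             # hard failure, not a flake
    }
    for name, rate in sorted(flake_rates(sample).items(), key=lambda kv: -kv[1]):
        print(f"{name}: fails {rate:.0%} of recent runs")
```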
The third tier: "deep regression, matrix, and performance tests". You may not always have this tier, but if you have acceptance or release validation that takes an extended amount of time (say, over a few hours) to complete, then consider shifting it back into another tier. This is also the tier where I tend to handle the time-consuming and complex matrices when they apply. In my experience, if you're testing across some matrix of configurations (be that software or hardware), the resources are generally constrained and the testing scenarios head toward asymptotic run times. As a rule of thumb, I don't include "release blockers" in this tier; it's more about thoroughly describing your code. Benchmarks, matrix validations that wouldn't block a release, and performance characterization all fall into this tier. If you have this tier, I recommend running it prior to every major release and, if resources allow, on a recurring periodic basis so you can build trends of system characterizations.
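Where a configuration matrix applies, parametrized tests are one way to express it. The sketch below uses pytest.mark.parametrize; the specific axes (database and cache backends) are invented purely for illustration.

```python
# test_matrix.py -- sketch of a tier-3 configuration matrix with pytest.
# The axes are made up for illustration; in practice the matrix is often
# driven by CI configuration or available hardware.
import itertools

import pytest

DATABASES = ["postgres", "mysql", "sqlite"]
CACHES = ["redis", "memcached", "none"]


@pytest.mark.parametrize(
    "database,cache", list(itertools.product(DATABASES, CACHES))
)
def test_configuration_combination(database, cache):
    # Even a modest matrix multiplies quickly: 3 x 3 = 9 scenarios here,
    # which is why this tier's wall-clock time grows so fast.
    assert database in DATABASES
    assert cache in CACHES
```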
There's a good argument for "duration testing" that sort of fits between the second and third tiers. If you have tests where you want to validate a system operating over an extended period of time, where you validate availability, recovery, and system resource consumption (like looking for memory leaks), then you might want to consider treating failures of some of these tests as release blockers. I've generally found that I can find memory leaks within a couple of hours, but if you're working with a system that will be deployed where intervention is prohibitively expensive, then you might want to consider duration tests to validate operation, and even chaos-monkey-style recovery of services, over longer periods of time. Slow memory leaks, pernicious deadlocks, and distributed system recovery failures are all types of tests that are immensely valuable but take a long "wall clock" time to run.
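One sketch of a duration test: repeatedly exercise the system while sampling memory use, and fail if it climbs steadily. The workload function and growth threshold below are placeholders you would tune for a real service, and tracemalloc only sees in-process Python allocations, so a real deployment would usually watch an external process instead.

```python
# duration_memory.py -- sketch of a duration test watching for memory growth.
# exercise_system() is a placeholder workload, and the growth threshold is
# arbitrary; both would need tuning against a real system.
import tracemalloc


def exercise_system() -> None:
    """Placeholder: in a real test this would drive the system under test."""
    _ = [object() for _ in range(1000)]


def test_no_steady_memory_growth(iterations: int = 50) -> None:
    tracemalloc.start()
    samples = []
    for _ in range(iterations):
        exercise_system()
        current, _peak = tracemalloc.get_traced_memory()
        samples.append(current)
    tracemalloc.stop()

    # Compare the first and last thirds of the run; a leak shows up as a
    # steady climb rather than noise around a plateau.
    third = max(1, len(samples) // 3)
    early = sum(samples[:third]) / third
    late = sum(samples[-third:]) / third
    growth = late - early
    assert growth < 512 * 1024, f"memory grew by {growth:.0f} bytes over the run"


if __name__ == "__main__":
    test_no_steady_memory_growth()
    print("no steady memory growth detected")
```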
Reporting on your tiers
As you build out your continuous integration environment, take the time to plan and implement reporting for your tiers as well. The first tier is generally easiest: it's what most CI systems already do by annotating pull requests. The second and third tiers take more time and resources. You want to watch for flakey tests, collecting and analyzing failures. More complex open source systems look to their release process to help corral this information: OpenStack uses Zuul (unfortunately rather OpenStack-specific, but they're apparently working on that), and Kubernetes has Gubernator and TestGrid. When I was at Nebula, we invested in a service that collected and collated test results across all our tiers and reported not just the success or failure of the latest run but also a history of success and failure to help us spot the flakes I mentioned above.
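That Nebula service isn't public, but the core idea (record every run's result per test and tier, then query the history) fits in a few lines of SQLite. The schema and queries below are invented to show the shape of the idea; they are not the actual tool, nor Zuul, Gubernator, or TestGrid.

```python
# results_store.py -- illustrative sketch of a cross-tier test-result store.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    run_id TEXT NOT NULL,    -- CI run identifier
    tier   TEXT NOT NULL,    -- e.g. 'smoke', 'regression', 'deep'
    test   TEXT NOT NULL,
    passed INTEGER NOT NULL,
    ran_at TEXT NOT NULL     -- ISO-8601 timestamp
);
"""


def record(conn, run_id, tier, test, passed, ran_at):
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
        (run_id, tier, test, int(passed), ran_at),
    )


def history(conn, test, limit=30):
    """Most recent pass/fail results for one test, newest first: the view
    you need to tell a flake from a hard failure."""
    rows = conn.execute(
        "SELECT passed FROM results WHERE test = ? ORDER BY ran_at DESC LIMIT ?",
        (test, limit),
    ).fetchall()
    return [bool(p) for (p,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    record(conn, "run-1", "regression", "test_async_upload", True, "2024-05-01T02:00:00")
    record(conn, "run-2", "regression", "test_async_upload", False, "2024-05-02T02:00:00")
    print(history(conn, "test_async_upload"))  # [False, True]
```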