Controlling Chaos: DevOps And The Digital Transformation
You will no doubt have read numerous predictions from tech industry experts outlining their expectations for the next 12 months, with Digital Transformation (DX) often at the forefront. DX is in full swing, and many organisations are moving towards a more agile approach and adopting DevOps principles.
Most enterprises have already begun shifting their strategies to align with their current environment: one that values data more highly than physical assets, and in which every industry is increasingly powered by, and reliant on, digital information technology and processes.
The fact that businesses have only ‘begun’ to adapt and have not completely finalised this process is testament to the fluid and evolving nature of DX. Multiple IT technologies, processes, applications, systems and protocols need to be adopted and updated on a regular basis in order for businesses to keep abreast of changes. This does, of course, result in significant disruption for all involved.
The pace of digital service development, fuelled by increasing automation, is accelerating far beyond what IT operations teams have traditionally had to handle. This acceleration creates more chaos in production environments.
Furthermore, the shift in the DevOps paradigm from delivering “failsafe” applications to expecting a “safe-to-fail” production environment increases this chaos even further. Close collaboration and communication between the IT team members responsible for service development and delivery is therefore vital in controlling, or at least minimising, the resulting chaos.
The DevOps Chaos Theory
In a business environment where the continuous delivery pipeline is spurred on by automation, and both the development velocity and enterprise scale are increasing, DevOps principles will prove more important than ever. It is useful to look at this at a more granular level, through the framework of the DevOps Chaos Theory.
The pace of innovation is measured as the Velocity (V): the number of new software releases deployed to a production environment in a defined time period. The Scale (S) factor is the overall number of IT staff involved in service delivery and management in production environments, such as DevOps, SecOps, QA, system architects, DBAs, NetOps and help desk staff.
Interaction between these team members brings the potential for miscommunication, which will increase the overall chaos. The maximum number of interactions between these IT members is $S(S-1)/2$, and for high-scale organisations it approaches $S^2/2$.
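To see how quickly pairwise interactions grow with team size, here is a worked comparison of the exact count S(S − 1)/2 (every pair of people who could need to communicate) against the S²/2 approximation, for two illustrative team sizes:

```latex
\begin{aligned}
S = 10:&\quad \frac{10 \times 9}{2} = 45, \qquad \frac{10^2}{2} = 50 \\
S = 100:&\quad \frac{100 \times 99}{2} = 4950, \qquad \frac{100^2}{2} = 5000
\end{aligned}
```

Ten times the staff yields roughly a hundred times the potential interactions, and the gap between the exact count and the approximation shrinks in relative terms as S grows.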
Based on these considerations, a logical hypothesis would identify the system-level Chaos (C) in production environments as $C = K \times V \times S^2$, where K is a normalisation factor that may change based on the overall adoption of DX in a specific industry and the effectiveness of collaboration and communication between the IT team members.
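A minimal Python sketch of the hypothesised model, assuming the chaos relationship takes the form C = K × V × S² and the interaction count is S(S − 1)/2. The function names, K = 1.0 and the velocity and staff figures are hypothetical, illustrative choices rather than values from this article. The point it demonstrates is why scale dominates: doubling V doubles C, while doubling S quadruples it.

```python
def max_interactions(staff: int) -> int:
    """Maximum number of pairwise interactions among `staff` people: S(S-1)/2."""
    if staff < 0:
        raise ValueError("staff must be non-negative")
    return staff * (staff - 1) // 2


def chaos(k: float, velocity: float, staff: int) -> float:
    """Hypothesised system-level chaos C = K * V * S^2 (K absorbs the 1/2 from S^2/2)."""
    return k * velocity * staff ** 2


if __name__ == "__main__":
    k = 1.0  # hypothetical normalisation factor; the article gives no value for K
    baseline = chaos(k, velocity=20, staff=50)
    double_velocity = chaos(k, velocity=40, staff=50)
    double_scale = chaos(k, velocity=20, staff=100)
    print(max_interactions(50))        # 1225
    print(double_velocity / baseline)  # 2.0
    print(double_scale / baseline)     # 4.0
```

Because K is a single multiplicative factor, its actual value only matters when comparing organisations or industries; ratios within one organisation, as above, are independent of it.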
In such a disruptive environment it is vital that different departments within a business work openly together. I appreciate the important role automation tools play in continuous delivery, but they cannot eliminate the bottleneck in the pipeline; they merely shift it downstream into production.
Under the “safe-to-fail” paradigm, therefore, most chaos is likely to manifest in production environments, so in the coming year and beyond it will be crucial for enterprises to identify how constrained their IT operations teams are.
This will help them determine what changes need to be made, and what service performance management technology must be introduced, to prevent operations from becoming a bottleneck in the continuous service delivery cycle inherent to DX.
Effective management at a human level should form an important part of a company’s DX strategy if chaos is to be mitigated and a crisis averted. Operations and development teams need to form and practise cross-company initiatives together in order to communicate and manage rapid system changes.
An effective instrumentation and monitoring strategy is required to enable exactly that kind of collaboration across IT teams. Because service delivery combines application and infrastructure into a single system, telemetry of that system’s key performance indicators (KPIs) is critical.
Monitoring system-level KPIs requires access to reliable data sources, such as network traffic. Effective instrumentation of these data sources will play a key role in proactively identifying the root cause of service issues and thus reining in chaos.
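As an illustration of turning traffic-derived data into system-level KPIs, here is a minimal Python sketch. It assumes network traffic has already been decoded into request/response transactions; the `Transaction` schema, the choice of error rate and 95th-percentile latency as KPIs, and the function names are all hypothetical, not taken from any particular monitoring product.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transaction:
    """One request/response pair reconstructed from network traffic (hypothetical schema)."""
    service: str
    latency_ms: float
    failed: bool


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def service_kpis(transactions):
    """Group transactions by service and compute request count, error rate and p95 latency."""
    by_service = defaultdict(list)
    for t in transactions:
        by_service[t.service].append(t)
    kpis = {}
    for service, txs in by_service.items():
        kpis[service] = {
            "requests": len(txs),
            "error_rate": sum(t.failed for t in txs) / len(txs),
            "p95_latency_ms": percentile([t.latency_ms for t in txs], 95),
        }
    return kpis


if __name__ == "__main__":
    sample = [
        Transaction("checkout", 10, False),
        Transaction("checkout", 20, False),
        Transaction("checkout", 30, True),
        Transaction("checkout", 400, False),
        Transaction("search", 5, False),
    ]
    for name, values in service_kpis(sample).items():
        print(name, values)
```

Computing KPIs per service, rather than per host or per application tier, reflects the idea that application and infrastructure together form the single system being monitored.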
Enterprises must be able to analyse the monitored data effectively to gain insight into the interdependencies between infrastructure subsystems and applications, and to establish a comprehensive view of their services using both real-time and historical information.
Continuous monitoring of applications and of all aspects of IT infrastructure, both virtual and physical, will allow departments to study developments closely and proactively identify current or potential problems before they affect consumers.
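One simple way to combine historical and real-time data for proactive detection is to flag a live KPI reading that deviates sharply from its historical baseline. The sketch below uses a z-score test with Python's standard `statistics` module; this is a swapped-in, deliberately basic technique chosen for illustration, and the threshold of three standard deviations is an assumption rather than a recommendation from the article.

```python
import statistics


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it lies more than `threshold` standard deviations
    from the mean of the historical samples (a simple z-score test)."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # A perfectly flat baseline: any change at all is unusual.
        return current != mean
    return abs(current - mean) / stdev > threshold


if __name__ == "__main__":
    # Hypothetical historical p95 latencies (ms) for one service.
    baseline = [100, 102, 98, 101, 99]
    print(is_anomalous(baseline, 101))  # False
    print(is_anomalous(baseline, 150))  # True
```

In practice, baselines usually need to account for daily and weekly seasonality; a single global mean, as here, is only the starting point.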
Getting these things right and planning for next year will help maintain business velocity throughout the Digital Transformation process.