Azure DevOps report: How a bug caused ‘sqlite3 for Python’ to go missing from Linux images
Source: hub.packtpub.com
How Azure DevOps team detected and fixed the issue
The Azure DevOps team upgraded the versions of Python included in the Ubuntu 16.04 image as part of the M151 payload. The build scripts for these Python versions treat sqlite3 as an optional module, so the builds completed successfully even though the sqlite3 module was missing.
The issue was first reported on May 20th via the Azure Developer Community by a user who had received the M151 deployment containing the bug. However, the Azure support team escalated it only after receiving more reports during the M152 deployment on May 31st. After posting a workaround for the issue, the support team then proceeded with the M153 deployment, as the M152 deployment would take at least 10 days. Further, due to an internal miscommunication, the support team did not start the M153 deployment to Ring 0 until June 13th.
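Because a Python build can succeed without sqlite3, the gap only shows up when something tries to use the module. The following is a minimal sketch of a functional check that could be run on an image; the function name `sqlite3_works` is hypothetical and is not taken from the Azure report.

```python
from contextlib import closing


def sqlite3_works() -> bool:
    """Return True if sqlite3 can be imported and can run a trivial query."""
    try:
        # On a Python build that skipped the optional module, this import fails.
        import sqlite3
    except ImportError:
        return False
    # An in-memory database avoids touching the file system.
    with closing(sqlite3.connect(":memory:")) as conn:
        return conn.execute("SELECT 1").fetchone() == (1,)


if __name__ == "__main__":
    print("sqlite3 usable:", sqlite3_works())
```

Running a real query, rather than only looking for the package on disk, is a deliberate choice here: a check of this kind should fail exactly when user code would fail.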
To safeguard the production environment, Azure DevOps rolls out changes in a progressive and controlled manner via the ring model of deployments.
The team then resumed deployment to Ring 1 on June 17th and reached Ring 2 by June 20th. Finally, after a few failures, the team completed the M153 deployment by June 26th.
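The ring model described above can be pictured as a loop that deploys to one ring at a time and stops as soon as a ring looks unhealthy. This is only an illustrative sketch: `progressive_rollout`, `deploy` and `is_healthy` are hypothetical names, and the actual Azure DevOps tooling is not described in the report.

```python
from typing import Callable, Iterable, List


def progressive_rollout(
    rings: Iterable[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Deploy ring by ring and halt at the first unhealthy ring.

    Returns the rings that were deployed and passed the health check.
    """
    completed: List[str] = []
    for ring in rings:
        deploy(ring)
        if not is_healthy(ring):
            # Stop here so later rings never receive the faulty change.
            break
        completed.append(ring)
    return completed


if __name__ == "__main__":
    # Stand-in callbacks for illustration: every ring reports healthy.
    done = progressive_rollout(
        ["Ring 0", "Ring 1", "Ring 2"],
        deploy=lambda ring: print("deploying to", ring),
        is_healthy=lambda ring: True,
    )
    print("completed:", done)
```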
Azure’s future workarounds to deliver timely fixes
The Azure team has set out plans to improve its deployment and hotfix processes with the aim of delivering timely fixes. Its long-term plan is to give customers the ability to revert to the previous image as a quick workaround for issues introduced in new images. The medium- and short-term plans are given below:
Medium term plans
Add the ability to better compare what changed on the images to catch any unexpected discrepancies that the test suite might miss.
Increase the speed and reliability of the deployment process.
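One way to picture the image comparison mentioned above is a diff between two manifests that list what each image contains. The sketch below assumes each image can be summarized as a mapping from component name to a version or status string; that manifest format, the function name `diff_manifests` and the sample entries are illustrative assumptions, not details from the report.

```python
from typing import Dict, List, Tuple


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, list]:
    """Report components that were added, removed, or changed between images."""
    removed: List[str] = sorted(old.keys() - new.keys())
    added: List[str] = sorted(new.keys() - old.keys())
    changed: List[Tuple[str, str, str]] = sorted(
        (name, old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Illustrative manifests only; the entries are made up for the example.
    previous_image = {"python/json": "present", "python/sqlite3": "present"}
    current_image = {"python/json": "present"}
    print(diff_manifests(previous_image, current_image))
```

A non-empty "removed" list would flag a module that disappeared between image versions, even if no existing test exercises it.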
Short term plans
Build a full CI pipeline for image generation to verify images daily.
Add test coverage for all modules in the Python standard library, including sqlite3.
Improve communication with the support team so that issues are escalated more quickly.
Add telemetry to make it possible to detect and diagnose issues more quickly.
Implement measures that enable reverting to prior image versions quickly to mitigate issues faster.
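The standard library test coverage planned above could start as a simple import sweep run as part of a daily image check. This is a rough sketch under stated assumptions: the function name `find_broken_modules` and the list of modules to check are illustrative choices, not the Azure team's actual test suite.

```python
import importlib
from typing import Iterable, List

# Illustrative selection of standard library modules that rely on optional
# system libraries at build time; a real suite would cover many more.
MODULES_TO_CHECK = ("sqlite3", "ssl", "zlib", "bz2", "lzma", "ctypes", "hashlib")


def find_broken_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that fail to import on this interpreter."""
    broken: List[str] = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            broken.append(name)
    return broken


if __name__ == "__main__":
    missing = find_broken_modules(MODULES_TO_CHECK)
    if missing:
        print("Modules that failed to import:", ", ".join(missing))
    else:
        print("All checked modules imported successfully.")
```

A CI job could fail the image build whenever the returned list is not empty.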