DevOps at the US Patent and Trademark Office
Discussions of DevOps in government are always popular because it is a tough subject. Few have successfully cracked the code, and, even if they have, it is a slow, uphill climb with unique challenges.
Consider the U.S. Patent and Trademark Office’s (US PTO) Fee Processing Next Generation (FPNG). It is effectively an e-commerce site run by the U.S. Government.
The US PTO is the agency in the United States that registers trademarks and awards patents. It is self-funded, meaning all of its services are paid for by fees charged to its users. Anyone who wants to file for a patent, register a trademark, or use a related service comes to the FPNG. It processes $3.5B in payments annually, is used worldwide, and is subject to all of the compliance issues you would expect in a government system.
How does DevOps drive it? Simmons Lough is a tech lead at the FPNG and helps lead the DevOps transformation. He shared their experience during the All Day DevOps conference.
Simmons offered up his abbreviated definition of DevOps: “Frequent, quick software installs to production without shortcuts.”
He notes that, in government, installs to production occur about once per quarter. While recognizing that culture and collaboration are important, Simmons argues that government needs to focus on Continuous Delivery. Too often the transition to DevOps stalls at the point where executives meet to talk about DevOps and the need for cultural transformation.
Simmons presented a chart to illustrate what the U.S. government has been trying to do for the past decade (the left side) and where they need to get to now (right side). One difficulty is trying to do too much in each sprint. Smaller batches reduce risk, but it can be very difficult to get your goal for each sprint small enough.
Agile and DevOps
A Typical Government Sprint
Additionally, the government process is burdened with paperwork and steps. Typically, at the end of a two-week sprint, the scrum master declares “we are done.” In government that doesn’t happen. Instead, the application development team fills out a form to have a change security tested. That form goes into a queue. One or two weeks later someone picks up the form, scans the systems, and sends back the results, and the team makes fixes. Then another form is filled out, then another for code review, and the team waits in line again for someone to do that work. Well, you get the idea! Every handoff adds lead time. The standard governmental process is anti-DevOps.
When you get to the final step – Production Readiness Review – there could be 30 groups in the room to give the green light. Then, and only then, you can finally go live. There is so much bureaucracy that pushes to production only happen quarterly.
The US PTO Sprint
Realizing this needed to change, the US PTO gave authority to the application team. Instead of waiting on a series of third-party approvals, the team can now approve its own changes. To achieve this, they automated most of the gates and created a document that overrode the existing SDLC. It took about 3 years, starting with a pilot, to get approval and implement this change.
How did the US PTO arrive at this decision?
Agile and the Definition of Done. Simmons explains that, often in government, application teams don’t have a definition of “done”. They need to know when they are done in order to properly implement DevOps. This gets back to the earlier point about working in small enough batches within the sprint: each batch needs to be small enough that the customer is using it by the time the sprint ends.
Microservices. Breaking the application into business domains (for instance payments, carts, and receipts) allows teams to work more independently. This speeds up build times and automated tests. They use independent repos, so each microservice is releasable, testable, and deployable on its own and has its own database schema. Each also has an independent pipeline, one per microservice. The key principle here is independence.
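To make the independence principle concrete, here is a minimal sketch in Python of a check that no two microservices share a repo, a database schema, or a pipeline. The service names come from the examples above; the repo, schema, and pipeline identifiers, and the `Microservice` and `shared_resources` names, are hypothetical and do not describe the FPNG's actual setup.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Microservice:
    """One business domain. All identifiers here are illustrative."""
    name: str
    repo: str
    db_schema: str
    pipeline: str


def shared_resources(services):
    """Return a description of every resource used by more than one service."""
    problems = []
    for field in ("repo", "db_schema", "pipeline"):
        counts = Counter(getattr(s, field) for s in services)
        for value, n in counts.items():
            if n > 1:
                problems.append(f"{field} '{value}' is shared by {n} services")
    return problems


# Hypothetical inventory: one repo, schema, and pipeline per service.
SERVICES = [
    Microservice("payments", "repo-payments", "payments_schema", "pipeline-payments"),
    Microservice("carts", "repo-carts", "carts_schema", "pipeline-carts"),
    Microservice("receipts", "repo-receipts", "receipts_schema", "pipeline-receipts"),
]

if __name__ == "__main__":
    for problem in shared_resources(SERVICES):
        print("NOT INDEPENDENT:", problem)
```

A check like this could run as part of a build so that a newly added service that reuses another service's schema is flagged early.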
Automated Build Gates. This decision is all about maintaining code quality. It includes automating forms and third-party reviews, live code coverage, unit testing, blocker violations, and policy enforcement. It now takes about 8 minutes to check quality instead of weeks.
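As an illustration of what an automated gate might look like, the sketch below evaluates a build against the kinds of checks listed above: unit tests, code coverage, blocker violations, and policy enforcement. It is an assumption-laden example, not the US PTO's implementation: the `BuildReport` fields, the `evaluate_gates` function, and the 80% coverage threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildReport:
    """Hypothetical summary that a CI job might produce for one build."""
    failed_unit_tests: int
    coverage_percent: float
    blocker_violations: int
    policy_violations: int


def evaluate_gates(report: BuildReport, min_coverage: float = 80.0) -> list[str]:
    """Return the reasons a build fails its gates; an empty list means it passes.

    The 80% default is an illustrative threshold, not a figure from the talk.
    """
    failures = []
    if report.failed_unit_tests > 0:
        failures.append(f"{report.failed_unit_tests} unit test(s) failed")
    if report.coverage_percent < min_coverage:
        failures.append(
            f"coverage {report.coverage_percent:.1f}% is below {min_coverage:.1f}%"
        )
    if report.blocker_violations > 0:
        failures.append(f"{report.blocker_violations} blocker violation(s) found")
    if report.policy_violations > 0:
        failures.append(f"{report.policy_violations} policy violation(s) found")
    return failures


if __name__ == "__main__":
    report = BuildReport(
        failed_unit_tests=0,
        coverage_percent=72.5,
        blocker_violations=1,
        policy_violations=0,
    )
    for reason in evaluate_gates(report):
        print("GATE FAILED:", reason)
```

Because every rule is code, the same checks run on every build in minutes, with no form waiting in a queue for a reviewer.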