Analyze deployment frequency and quality across your entire organization in Swarmia.
How often do you deploy code to production? How many of these production deploys cause unintended defects, as experienced by end-users? How long does it take to fix such defects?
Swarmia's DORA and deployment insights help you answer these questions and analyze their trends across your organization (learn more from this blog post).
Tracking DORA metrics and their trends per team helps you assess both the frequency and quality of deployments. It provides teams with a tool for lowering their batch size and understanding the impact of their deployments.
Getting started
Start by setting up applications and configuring the way Swarmia tracks production deployments.
Configuring deployments in Swarmia
What you can measure
Deployment insights lets you both measure your deployments and visualize their trends. Besides looking at the volume of deployments, you can track which deployments had defects. This helps you understand the quality aspect of deployments.
Deployment Frequency
Deployment frequency is used to indicate the performance of a software development team. Ultimately, deploys to production are what make any work visible all the way to the user – or what makes it possible for your work to deliver a business impact.
According to the authors of Accelerate, “elite” teams are able to deploy new code on-demand or multiple times per day, whereas the release frequency of high-performing teams is between once per day and once per week.
A low deployment frequency can be an indication of working with large batches, or signal other problems such as poor deployment infrastructure or lack of reliable automated tests.
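To make the metric concrete, here is a minimal sketch of counting production deployments per week from a list of deployment dates. It only illustrates the idea and is not how Swarmia computes the metric; the function name `deployments_per_week` and the sample dates are assumptions made for this example.

```python
from collections import Counter
from datetime import date


def deployments_per_week(deploy_dates):
    """Count deployments per ISO week, keyed by (ISO year, ISO week number)."""
    counts = Counter()
    for d in deploy_dates:
        iso = d.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


# Hypothetical sample data: three deployments in one week, one in the next.
deploys = [
    date(2024, 3, 4),
    date(2024, 3, 5),
    date(2024, 3, 5),
    date(2024, 3, 12),
]
weekly = deployments_per_week(deploys)
print(weekly)  # {(2024, 10): 3, (2024, 11): 1}
```

A team that sees only one deployment in most weeks may be shipping large batches.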
Change Lead Time
Measures the time it takes for pull requests to go from the first commit to a production deployment. Helps you identify wait times and bottlenecks in your development process.
According to Accelerate, “elite” teams can go from code committed to code running in production in less than an hour, while for high-performing teams it takes between one day and one week.
A high change lead time can indicate batch sizes that are too large, slow code review or QA, or long CI/CD wait times.
Time to Deploy
Measures the time it takes from pull request merge to deployment. Part of change lead time.
Useful for understanding how much delay your current deployment process adds to your change lead time.
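The relationship between the two durations can be sketched for a single pull request as follows. This is an illustration under assumed inputs, not Swarmia's implementation; the function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def change_lead_time(first_commit_at, deployed_at):
    """Time from the first commit of a pull request to its production deployment."""
    return deployed_at - first_commit_at


def time_to_deploy(merged_at, deployed_at):
    """Time from pull request merge to production deployment."""
    return deployed_at - merged_at


# Hypothetical timestamps for one pull request.
first_commit = datetime(2024, 3, 4, 9, 0)
merged = datetime(2024, 3, 5, 15, 0)
deployed = datetime(2024, 3, 5, 17, 30)

lead = change_lead_time(first_commit, deployed)
ttd = time_to_deploy(merged, deployed)
print(lead)  # 1 day, 8:30:00
print(ttd)   # 2:30:00
```

Because the merge always happens after the first commit, time to deploy is never longer than the change lead time for the same pull request.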
Change Failure Rate
The exact definition of a change failure is up to you. As a rule of thumb, it should be an incident that must be remedied immediately instead of waiting until the next regular deployment. If a deployment introduces a bug that doesn't need an immediate reaction, it probably shouldn't be defined as a change failure.
Swarmia uses deployments as the basis for change failures. Deploys that fix other deploys (e.g. a patch, hotfix, rollback, or forward fix) mark the original deploy as a failure.
We look at the number of such failed deployments and calculate the Change Failure Rate by comparing this to the number of total deployments.
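As a rough sketch of this calculation, the example below treats a deployment as failed when a later deployment is recorded as its fix, then divides the number of failed deployments by the total. The record fields `version` and `fixes` are hypothetical names chosen for this example, not fields of Swarmia's data model.

```python
def change_failure_rate(deployments):
    """Share of deployments that were later fixed by another deployment."""
    if not deployments:
        return 0.0
    versions = {d["version"] for d in deployments}
    # A deployment counts as failed if some other deployment names it as the one it fixes.
    failed = {d["fixes"] for d in deployments if d.get("fixes") in versions}
    return len(failed) / len(deployments)


# Hypothetical sample: v3 is a hotfix for v2, so v2 is the only failed deployment.
deployments = [
    {"version": "v1", "fixes": None},
    {"version": "v2", "fixes": None},
    {"version": "v3", "fixes": "v2"},
    {"version": "v4", "fixes": None},
]
rate = change_failure_rate(deployments)
print(rate)  # 0.25
```

In this sketch the fix deployment itself still counts toward the total number of deployments.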
You can use the Deployments API to mark a deployment as a fix for an earlier deployment. You can also navigate to Deployments and mark a deployment as a fix manually.
Mean Time To Recovery
Mean Time To Recovery (MTTR) is the average time it takes to address change failures. It helps teams understand how quickly they're able to resolve issues.
Time To Recovery (TTR) can be determined for each failure as the time between the original deploy and the deploy that fixed the problem. TTR can be used to understand the impact of each change failure: how long the problem lasted, and what its impact on the customer was.
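A minimal sketch of the averaging step, assuming each failure is represented as a pair of timestamps (original deploy, fix deploy). The function name and sample data are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(failures):
    """Average TTR over (original_deploy_at, fix_deploy_at) pairs, or None if empty."""
    if not failures:
        return None
    ttrs = [fixed_at - original_at for original_at, fixed_at in failures]
    return sum(ttrs, timedelta()) / len(ttrs)


# Hypothetical failures: one fixed after 30 minutes, one after 90 minutes.
failures = [
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 10, 30)),
    (datetime(2024, 3, 6, 14, 0), datetime(2024, 3, 6, 15, 30)),
]
mttr = mean_time_to_recovery(failures)
print(mttr)  # 1:00:00
```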
How deployments are attributed to teams
The Authors column in the Deployments table lists the authors of the pull requests in each deployment. The Team and Author filters on the page are based on the authors. These are then aggregated to calculate DORA metrics per team in Metrics / DORA. This lets you see DORA metrics for every team in your organization.
Note that one deployment can be linked to multiple teams if it has multiple pull requests or an author who belongs to multiple teams.
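The attribution rule can be sketched as a lookup from pull request authors to the teams they belong to. The team names, members, and the function `teams_for_deployment` are hypothetical and only illustrate why one deployment can count toward several teams.

```python
def teams_for_deployment(pr_authors, team_members):
    """Return the sorted names of teams with at least one pull request author in the deployment."""
    authors = set(pr_authors)
    return sorted(
        team for team, members in team_members.items() if authors & set(members)
    )


# Hypothetical organization: bob belongs to two teams.
team_members = {
    "platform": {"alice", "bob"},
    "web": {"bob", "carol"},
    "mobile": {"dave"},
}

single_author = teams_for_deployment(["bob"], team_members)
two_authors = teams_for_deployment(["alice", "dave"], team_members)
print(single_author)  # ['platform', 'web']
print(two_authors)    # ['mobile', 'platform']
```

The first call shows a deployment linked to two teams through one author who belongs to both. The second shows a deployment linked to two teams through pull requests from different authors.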
About deployments and change failures
We define a deployment as any change that was applied to your production code. In other words, you should only include deployments that were processed successfully and resulted in some sort of change in your production application. This can also mean deploying a version that was live previously.
💡 Tip: Swarmia automatically generates deployments for some of your repositories. These applications and production environments can be configured further in Settings / Deployments.
Change failures are deployments that have defects that somehow impact the user of your production application. Some typical examples are regressions, downgraded performance, or other types of bugs that impact the user.
There's no single definition of a failure, but we offer some guidelines. The purpose of measuring failures is to understand when a team is moving too fast, so that the metric acts as a feedback signal to slow down. You want a proxy for the impact on users, so focus on production changes that affect them.
To capture change failures, Swarmia forms links between deployment events to understand when a deployment is seeking to fix a previous production deployment. The earlier deployments that such fixes target are considered failures.
Automating change failures
Swarmia can automatically detect change failures from rollbacks or reverted pull requests. This is enabled by default for new applications and is the easiest way to get started.
In addition, you can use the Deployments API to mark a deployment as a fix for an earlier deployment. By integrating it into e.g. your forward-fixing process, you can maximize the quality of change failure data and get reliable insights on the quality of your engineering process.
Manually indicating change failures
Alternatively, you can mark a deployment as a fix manually in Deployments. We fetch previous deploys for quick access, or you can find deployments from the past 90 days by searching with the version.