Getting started with DORA metrics in Swarmia

Analyze your deployment frequency and quality in Swarmia.

How often do you deploy code to production? How many of these production deploys cause unintended defects, as experienced by end users? How long does it take to fix such defects?

Deployment insights helps you answer these questions, which form the core of the DORA metrics (learn more from this blog post).

Tracking both the frequency and quality of deployments gives teams a tool for lowering their batch size and understanding the impact of their deployments.

Getting started

Start by setting up applications and configuring the way Swarmia tracks their deployments.

Configuring deployments in Swarmia

What you can measure

Deployment insights lets you both visualize and measure your deployments. Besides looking at the volume of deployments, you can track which deployments had defects. This helps you understand the quality aspect of deployments.

Deployment Frequency

Deployment frequency is used to indicate the performance of a software development team. Ultimately, deploys are what make any work visible all the way to the user – or what makes it possible for your work to deliver a business impact.

According to the authors of Accelerate, “elite” teams are able to deploy new code on-demand or multiple times per day, whereas the release frequency of high-performing teams is between once per day and once per week.

A low deployment frequency can be an indication of working with large batches, or signal other problems such as poor deployment infrastructure or lack of reliable automated tests.
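To make the idea concrete, here is a minimal sketch of counting deployments per week from a list of deployment dates. It is an illustration only, not how Swarmia computes the metric; the function name `deployments_per_week` and the sample dates are made up for this example.

```python
from collections import Counter
from datetime import date


def deployments_per_week(deploy_dates):
    """Count deployments per ISO week, keyed by (ISO year, ISO week number)."""
    counts = Counter()
    for d in deploy_dates:
        iso = d.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


# Hypothetical sample data: three deploys in one week, one in the next.
deploy_dates = [
    date(2024, 1, 1),
    date(2024, 1, 2),
    date(2024, 1, 2),
    date(2024, 1, 10),
]

weekly = deployments_per_week(deploy_dates)
print(weekly)
```

A week with few or no deployments shows up as a low or missing count, which is the signal worth investigating.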

Change Lead Time

Measures the time it takes for pull requests to go from the first commit to deployment. Helps you identify wait times and bottlenecks in your development process.

According to Accelerate, “elite” teams can go from code committed to code running in production in less than an hour, while for high-performing teams it takes between one day and one week.

A high change lead time can indicate batch sizes that are too large, slow code review or QA, or long CI/CD wait times.

Time to Deploy

Measures the time it takes from pull request merge to deployment. Part of change lead time.

Useful for understanding how much delay your current deployment process adds to your change lead time.
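Time to deploy is one segment of change lead time, which a short sketch can show for a single pull request. This is an illustration under assumed inputs, not Swarmia's implementation; the function names and the `pr` dictionary with its keys are hypothetical.

```python
from datetime import datetime


def change_lead_time(first_commit_at, deployed_at):
    """Time from the first commit of a pull request to its deployment."""
    return deployed_at - first_commit_at


def time_to_deploy(merged_at, deployed_at):
    """Time from pull request merge to deployment (part of change lead time)."""
    return deployed_at - merged_at


# Hypothetical timestamps for one pull request.
pr = {
    "first_commit_at": datetime(2024, 3, 4, 9, 0),
    "merged_at": datetime(2024, 3, 5, 15, 0),
    "deployed_at": datetime(2024, 3, 5, 17, 30),
}

lead = change_lead_time(pr["first_commit_at"], pr["deployed_at"])
wait = time_to_deploy(pr["merged_at"], pr["deployed_at"])
print(lead, wait)
```

Comparing the two values shows how much of the total lead time is spent waiting after the merge rather than in development and review.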

Change Failure Rate

Swarmia uses deployments as the basis for change failures. Deploys that fix other deploys (e.g. a patch, hotfix, rollback, or forward fix) mark the original deploy as a failure.

We look at the number of such failed deployments and calculate the Change Failure Rate by comparing this to the number of total deployments.
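As a rough illustration of this calculation, the sketch below divides the number of deployments that were targeted by a fix by the total number of deployments. The record shape is an assumption made for the example: `version` and `fixes` are hypothetical field names, not the Deployments API schema, and versions are assumed to be unique.

```python
def change_failure_rate(deployments):
    """Share of deployments that a later deployment was marked as fixing."""
    if not deployments:
        return 0.0
    versions = {d["version"] for d in deployments}
    # A deployment counts as failed once, even if several fixes target it.
    failed = {d["fixes"] for d in deployments if d.get("fixes") in versions}
    return len(failed) / len(deployments)


# Hypothetical data: v3 is a hotfix for v2, so v2 counts as a failure.
deployments = [
    {"version": "v1"},
    {"version": "v2"},
    {"version": "v3", "fixes": "v2"},
    {"version": "v4"},
]

rate = change_failure_rate(deployments)
print(f"{rate:.0%}")
```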

You can use the Deployments API to mark a deployment as a fix for an earlier deployment. In addition to this, you can navigate to Deployment Insights and mark a deployment as a fix manually.

Mean Time To Recovery

Mean Time To Recovery (MTTR) is the average time it takes to address change failures. It helps teams understand how quickly they're able to resolve issues.

Time To Recovery (TTR) can be determined for each failure as the time between the original deploy and the fix for the problem. TTR can be used to understand the impact of each change failure (how long the problem lasted and what its impact on the customer was).
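The relationship between TTR and MTTR can be sketched as follows. This is a simplified illustration, not Swarmia's implementation: the field names `version`, `deployed_at`, and `fixes` are hypothetical, and each failed deployment is assumed to have exactly one fix.

```python
from datetime import datetime, timedelta


def times_to_recovery(deployments):
    """Map each failed version to the time between it and its fix."""
    deployed_at = {d["version"]: d["deployed_at"] for d in deployments}
    ttr = {}
    for d in deployments:
        target = d.get("fixes")
        if target in deployed_at:
            ttr[target] = d["deployed_at"] - deployed_at[target]
    return ttr


def mean_time_to_recovery(deployments):
    """Average TTR over all failures, or None when there are no failures."""
    ttrs = list(times_to_recovery(deployments).values())
    if not ttrs:
        return None
    return sum(ttrs, timedelta()) / len(ttrs)


# Hypothetical data: v1 is fixed after 1 hour, v3 after 3 hours.
deployments = [
    {"version": "v1", "deployed_at": datetime(2024, 5, 1, 10, 0)},
    {"version": "v2", "deployed_at": datetime(2024, 5, 1, 11, 0), "fixes": "v1"},
    {"version": "v3", "deployed_at": datetime(2024, 5, 2, 12, 0)},
    {"version": "v4", "deployed_at": datetime(2024, 5, 2, 15, 0), "fixes": "v3"},
]

ttr = times_to_recovery(deployments)
mttr = mean_time_to_recovery(deployments)
print(ttr, mttr)
```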

How deployments are attributed to teams

The Authors column in the Deployment Insights table lists the authors of the pull requests in each deployment. The Team and Author filters on the page are based on the authors.

Note that one deployment can be linked to multiple teams if it has multiple pull requests or an author who belongs to multiple teams.
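The attribution rule above can be sketched as a set intersection between the pull request authors in a deployment and each team's members. The function name, team names, and people below are hypothetical, and this is only an illustration of the rule, not Swarmia's implementation.

```python
def teams_for_deployment(pr_authors, team_members):
    """Return the teams that have at least one pull request author in the deployment."""
    authors = set(pr_authors)
    return sorted(
        team for team, members in team_members.items() if authors & set(members)
    )


# Hypothetical team membership; bob belongs to two teams.
team_members = {
    "platform": {"alice", "bob"},
    "web": {"bob", "carol"},
    "data": {"dave"},
}

# One author in two teams links the deployment to both teams.
single_author = teams_for_deployment(["bob"], team_members)

# Two pull requests by authors from different teams do the same.
two_authors = teams_for_deployment(["alice", "dave"], team_members)
print(single_author, two_authors)
```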

About deployments and change failures

We define a deployment as any change that was applied to your production code. In other words, you should only include deployments that were processed successfully, resulting in some sort of change in your production application. This can also mean deploying a version that was live previously.

💡 Tip: Swarmia automatically generates deployments for some of your repositories. These applications can be configured further in Settings / Deployments.

Change failures are deployments with defects that impact the users of your production application. Typical examples are regressions, degraded performance, and other bugs that users notice.

There's no single definition for failures, but we offer some guidelines. The purpose of measuring failures is to understand when a team is moving too fast and to act as a feedback signal to slow down. You want a proxy for the impact on users, so focus on production changes that affect them.

To capture change failures, Swarmia forms links between deployment events to understand when a deployment is seeking to fix a previous production deployment. The deployments targeted by such fixes are considered failures.

Automating change failures

Swarmia can automatically detect change failures from rollbacks or reverted pull requests. This is enabled by default for new applications and is the easiest way to get started.

In addition, you can use the Deployments API to mark a deployment as a fix for an earlier deployment. By integrating it into e.g. your forward-fixing process, you can maximize the quality of change failure data and get reliable insights on the quality of your engineering process.

Manually indicating change failures

Alternatively, you can navigate to Deployment Insights and mark a deployment as a fix manually. We fetch previous deploys for quick access, or you can search with the deploy version.