Measure the productivity impact of AI tools

Combine developer experience surveys, adoption metrics, and usage patterns to understand how AI coding tools affect your software organization’s productivity.

Everyone’s excited about the potential of generative AI in software development, and for good reason. Tools like GitHub Copilot, Cursor, Windsurf, Claude Code, and ChatGPT are changing how developers write code. Engineering leaders we work with are seeing real productivity gains, but they also ask an important question: “How do we actually measure the impact?”

It’s a natural question. As engineering leaders, we want to understand the return on investment (ROI) of any new tool or practice. We want evidence that our investments are paying off and insights into optimizing their usage. But measuring the productivity impact of AI tools isn’t straightforward:

  • Many teams lack a clear baseline to compare against.

  • Developers use a fragmented mix of tools, and it’s hard to track them all.

  • Gains in one metric can unintentionally hurt others.

  • Early adopters tend to be high performers, skewing results (self-selection bias).

  • Overreliance and inadequate reviews can reduce code understanding and increase tech debt.

  • There’s no single measure of productivity and, thus, no simple definition of ROI.

That said, there’s still a lot you can do to better understand the impact of AI coding tools. In this guide, we walk through what to measure, the limitations of each metric, and the actions you can take based on the results.

Prerequisites & Setup

Connect the AI assistants you want to track:

  • GitHub Copilot: Navigate to Metrics / GitHub Copilot in Swarmia and grant read access to GitHub Copilot Business. You’ll get access to data from this point onwards (no backfill of historical data).

  • Cursor: Coming soon

  • Windsurf: Coming soon

Using Swarmia

Track the adoption and usage of AI tools

Track GitHub Copilot adoption and usage across all your teams over time at Metrics / GitHub Copilot.

See how many developers have GitHub Copilot enabled versus how many are using it in any time period. This makes it easy to spot adoption trends, identify teams leading the charge, and find unused licenses that might need attention.

Read more about the metrics here
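
If you want to sanity-check these numbers outside Swarmia, GitHub also exposes daily Copilot metrics through its REST API. Here’s a minimal sketch that prints active versus engaged users per day; the organization name and token are placeholders, and the field names follow GitHub’s Copilot metrics API at the time of writing, so adjust them if your API version differs.

```python
# Minimal sketch (not Swarmia's implementation): pull the daily Copilot metrics
# for one GitHub organization to cross-check adoption trends.
import os

import requests

ORG = "your-org"  # placeholder: your GitHub organization slug

resp = requests.get(
    f"https://api.github.com/orgs/{ORG}/copilot/metrics",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    timeout=30,
)
resp.raise_for_status()

# The API returns one object per day; print active vs. engaged users to spot trends.
for day in resp.json():
    print(day["date"], day["total_active_users"], day["total_engaged_users"])
```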

We’re adding support for other popular AI coding tools like Cursor and Windsurf soon to give you a more complete view of AI tool usage across your organization.

Understand how AI impacts your developer productivity

There’s no single measure of developer productivity, so don’t expect to find one magic number that tells you the impact of AI tools either. Sure, that’s something your CEO or board might ask you for, but the real world is more complex than that. Be wary of claims like “this tool makes your developers 55% more productive” or “GenAI gives you annualized savings of $475,728”, since they’re usually based on narrow definitions, wild assumptions, or flawed statistics.

Instead, here are some dimensions to explore for a comprehensive view of AI coding tools' impact on your engineering organization. With the added nuance, you should be much better equipped to answer the “now what” questions and take action.

  • Speed: How do pull request cycle time and throughput change in a team as its AI adoption increases? What kind of differences can you see between teams with different levels of adoption? Also pay attention to the share of review time versus time in progress. (The sketch after this list shows one way to pull these numbers from raw pull request data.)

  • Batch size: Monitor pull request batch sizes to keep review quality high. AI assistants make it easy to generate big pull requests with lots of changes. In the worst case, some of the code that ends up in your product hasn’t been read by any human, neither the author nor the reviewer.

  • Collaboration: Use the work log to track activity patterns and the distribution of work. Is collaboration increasing, or are more people working alone? Watch out for knowledge silos that form when individual developers generate and ship code without proper team understanding.

  • Quality: Quality often suffers when teams focus solely on speed. Track the proportion of maintenance work in your investment balance and the change failure rate and mean time to recovery of your deployments to spot hidden technical debt.
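
To make the speed and batch size comparisons concrete, here’s a minimal sketch (not a Swarmia feature) that computes the median pull request cycle time and batch size for a single repository via the GitHub REST API. The owner, repository, and token are placeholders; run it over a window before and after an AI rollout and compare the two periods.

```python
# Minimal sketch: median PR cycle time and batch size for one repository,
# based on recently merged pull requests from the GitHub REST API.
import os
import statistics
from datetime import datetime

import requests

OWNER, REPO = "your-org", "your-repo"  # placeholders
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def iso(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

# Fetch the 100 most recently updated closed PRs and keep the merged ones.
resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
    headers=HEADERS,
    params={"state": "closed", "sort": "updated", "direction": "desc", "per_page": 100},
    timeout=30,
)
resp.raise_for_status()
merged = [pr for pr in resp.json() if pr.get("merged_at")]

cycle_hours, batch_sizes = [], []
for pr in merged:
    cycle_hours.append((iso(pr["merged_at"]) - iso(pr["created_at"])).total_seconds() / 3600)
    # additions/deletions only appear in the single-PR payload, so fetch each PR once
    detail = requests.get(pr["url"], headers=HEADERS, timeout=30).json()
    batch_sizes.append(detail["additions"] + detail["deletions"])

if merged:
    print(f"median cycle time: {statistics.median(cycle_hours):.1f} h")
    print(f"median batch size: {statistics.median(batch_sizes)} lines changed")
```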

Capture developer sentiment and experience with surveys and retrospectives

AI tools are not just for code generation, and not all developer time goes into writing code. Even if you could write new code faster, your bottleneck might be in product discovery, the review process, or the deployment pipeline. Hence, you might want to keep an eye on a wider array of tools for different uses, such as taking meeting notes, improving documentation, retrieving knowledge, analyzing technical debt, suggesting missing test cases, and identifying risky changes.

To get the richest view of how developers use AI tools, simply ask them directly. This has several benefits:

  • You can gauge the usage of any tool without needing to collect telemetry data from each one.

  • You can assess the perceived effects on speed and quality, which can be quite elusive when using only system metrics.

  • Engineers can add their comments, and you can analyze common patterns in them.

You can create a survey in just a couple of minutes. Swarmia offers a selection of built-in AI tool questions, with links to the related system metrics so you can analyze the two hand in hand.

Taking action

Spreading best practices

Allow teams to find their own path to effective AI tool usage. Some developers and tasks will benefit more than others, and that’s okay.

  • Make progress visible, while emphasizing learning rather than comparison. Ensure your teams have access to the AI assistant activity metrics and survey results. Without transparency, people who don’t use these tools might assume others aren’t using them either.

  • Spot early adopters. Maybe they could organize an internal knowledge-sharing session to demonstrate effective usage patterns?

  • Collect successes, failures, tips, and observations in a shared document, wiki, or Slack channel.

  • Set aside dedicated time to experiment with AI tools.

  • Regularly reassess your approach as AI capabilities evolve — what wasn’t effective six months ago might be transformative today.

Finding adoption bottlenecks

Invest in overcoming setup hurdles and make it easy to get started.

  • Identify teams with low AI assistant usage or unused licenses and find out what’s blocking them from adopting the tools. Is it perhaps a lack of knowledge, access, or time? (The sketch after this list shows one way to surface idle licenses.)

  • Revise your codebase documentation to make it easier for AI agents to understand.

  • Configure good defaults for AI tools in your development environments.
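
As a starting point for the license audit in the first bullet, here’s a minimal sketch, assuming GitHub Copilot Business and an admin token, that lists seats with no recorded activity in the last 60 days using GitHub’s Copilot seats API. The organization name and idle threshold are placeholders; treat the output as a prompt for a conversation rather than an automatic license removal.

```python
# Minimal sketch: find Copilot seats with no recorded activity in the last 60 days.
import os
from datetime import datetime, timedelta, timezone

import requests

ORG = "your-org"  # placeholder: your GitHub organization slug
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}
CUTOFF = datetime.now(timezone.utc) - timedelta(days=60)  # assumed idle threshold

idle, page = [], 1
while True:
    resp = requests.get(
        f"https://api.github.com/orgs/{ORG}/copilot/billing/seats",
        headers=HEADERS,
        params={"per_page": 100, "page": page},
        timeout=30,
    )
    resp.raise_for_status()
    seats = resp.json().get("seats", [])
    if not seats:
        break
    for seat in seats:
        last = seat.get("last_activity_at")
        if last is None or datetime.fromisoformat(last.replace("Z", "+00:00")) < CUTOFF:
            idle.append(seat["assignee"]["login"])
    page += 1

print(f"{len(idle)} seats idle for 60+ days: {', '.join(sorted(idle))}")
```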

Agreeing on ways of working

When adopting any new tool, there will be unintended side effects. Be prepared to spot and address them early, before they become real problems.

  • Establish clear guidelines for code review of AI-generated content.

  • Create policies around AI tool usage with sensitive code.

  • If you notice knowledge silos, set up a working agreement to avoid working alone on issues.

  • Discuss the topic in retrospectives. See our guide on running survey retrospectives.

  • Focus on team-level improvements rather than individual performance.
