Analyzing pull request batch size

Batch size means how much work is bundled into a single change. Swarmia offers tools for evaluating pull request batch size by looking at the total number of changed lines.

Pull request complexity can be measured in many ways. Looking at the number of changed lines, i.e. the absolute size of the diff, is the most common and robust way to do this.

While there are circumstances where very large pull requests are valid, we recommend breaking work down into smaller increments.

The main benefit of doing so is an improved flow of work. This tends to happen because the team is changing things in smaller increments, which are quicker to author and easier to review. Reviews can be more thorough when there are fewer changes to analyze in one go, which adds up to an increase in the quality of the code base. Furthermore, smaller changes get reviewed more quickly than, say, a 500+ line monster, which can be demoralizing to even start looking at.

Excluding auto-generated files

It's important to focus on only the actual changes when analyzing pull requests. However, repositories often contain generated files, which could be related to e.g. software dependencies or API documentation.

In order to provide an accurate view of the real changes, Swarmia automatically excludes some commonly used generated files from the total change count.

We exclude any changes when the file name ends with any of the following strings:

  • package-lock.json
  • yarn.lock
  • Gemfile.lock
  • .snap

We also exclude any changes within folders named "generated" (i.e. any path that contains "/generated/").
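The exclusion rule above can be sketched as a simple path check. This is an illustrative example, not Swarmia's actual implementation; the function name and structure are assumptions, but the suffixes and the "/generated/" substring match the rules listed above.

```python
# Suffixes excluded from the change count, per the list above.
EXCLUDED_SUFFIXES = ("package-lock.json", "yarn.lock", "Gemfile.lock", ".snap")

def is_excluded(path: str) -> bool:
    """Return True if a changed file should be left out of the batch size count.

    A file is excluded when its name ends with one of the known generated-file
    suffixes, or when its path contains a "/generated/" folder.
    """
    if path.endswith(EXCLUDED_SUFFIXES):
        return True
    return "/generated/" in path
```

For example, `is_excluded("src/generated/client.ts")` and `is_excluded("package-lock.json")` would both be true, while an ordinary source file like `src/app.ts` would count toward the batch size.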

Reducing the amount of large pull requests

There's no absolute threshold that makes a pull request large; what matters most is the typical batch size. We recommend paying close attention to the number of PRs with more than 200 lines of changes.

[Image: pull request size distribution]

The pull request Batch Size Insights provide visibility into the typical batch size and the batch size distribution. Teams can see how many large PRs are going through the system and analyze them more closely.

By interacting with the distribution, you can select one or more sections of it. Doing so filters both the scatter plot and the table containing detailed pull request information down to the selected bucket(s).

[Image: pull request size drill-in]

We recommend looking for patterns or similarities to help understand what's driving a trend toward large batch sizes or significantly longer cycle times. You may find that certain themes of work, some repositories, or specific features often result in bigger batches, or in changes that take significantly more time and effort.

See also: Diagnosing low pull request throughput