Analysing GitHub Pull Request review times with SQLite and Go
In Improving Team Efficiency By Measuring and Improving Code Review Cycle Time, I mentioned that one thing we can do to understand if code review is causing delays is to measure it.
Since then, I've also worked on building this at Deliveroo with one of my colleagues, just before we started using PluralSight Flow, which didn't quite give us some of the metrics we wanted.
With a bit of free time between jobs, I thought I'd at least blogument the data fetching and parsing approach that I've found works, so if anyone else goes to do this, they've got something to start with.
Unlike previous attempts, I wanted to:
- build it as an Open Source project
- write it with Go
- use SQLite as the underlying datasource
- try building a CLI using urfave/cli
The project can be found at gitlab.com/tanna.dev/ghprstats, which has some docs on how to get started with it.
I'm not sure I'm quite happy with how I've internally implemented it, but I'm happy it's done and I can iterate on it.
How it works
At its core, we are using the following four APIs from GitHub to retrieve the data about a given Pull Request:
- List reviews for a pull request
- List review comments on a pull request
- List issue comments
- List timeline events for an issue
These are then used to look at who is interacting with the changes, as well as using the timeline events to determine the state(s) the PR is in over time.
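To illustrate the timeline-event logic, here's a minimal sketch (with hypothetical types, not the project's actual internals): if the earliest `ready_for_review` or `convert_to_draft` event on a PR is `ready_for_review`, the PR must have started life as a draft; otherwise it started ready.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// TimelineEvent is a hypothetical, trimmed-down representation of a
// GitHub timeline event - just enough to derive the PR's initial state.
type TimelineEvent struct {
	Event     string
	CreatedAt time.Time
}

// initialState works out whether a PR started as a draft or as
// ready-for-review, based on the earliest relevant timeline event.
func initialState(events []TimelineEvent) string {
	relevant := make([]TimelineEvent, 0, len(events))
	for _, e := range events {
		if e.Event == "ready_for_review" || e.Event == "convert_to_draft" {
			relevant = append(relevant, e)
		}
	}
	if len(relevant) == 0 {
		// no draft/ready transitions recorded, so the PR was opened ready
		return "ready"
	}
	sort.Slice(relevant, func(i, j int) bool {
		return relevant[i].CreatedAt.Before(relevant[j].CreatedAt)
	})
	if relevant[0].Event == "ready_for_review" {
		// it was marked ready at some point, so it must have started as a draft
		return "draft"
	}
	return "ready"
}

func main() {
	events := []TimelineEvent{
		{Event: "ready_for_review", CreatedAt: time.Date(2022, 11, 1, 10, 0, 0, 0, time.UTC)},
	}
	fmt.Println(initialState(events)) // draft
}
```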
We fetch the data up-front and then sync it to the SQLite database, after which it can be queried or more easily distributed.
What it looks like
For a few PRs' worth of data, running ghprstats report cycle-time
results in the following (converted to an HTML table for ease of viewing):
| PR | PR Title | Changes | Initial State | # Reviews | # Comments | First Commenter | Time to first comment | First Reviewer | Time to first review | Second Commenter | Time to second comment | Second Reviewer | Time to second review | Time to close | Merged? | PR link |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| snarfed/oauth-dropins#284 | Migrate IndieAuth to full authorization_code grant | +82,-36 lines changed across 3 files | ready | 2 | 13 | snarfed | 4768 | | 7150 | | | | | | true | https://github.com/snarfed/oauth-dropins/pull/284 |
| cucumber/common#2024 | Add further examples for valid Cucumber files to pretty-print | +108,-0 lines changed across 1 files | ready | 0 | 6 | | | | | | | | | | false | https://github.com/cucumber/common/pull/2024 |
| deepmap/oapi-codegen#648 | Generate anonymous objects referenced in schemas | +1303,-50 lines changed across 35 files | draft | 8 | 14 | | | | | | | | | | false | https://github.com/deepmap/oapi-codegen/pull/648 |
| endoflife-date/endoflife.date#3330 | Remove Gorilla toolkit archiving | +4,-3 lines changed across 1 files | ready | 1 | 4 | marcwrobel | 187 | | 188 | | | | | | true | https://github.com/endoflife-date/endoflife.date/pull/3330 |
With SQL access to the raw data, we can also run queries like those below to better understand it.
For instance, how can we work out the most frequent approvers?
select
login,
count(author_id) as num_reviews
from
reviews
inner join users on author_id = users.id
where state = 'APPROVED'
group by
author_id
order by
num_reviews desc
limit 10
Or how can we find out which PRs started as a draft vs ready for review?
select
pulls.owner,
pulls.repo,
pulls.id,
title,
(
case
when i.initial_state is null then 'ready'
else i.initial_state
end
) as initial_state
from
pulls
left join (
select
(
case
when event = 'ready_for_review' then 'draft'
else 'ready'
end
) as initial_state,
-- SQLite takes the other columns from the row containing min(created_at),
-- so initial_state reflects the earliest event for each PR
min(timeline.created_at) as created_at,
timeline.owner,
timeline.repo,
pull_id
from
timeline
where
event in ('ready_for_review', 'convert_to_draft')
group by
timeline.owner,
timeline.repo,
pull_id
) i on pulls.owner = i.owner
and pulls.repo = i.repo
and pulls.id = i.pull_id;