Getting started with Dependency Management Data
Note: This blog post has been replaced by the official getting started guide for dependency-management-data. I've kept a copy here for posterity, but it's worthwhile checking out the up-to-date docs now, as this will not be updated in the future.
This is a companion post to my writeup of the talk I gave at DevOpsNotts in July 2023 about the dependency-management-data (DMD) project.
This is intended as a quick setup guide, rather than an exhaustive dive into what DMD is and how it works. If you'd like that, check out the talk writeup.
Want to know in more depth what it is and how it works? Check out the more in-depth writeup.
TL;DR extraordinaire
At a minimum, you need to:
- retrieve some data, for instance via renovate-graph
  - note that you do not need to be already using Renovate to use this!
- create the SQLite database for dependency-management-data
- import the data
We can do this by running:
go install dmd.tanna.dev/cmd/dmd@latest
# produce some data that DMD can import, for instance via renovate-graph
npx @jamietanna/renovate-graph@latest --token $GITHUB_TOKEN your-org/repo another-org/repo
# or for GitLab
env RENOVATE_PLATFORM=gitlab npx @jamietanna/renovate-graph@latest --token $GITLAB_TOKEN your-org/repo another-org/nested/repo
# set up the database
dmd db init --db dmd.db
# import renovate-graph data
dmd import renovate --db dmd.db 'out/*.json'
# then you can start querying it
sqlite3 dmd.db 'select count(*) from renovate'
Retrieving the data
As noted above, we need to retrieve data to be imported into DMD. For dependencies, I'd recommend using renovate-graph, which uses Renovate as the engine for retrieving package data.
We can run the following:
# optional, allows renovate-graph to retrieve the `current_version` column, as well as populate the `renovate_updates` table
export RG_INCLUDE_UPDATES='true'
# produce some data that DMD can import, for instance via renovate-graph
npx @jamietanna/renovate-graph@latest --token $GITHUB_TOKEN jamietanna/jamietanna deepmap/oapi-codegen
# or for GitLab
env RENOVATE_PLATFORM=gitlab npx @jamietanna/renovate-graph@latest --token $GITLAB_TOKEN tanna.dev/serve jamietanna/tidied
If you are looking at AWS infrastructure, check out the README for endoflife-checker, which explains in more detail how to pull AWS data.
Creating the database and importing the data
Once renovate-graph has executed, you'll see an out directory with one file per repo.
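Before importing, it can be worth checking that the number of files matches the number of repos you asked for. The following is a minimal sketch rather than part of DMD: `count_exports` is a name made up for this example, and the demo runs against a throwaway directory standing in for `out`.

```shell
# count_exports prints how many .json files a directory contains.
# The function name is hypothetical and only used in this sketch.
count_exports() {
  dir=$1
  count=0
  for f in "$dir"/*.json; do
    # When nothing matches, the pattern is left as a literal string; skip it.
    [ -e "$f" ] || continue
    count=$((count + 1))
  done
  echo "$count"
}

# Demo against a temporary directory with two fake exports and one other file.
demo=$(mktemp -d)
touch "$demo/repo-a.json" "$demo/repo-b.json" "$demo/notes.txt"
count_exports "$demo"
```

Against real data, `count_exports out` should print the same number as the count of repos passed to renovate-graph.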
First, we'll create the database:
# or any name, really
dmd db init --db dmd.db
Then, we need to import the data. Note the quotes around the argument, which stop the shell from expanding the glob before dmd sees it:
dmd import renovate --db dmd.db 'out/*.json'
Now our database is ready to go.
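As an aside on the quoting used for the import: the sketch below shows what the quotes change, independent of dmd. It assumes nothing about how dmd handles the pattern internally; `count_args` is a hypothetical helper that only reports how many arguments it received, and the files live in a temporary directory.

```shell
# Hypothetical helper: prints how many arguments it was called with.
count_args() {
  echo "$#"
}

# Work in a throwaway directory containing two fake export files.
workdir=$(mktemp -d)
mkdir "$workdir/out"
touch "$workdir/out/repo-a.json" "$workdir/out/repo-b.json"
cd "$workdir" || exit 1

# Unquoted: the shell expands the pattern first, so the command
# receives one argument per matching file.
unquoted=$(count_args out/*.json)

# Quoted: the pattern is passed through untouched as a single argument,
# which is the form used for the import command in this guide.
quoted=$(count_args 'out/*.json')

echo "unquoted: $unquoted argument(s)"
echo "quoted: $quoted argument(s)"
```

With the two files created here, the unquoted call receives two arguments and the quoted call receives one.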
Generating missing data (optional)
This is an optional step, but for ecosystems like Java, the full dependency tree may not be immediately available.
We can run the following to try to fill in the missing dependency tree:
# note that this can take several minutes depending on how many dependencies you have!
dmd db generate missing-data --db dmd.db
Generating advisories (optional)
This is an optional step, but allows us to get some more meaningful information about our dependencies.
We can run the following to set up our advisories:
# optionally fetch community-sourced custom advisories
dmd contrib download
# then generate advisories for all our packages
# note that this can take several minutes depending on how many dependencies you have!
dmd db generate advisories --db dmd.db
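Taken together, the steps from creating the database to generating advisories can be chained in a small script. This is a sketch under a few assumptions: `DB`, `DRY_RUN`, `run` and `pipeline` are names chosen for this example (they are not dmd settings), the renovate-graph output is assumed to already be in `out`, and the script defaults to printing the commands rather than executing them. The dmd commands themselves are the ones shown in this guide.

```shell
# DB and DRY_RUN are made up for this sketch; they are not read by dmd.
DB="${DB:-dmd.db}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = run them

# run prints the command in dry-run mode, otherwise executes it as given.
# Note that the printed form does not show the original quoting.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# pipeline stops at the first failing step thanks to the && chain.
pipeline() {
  run dmd db init --db "$DB" &&
  run dmd import renovate --db "$DB" 'out/*.json' &&
  # the remaining steps are the optional ones from this guide
  run dmd db generate missing-data --db "$DB" &&
  run dmd contrib download &&
  run dmd db generate advisories --db "$DB"
}

pipeline
```

Setting `DRY_RUN=0` before running the script would execute the commands for real. The optional steps can be removed from `pipeline` if you don't need them.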
Running some queries
Now we've got the data available, we can start to query it.
It's recommended you find your SQLite browser of choice and try the following queries:
-- how many packages have been ingested via renovate-graph
select count(*) from renovate;
-- how many pending package updates have been ingested via renovate-graph
select count(*) from renovate_updates;
-- how many packages have been ingested via dependabot-graph
select count(*) from dependabot;
-- what are your 10 most popular transitive Go dependencies?
select
  package_name,
  count(*)
from
  renovate,
  json_each(dep_types) as dep_type
where
  package_manager = 'gomod'
  and dep_type.value = 'indirect'
group by
  package_name
order by
  count(*) DESC
limit 10;
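The last query relies on SQLite's json_each, which produces one row per element of a JSON array, so that each dependency type can be filtered on individually. The sketch below tries this against a throwaway in-memory table. It assumes `dep_types` holds a JSON array of strings (inferred from the query, not confirmed here), defines only the columns the query touches rather than DMD's real schema, and uses made-up rows. It needs the sqlite3 CLI and skips itself if that isn't installed.

```shell
# Sketch only: a minimal stand-in table, NOT the real DMD schema; rows are invented.
if command -v sqlite3 >/dev/null 2>&1; then
  result=$(sqlite3 :memory: <<'SQL'
create table renovate (package_name text, package_manager text, dep_types text);
insert into renovate values
  ('example.com/a', 'gomod', '["indirect"]'),
  ('example.com/a', 'gomod', '["indirect"]'),
  ('example.com/b', 'gomod', '["require"]'),
  ('example.com/c', 'npm',   '["dependencies"]'),
  ('example.com/d', 'gomod', '["indirect"]');
-- json_each expands dep_types into one row per array element,
-- exposed through the "value" column of the dep_type alias
select package_name, count(*)
from renovate, json_each(dep_types) as dep_type
where package_manager = 'gomod' and dep_type.value = 'indirect'
group by package_name
order by count(*) desc
limit 10;
SQL
)
  echo "$result"
else
  echo "sqlite3 not found, skipping the sketch" >&2
fi
```

Only the rows whose `dep_types` array contains `indirect` are counted, so `example.com/a` should come first with a count of 2, followed by `example.com/d` with 1.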
And from the dmd CLI, we can also run the following:
# if you've generated the advisories data
dmd report advisories --db dmd.db
dmd report mostPopularDockerImages --db dmd.db
dmd report mostPopularPackageManagers --db dmd.db
Example
Interested in seeing what it's like with some pre-baked data? The example project has a web app hosted on Fly.io, seeded with a lot of public repositories from GitHub and GitLab, which can give you an idea of what to expect.