Dependency Management Data's Open Policy Agent support is now a whole lot more efficient
Over the last week, pretty much since working on how you can use Open Policy Agent and Dependency Management Data to query EndOfLife.date data for internal packages, I've been working on a significant rework of how dependency-management-data performs Policy evaluations.
I've just released this, as v0.102.0, and am very chuffed with the release because:
- it's now twice as fast (with room for improvement)
- we've got a much reduced memory overhead when processing Policies
- it provides a much better user experience (with progress bars)
- it pre-filters data based on what your policy is actually querying, so you don't have to do anything to take advantage of the improvements
- it allows you to even further pre-filter your data
- it adds our first non-default database indexes
- it uses the Write Ahead Log (WAL) to speed up performance for writing to the database, while using `dmd`
- it will hopefully only get faster in the future
- this release also includes builtins which make querying EndOfLife.date much easier, with so much less boilerplate
- this also fixes an issue with `report policy-violations` that hadn't been surfaced before
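To make the WAL and index points a little more concrete, here is a minimal sketch of what turning on write-ahead logging and adding a non-default index can look like. It assumes a SQLite database and uses Python's built-in `sqlite3` module; the `dependencies` table, its columns, and the index name are hypothetical and are not taken from dmd's actual schema or code.

```python
import os
import sqlite3
import tempfile


def open_database(path):
    """Open a SQLite database with WAL enabled and one extra index.

    The schema here is hypothetical and only exists for illustration.
    """
    conn = sqlite3.connect(path)
    # Switch the journal to write-ahead logging. The pragma returns the
    # journal mode that is now in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dependencies ("
        "package_name TEXT NOT NULL, "
        "version TEXT, "
        "package_manager TEXT NOT NULL)"
    )
    # A non-default index, so lookups by package name can avoid a full
    # table scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_dependencies_package_name "
        "ON dependencies (package_name)"
    )
    conn.commit()
    return conn, mode


def query_plan(conn, package_name):
    """Return SQLite's description of how it would run the lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT version FROM dependencies WHERE package_name = ?",
        (package_name,),
    ).fetchall()
    # The last column of each row is the human-readable plan detail.
    return " ".join(row[-1] for row in rows)


# WAL needs a file-backed database, so use a temporary file rather
# than an in-memory database.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")
conn, journal_mode = open_database(db_path)
conn.execute(
    "INSERT INTO dependencies VALUES (?, ?, ?)",
    ("github.com/example/pkg", "v1.2.3", "gomod"),
)
conn.commit()
plan = query_plan(conn, "github.com/example/pkg")
print(journal_mode)
print(plan)
```

Printing the query plan is a quick way to check that a lookup really does use the new index instead of scanning the whole table.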
So, how fast is it? Well, for a data set of ~1,000,000 distinct dependencies, processing policy violations used to take ~14m. After this change, it now takes ~7m.
(Fun aside: at one point I had this down to ~2m, but it turns out that was a bug, and I shouldn't have been so impressed with myself 🫣)
There are no doubt more improvements to make, and other things I can be doing, both in the DMD codebase and in taking advantage of improvements to how we interface with OPA, but it's a great saving.
And that's before you even start pre-filtering the data yourself! By doing this, I've further brought the time down for Policies that only need to target a subset of the data set.
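As a rough illustration of why pre-filtering helps, the sketch below runs the same check over every row and then over only the rows a filter lets through. It is a stand-in for the idea, not how dmd or OPA implement it: the policy is a plain Python function rather than a Rego policy, and the table, columns, and sample data are all hypothetical.

```python
import sqlite3

# Hypothetical sample data, for illustration only.
ROWS = [
    ("github.com/example/a", "v1.0.0", "gomod"),
    ("github.com/example/b", "v0.9.0", "gomod"),
    ("left-pad", "1.3.0", "npm"),
    ("lodash", "4.17.21", "npm"),
]


def violates_policy(package_name, version, package_manager):
    # Stand-in for a real policy: flag pre-1.0 Go modules.
    return package_manager == "gomod" and version.startswith("v0.")


def evaluate(conn, where="", params=()):
    """Run the policy over the selected rows.

    Returns the violating package names and how many rows the policy
    was actually evaluated against.
    """
    sql = (
        "SELECT package_name, version, package_manager "
        "FROM dependencies " + where
    )
    evaluated = 0
    violations = []
    for row in conn.execute(sql, params):
        evaluated += 1
        if violates_policy(*row):
            violations.append(row[0])
    return violations, evaluated


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dependencies "
    "(package_name TEXT, version TEXT, package_manager TEXT)"
)
conn.executemany("INSERT INTO dependencies VALUES (?, ?, ?)", ROWS)

# Without pre-filtering: the policy sees every row.
all_violations, all_evaluated = evaluate(conn)

# With pre-filtering: only rows the policy could ever match are evaluated.
filtered_violations, filtered_evaluated = evaluate(
    conn, "WHERE package_manager = ?", ("gomod",)
)

print(all_violations, all_evaluated)
print(filtered_violations, filtered_evaluated)
```

Both runs report the same violation, but the filtered run only evaluates the policy against half of the rows. On a data set with around a million dependencies, skipping rows that could never match is where the extra saving comes from.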