What can we learn about the backdooring of xz/liblzma, using OpenSSF Security Scorecards and dependency-management-data?
CVE-2024-3094
This evening, Andres Freund announced that there is backdoored code in xz and liblzma:
I accidentally found a security issue while benchmarking postgres changes.
If you run debian testing, unstable or some other more "bleeding edge" distribution, I strongly recommend upgrading ASAP.
This is absolutely a bad thing, and despite it being the long Easter weekend for a large part of the world, I'm sure there will be a lot of folks looking into it.
This has been assigned CVE-2024-3094 and is rated Critical, the highest severity level.
As well as the email thread linked above, which goes into a great deal of depth on the issue, Xe Iaso has also written up some information about affected systems.
Now, I'm not here to talk about the vulnerability itself, but what we can learn about it.
There have unfortunately been quite a few cases in recent years of backdoored code entering the supply chain, far too many to link here!
It's been suggested on GitHub that this was made possible by the lack of required code review on the libraries in question:
And so it begins. Always knew one day a nightmare supply chain attack would originate from GitHub.
"from github"? this wasn't a random drive-by contribution
From a GitHub repository where there are no branch protections, devs pushing to the default branches without reviews. Yes, not "from GitHub" in this case, but there are other OSS projects where someone can just exploit a build workflow and backdoor it.
This allowed the developer who committed the changes (whether compromised technically or physically) to act on their own and push the changes without anyone else in the loop.
So what can we learn from this, aside from not necessarily updating to the latest version of a library as soon as it lands?
Catching unreviewed changes upstream, using OpenSSF Security Scorecard
You may be asking "how many other libraries do I depend on that don't perform code review?", hoping that the answer to that question is a low number... but you already know the answer to that question, don't you?
To understand whether a given repository would also be susceptible to this, we can take advantage of the excellent OpenSSF (Security) Scorecards, which can automagically provide us with insight into the supply chain security health of our dependencies.
For instance, when we run Scorecard against the xz repo, we can see that the Code-Review check receives a score of 0 (the lowest possible) due to:
found 29 unreviewed changesets out of 30 -- score normalized to 0
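The "score normalized to 0" wording suggests a proportional score. As a minimal sketch, assuming Scorecard scales the share of reviewed changesets to a 0-10 range and truncates with integer division (an assumption about its internals, not its actual code), this reproduces the message above:

```python
# Sketch only: ASSUMES Scorecard scales the share of reviewed changesets to
# 0-10 and truncates with integer division. Not Scorecard's own code.
MAX_SCORE = 10


def proportional_score(reviewed: int, total: int) -> int:
    """Scale reviewed/total to 0..MAX_SCORE, truncating any fraction."""
    if total == 0:
        return 0
    return min(MAX_SCORE * reviewed // total, MAX_SCORE)


def code_review_reason(unreviewed: int, total: int) -> str:
    """Build a reason string in the style of the Code-Review output above."""
    score = proportional_score(total - unreviewed, total)
    return (
        f"found {unreviewed} unreviewed changesets out of {total}"
        f" -- score normalized to {score}"
    )


if __name__ == "__main__":
    # xz: 29 of 30 changesets unreviewed, i.e. only 1 reviewed -> 10 * 1 // 30 == 0
    print(code_review_reason(29, 30))
```

Under that assumption, a repository would need at least 9 of its last 30 changesets reviewed before the score reaches 3.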
Additionally, the Branch-Protection check has a score of 0:
branch protection not enabled on development/release branches
Warn: branch protection not enabled for branch 'master'
This is super useful to get an indication of the health of the repository, and tracks with the suggested reason for this CVE.
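If you want to check a handful of repositories by hand, Scorecard can also emit JSON (for instance via its `--format json` option). The sketch below assumes a result shaped like that output, with a top-level `checks` array whose entries carry `name`, `score` and `reason`; the sample is abridged to the two checks quoted above, and the thresholds mirror the ones used in the SQL queries in this post.

```python
import json

# Abridged, illustrative result. ASSUMPTION: the JSON has a "repo" object and a
# "checks" array of {"name", "score", "reason"}; only the two checks quoted
# above are included, with their scores and reasons as reported for xz.
SAMPLE = """
{
  "repo": {"name": "github.com/tukaani-project/xz"},
  "checks": [
    {"name": "Code-Review", "score": 0,
     "reason": "found 29 unreviewed changesets out of 30 -- score normalized to 0"},
    {"name": "Branch-Protection", "score": 0,
     "reason": "branch protection not enabled on development/release branches"}
  ]
}
"""

# Same cut-offs as the SQL queries in this post.
THRESHOLDS = {"Code-Review": 3, "Branch-Protection": 10}


def low_scoring_checks(result: dict, thresholds: dict) -> list:
    """Return (check name, score) pairs that fall below their threshold."""
    flagged = []
    for check in result.get("checks", []):
        limit = thresholds.get(check.get("name"))
        score = check.get("score")
        # Skip checks we have no threshold for, and inconclusive results
        # (Scorecard reports those with a score of -1).
        if limit is None or score is None or score < 0:
            continue
        if score < limit:
            flagged.append((check["name"], score))
    return flagged


if __name__ == "__main__":
    result = json.loads(SAMPLE)
    for name, score in low_scoring_checks(result, THRESHOLDS):
        print(f"{result['repo']['name']}: {name} scored {score}")
```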
But how can we make this a little easier to query, for instance across many dependencies?
Understanding just how many of your dependencies are affected
Of course, it wouldn't be a blog post from me without being able to tie this back to dependency-management-data, a project I've been working on to better understand dependency usage across organisation(s).
dependency-management-data has a first-class integration with Scorecard, allowing you to import Scorecard data (or generate it from the data the public API already knows about).
From here, we can then query the SQLite database, allowing us to craft a query such as:
-- a slightly more complex query to show the full range of the data
select
s.platform,
s.organisation,
s.repo,
s.package_name,
s.version,
s.current_version,
package_type as package_manager,
-- as SBOMs don't make this available, default to an empty array
'[]' as dep_types,
-- as SBOMs don't make this available, default to an empty string
'' as package_file_path,
printf('%.2f', scorecard_codereview) as scorecard_codereview
from
sboms s
inner join dependency_health as h on s.package_name = h.package_name
and s.package_type = h.package_manager
where
-- Scoring is leveled instead of proportional to make the check more
-- predictable. If any bot-originated changes are unreviewed, 3 points are
-- deducted. If any human changes are unreviewed, 7 points are deducted if a
-- single change is unreviewed, and another 3 are deducted if multiple changes
-- are unreviewed.
-- Via https://github.com/ossf/scorecard/blob/c1066d9ac232e835ec0c22a255cdd46ec58dd2c7/docs/checks.md#code-review
scorecard_codereview < 3
union
select
r.platform,
r.organisation,
r.repo,
r.package_name,
r.version,
r.current_version,
r.package_manager,
r.dep_types,
r.package_file_path,
printf('%.2f', scorecard_codereview) as scorecard_codereview
from
renovate r
inner join dependency_health as h on r.package_name = h.package_name
and r.package_manager = h.package_manager
where
-- Scoring is leveled instead of proportional to make the check more
-- predictable. If any bot-originated changes are unreviewed, 3 points are
-- deducted. If any human changes are unreviewed, 7 points are deducted if a
-- single change is unreviewed, and another 3 are deducted if multiple changes
-- are unreviewed.
-- Via https://github.com/ossf/scorecard/blob/c1066d9ac232e835ec0c22a255cdd46ec58dd2c7/docs/checks.md#code-review
scorecard_codereview < 3
order by
scorecard_codereview desc;
We can see from the example data that ships with dependency-management-data that there are quite a few results.
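The comment in that query quotes the leveled Code-Review scoring from Scorecard's docs. Translating those rules as written into a small function (a sketch of the documented rules, not Scorecard's implementation) shows what `scorecard_codereview < 3` actually selects:

```python
# Sketch of the leveled Code-Review rules as quoted in the query comment above;
# it follows the documented rules, not Scorecard's implementation.
MAX_SCORE = 10


def leveled_code_review_score(unreviewed_bot: int, unreviewed_human: int) -> int:
    """Apply the documented deductions, never going below zero."""
    score = MAX_SCORE
    if unreviewed_bot > 0:
        score -= 3  # any unreviewed bot-originated change
    if unreviewed_human > 0:
        score -= 7  # at least one unreviewed human change
    if unreviewed_human > 1:
        score -= 3  # more than one unreviewed human change
    return max(score, 0)


if __name__ == "__main__":
    # Print each distinct case and whether `scorecard_codereview < 3` matches it
    for bot, human in [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2)]:
        score = leveled_code_review_score(bot, human)
        print(f"bot={bot} human={human} -> {score} flagged={score < 3}")
```

Under these rules the only score below 3 is 0, so the filter picks out dependencies with more than one unreviewed human change, or a mix of unreviewed bot and human changes; a single unreviewed human change scores exactly 3 and is not matched by `< 3`.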
Alternatively, we could look at both Code-Review and Branch-Protection:
select
s.platform,
s.organisation,
s.repo,
s.package_name,
package_type as package_manager,
printf('%.2f', scorecard_codereview) as scorecard_codereview,
printf('%.2f', scorecard_branchprotection) as scorecard_branchprotection
from
sboms s
inner join dependency_health as h on s.package_name = h.package_name
and s.package_type = h.package_manager
where
scorecard_codereview < 3 or scorecard_branchprotection < 10
union
select
r.platform,
r.organisation,
r.repo,
r.package_name,
r.package_manager,
printf('%.2f', scorecard_codereview) as scorecard_codereview,
printf('%.2f', scorecard_branchprotection) as scorecard_branchprotection
from
renovate r
inner join dependency_health as h on r.package_name = h.package_name
and r.package_manager = h.package_manager
where
scorecard_codereview < 3 or scorecard_branchprotection < 10
order by
scorecard_codereview, scorecard_branchprotection desc;
This can be seen running against the example data here.
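One thing worth watching for in these `union` queries: `printf()` returns text, and in a compound `select` the `order by` refers to the result columns, so the scores are compared as strings. That's harmless when every score is below 3, but once a `10.00` can appear (as it can with the `or` filter above) it sorts before `2.50`. A sketch with made-up rows, using Python's built-in `sqlite3` module, shows the effect and one possible workaround: wrapping the union in a subquery and ordering on `cast(... as real)`. The same table sits on both sides of the `union` purely to keep the sketch short.

```python
import sqlite3

# Made-up rows; only the score column matters for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("create table health (package_name text, scorecard_codereview real)")
conn.executemany(
    "insert into health values (?, ?)",
    [("pkg-a", 10.0), ("pkg-b", 2.5), ("pkg-c", 0.0)],
)

# As in the queries above: printf() yields TEXT, and the compound select's
# order by uses that text result column, so '10.00' sorts before '2.50'.
as_text = [row[0] for row in conn.execute(
    "select printf('%.2f', scorecard_codereview) as scorecard_codereview from health"
    " union"
    " select printf('%.2f', scorecard_codereview) from health"
    " order by scorecard_codereview"
)]

# Workaround: wrap the union in a subquery and order on the numeric value.
as_number = [row[0] for row in conn.execute(
    "select scorecard_codereview from ("
    " select printf('%.2f', scorecard_codereview) as scorecard_codereview from health"
    " union"
    " select printf('%.2f', scorecard_codereview) from health"
    ") order by cast(scorecard_codereview as real)"
)]

print(as_text)    # ['0.00', '10.00', '2.50']
print(as_number)  # ['0.00', '2.50', '10.00']
```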
Or we could look at the number of dependencies (in each repo) that are affected by low scores:
select
s.platform,
s.organisation,
s.repo,
count(*)
from
sboms s
inner join dependency_health as h on s.package_name = h.package_name
and s.package_type = h.package_manager
where
scorecard_codereview < 3
or scorecard_branchprotection < 10
group by
s.platform,
s.organisation,
s.repo
union
select
r.platform,
r.organisation,
r.repo,
count(*)
from
renovate r
inner join dependency_health as h on r.package_name = h.package_name
and r.package_manager = h.package_manager
where
scorecard_codereview < 3
or scorecard_branchprotection < 10
group by
r.platform,
r.organisation,
r.repo
order by
count(*) desc
This can be seen running against the example data here.
(This will be skewed for repositories that have a high number of dependencies, such as those using npm; we could break this down further to show the percentage of dependencies affected.)
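As a sketch of that percentage breakdown, here is one way to express it for the Renovate data, using Python's `sqlite3` module against a hypothetical, trimmed-down schema containing only the columns the queries above use (the real dependency-management-data tables have more). It uses a `left join` so that dependencies without Scorecard data still count towards each repo's total:

```python
import sqlite3

# HYPOTHETICAL, trimmed-down tables: only the columns the queries in this post
# use. The real dependency-management-data schema has more.
SCHEMA = """
create table renovate (
  platform text, organisation text, repo text,
  package_name text, package_manager text
);
create table dependency_health (
  package_name text, package_manager text,
  scorecard_codereview real, scorecard_branchprotection real
);
"""

# Same low-score condition as the count query; a left join keeps dependencies
# with no Scorecard data in the total (their NULL scores never match the case).
PERCENT_AFFECTED = """
select
  r.platform,
  r.organisation,
  r.repo,
  count(*) as total_deps,
  sum(case when h.scorecard_codereview < 3
             or h.scorecard_branchprotection < 10 then 1 else 0 end) as affected_deps,
  round(100.0 * sum(case when h.scorecard_codereview < 3
                           or h.scorecard_branchprotection < 10 then 1 else 0 end)
        / count(*), 2) as percent_affected
from
  renovate r
  left join dependency_health as h on r.package_name = h.package_name
  and r.package_manager = h.package_manager
group by
  r.platform,
  r.organisation,
  r.repo
order by
  percent_affected desc
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "insert into renovate values (?, ?, ?, ?, ?)",
        [
            ("github", "example-org", "small-service", "pkg-a", "gomod"),
            ("github", "example-org", "small-service", "pkg-b", "gomod"),
            ("github", "example-org", "big-frontend", "pkg-c", "npm"),
            ("github", "example-org", "big-frontend", "pkg-d", "npm"),
            ("github", "example-org", "big-frontend", "pkg-e", "npm"),
            ("github", "example-org", "big-frontend", "pkg-f", "npm"),
        ],
    )
    conn.executemany(
        "insert into dependency_health values (?, ?, ?, ?)",
        [
            ("pkg-a", "gomod", 0.0, 0.0),    # low scores: affected
            ("pkg-b", "gomod", 10.0, 10.0),  # healthy
            ("pkg-c", "npm", 2.0, 10.0),     # low Code-Review: affected
            ("pkg-d", "npm", 8.0, 10.0),     # healthy
            # pkg-e and pkg-f have no Scorecard data at all
        ],
    )
    return conn


if __name__ == "__main__":
    for row in build_demo_db().execute(PERCENT_AFFECTED):
        print(row)
```

In this made-up data both repos have exactly one affected dependency, so a plain count would rank them equally, whereas the percentage (50% vs 25%) puts the smaller repo first.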
Hopefully these give you some good insight into the different ways you could make use of this data to better understand the health of your dependencies, and make you a little more concerned about the state of everything.