How Neo4j’s graph database can remediate vulnerabilities

Debricked

Debricked, now part of OpenText, makes it easy to use great open source with minimal risk.

Published Apr 27, 2022

Software developers are using, publishing, and contributing to open source projects more than ever. For instance, in the past two years, the number of downloads of PyPI packages has increased by a staggering 360%. Between 2019 and 2020, the number of new package versions released increased by over 60%, to more than 700 000 new versions! This demonstrates the power that the open source community holds over modern software development.

With this enormous increase, it is more vital than ever to manage vulnerabilities so that we can handle, or even prevent, incidents such as Log4Shell. There are lots of ways to do this in an automated manner, but as we all know, just because it’s automated doesn’t mean that it’s quick.

Debricked, the company I work for, provides a tool that lets you find, prevent, and fix open source vulnerabilities through automation. As a startup (a former one, since we were recently acquired by Micro Focus), we always have things we'd like to make smoother than they currently are. Automated pull requests to fix vulnerabilities are one of those things.

Automated PRs are, according to a friend of mine, “a feature that everyone wants, but no one really uses”. One reason for this could be that they are not as fast as one would like and that they usually break your dependency tree. In search of a solution to our, honestly, snail-paced PRs, I came across Neo4j and its graph database and got a crazy idea. What if we could use Neo4j’s graph database capabilities to generate our pull requests faster than you can say “automated pull request”?

Viewing Open Source as a Graph

Debricked is based in Malmö, located just across the street from one of the most capable graph-database companies in the world, Neo4j. Our neighbors provide an open source solution for storing and querying graph-structured data, with powerful algorithms that are infeasible in conventional relational databases. It turns out that Neo4j, together with its query language Cypher, is very good at finding safe versions of root dependencies that introduce transitive vulnerabilities.

Let's look at an example where Neo4j's graph-structured data is useful. When you install an open source library from a package manager such as PyPI, Maven, or npm, the package often has transitive dependencies: other open source packages that it uses internally. This creates a tree structure (a graph) of open source that runs the risk of bringing unwanted vulnerabilities into your software.

For instance, Nightwatch version 0.9.21 is vulnerable to CVE-2021-28918 through its use of netmask version 1.0.6 as seen in the image below.
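To make the idea of a transitive vulnerability concrete, here is a minimal Python sketch that walks a dependency graph and reports which vulnerable packages a root dependency pulls in. The package names and versions are the ones mentioned in this article, but the edges are a simplification: the real tree may contain intermediate packages, and `DIRECT_DEPS`, `VULNERABLE`, and `transitive_dependencies` are hypothetical names chosen for illustration, not part of any Debricked or Neo4j API.

```python
from collections import deque

# Simplified, assumed dependency edges: (package, version) -> direct dependencies.
# The real Nightwatch tree may have more levels between these packages.
DIRECT_DEPS = {
    ("nightwatch", "0.9.21"): [("pac-resolver", "2.0.0")],
    ("pac-resolver", "2.0.0"): [("netmask", "1.0.6")],
    ("netmask", "1.0.6"): [],
}

# netmask 1.0.6 is affected by CVE-2021-28918, as stated in the article.
VULNERABLE = {("netmask", "1.0.6")}


def transitive_dependencies(root, direct_deps):
    """Return every package reachable from root, using a breadth-first walk."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in direct_deps.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


deps = transitive_dependencies(("nightwatch", "0.9.21"), DIRECT_DEPS)
print(sorted(deps & VULNERABLE))  # [('netmask', '1.0.6')]
```

The root package never lists netmask itself, yet the walk still finds it. That is exactly the kind of reachability question a graph database is built to answer.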

To resolve this, we can't simply update netmask to the safe version 2.0.1. Doing so would introduce breaking changes further up the dependency tree, because pac-resolver version 2.0.0 specifically states that it is not compatible with the safe version of netmask. This problem propagates up the dependency tree and forces us to ask the question:

Which version of the root dependency must I use to remediate a transitive vulnerability?

For our example, we must find which version of Nightwatch to use so that netmask becomes safe. Unfortunately, package managers don't know the deep dependency graph, as this can vary depending on the package manager, configuration, and what other root dependencies you have installed. This is where the Neo4j graph database integration comes into play. In our example, our graph algorithms find that 1.6.2 is the minimum safe version of Nightwatch to use, in under 50 milliseconds! We manage to get this speed and accuracy by replicating the behavior of each package manager in a graph query or procedure in Neo4j.
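The root-fix question can be sketched in a few lines of plain Python. This is not our Cypher query or the actual algorithm, just a stand-in that shows the shape of the problem: given which netmask version each Nightwatch release ends up installing, pick the lowest release whose netmask is safe. Only the 0.9.21 to netmask 1.0.6 pairing and the 1.6.2 answer come from this article. The other rows, the rule that any netmask at or above 2.0.1 counts as safe, and the names `RESOLVED_NETMASK` and `minimum_safe_root` are assumptions made for illustration.

```python
def parse_version(text):
    """Turn '1.6.2' into (1, 6, 2) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


# Root (Nightwatch) version -> netmask version that ends up installed.
RESOLVED_NETMASK = {
    "0.9.21": "1.0.6",  # from the article
    "1.0.19": "1.0.6",  # placeholder row, invented for illustration
    "1.6.2": "2.0.1",   # assumed: the article only says this root version is safe
    "1.7.13": "2.0.1",  # placeholder row, invented for illustration
}

# Assumption: every netmask version at or above 2.0.1 is treated as safe.
SAFE_FROM = parse_version("2.0.1")


def minimum_safe_root(resolved, safe_from):
    """Return the lowest root version whose transitive dependency is safe, or None."""
    safe_roots = [
        root
        for root, transitive in resolved.items()
        if parse_version(transitive) >= safe_from
    ]
    return min(safe_roots, key=parse_version, default=None)


print(minimum_safe_root(RESOLVED_NETMASK, SAFE_FROM))  # 1.6.2
```

The hard part in practice is producing the resolved table in the first place, since it depends on how each package manager resolves version ranges. That is the behavior we replicate inside the graph.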

Blazing Fast Root-fixes and Pull Requests

Together with our mirror of most major package managers and GitHub, we built a PubSub and Change Data Capture (CDC) system with Celery that continuously creates and updates the full deep dependency graph of open source communities (package managers). You can read more about our solution in a master's thesis written by two of our developers, Carl Ternby and Viktor Petterson.
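The core idea of change data capture is that the graph is patched event by event instead of being rebuilt from scratch. The sketch below is a plain-Python stand-in for that idea and does not show our Celery setup or our real event format. The event fields, the `apply_change` function, and the `lib-a` and `lib-b` packages are all hypothetical.

```python
def apply_change(graph, event):
    """Apply one change event to the dependency graph in place.

    graph maps (package, version) -> set of (package, version) it depends on.
    """
    node = (event["package"], event["version"])
    if event["type"] == "published":
        # A new or updated release: record its direct dependencies.
        graph[node] = set(event["dependencies"])
    elif event["type"] == "removed":
        # A withdrawn release: drop the node and every edge pointing at it.
        graph.pop(node, None)
        for dependencies in graph.values():
            dependencies.discard(node)
    else:
        raise ValueError(f"unknown event type: {event['type']}")


graph = {}
events = [
    {"type": "published", "package": "lib-b", "version": "1.0.0", "dependencies": []},
    {"type": "published", "package": "lib-a", "version": "2.0.0",
     "dependencies": [("lib-b", "1.0.0")]},
    {"type": "removed", "package": "lib-b", "version": "1.0.0"},
]
for event in events:
    apply_change(graph, event)

print(graph)  # {('lib-a', '2.0.0'): set()}
```

Because each event touches only the nodes and edges it mentions, the graph can stay current as new releases are published.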

By using Neo4j, we can now deliver automated pull requests at the speed of light (ok, maybe not quite, but they often take less than a second) for customers using npm. Soon, we will have the same capabilities for Go modules and Maven, and later this year for all other languages and package managers Debricked supports. Try it out for yourself; it’s free!

So, what I’m trying to say is:

The power of open source is perhaps not underestimated, but sometimes overlooked and forgotten. This statement is true both in terms of the damage it can cause (I’m looking at you, Log4j), but more so when it comes to the incredible solutions it creates, and the innovation it stimulates.

Resolving transitive vulnerabilities is tricky, but our solution, powered by Neo4j, demonstrates that we are, despite how silly it may sound, stronger together. So what’s next? Maybe I should look into automatically verifying that we do not introduce any breaking changes in the open source functionality you use? Please comment with your thoughts!

Lars Larsson, PhD

Securing society with software // Field CTO @ Elastisys

Detecting that a dependency upgrade does not introduce errors is the point of automated tests, am I right? But what if test coverage is poor, and no useful test exists that exercises the dependency that you just upgraded? Perhaps you could also check that, and make that part of the pull request. Have the dependency upgrader bot say something like "I ran your test suite, and you should be all good" or "...and it seems like you need more tests to verify that this upgrade does not break anything".