If you have code on GitHub, chances are that you’ve had a security vulnerability alert at some point. Since the feature launched, GitHub has sent more than 62 million security alerts for vulnerable dependencies.
Vulnerability alerts rely on two pieces of data: an inventory of all the software that your code depends on, and a curated list of known vulnerabilities in open-source code.
Any time you push a change to a dependency manifest file, GitHub has a job that parses those manifest files, and stores your dependency on those packages in the dependency graph. If you’re dependent on something that hasn’t been seen before, a background task runs to get more information about the package from the package registries themselves and adds it. We use the information from the package registries to establish the canonical repository that the package came from, and to help populate metadata like readmes, known versions, and the published licenses.
On GitHub Enterprise Server, this process works identically, except we don’t get any information from the public package registries in order to protect the privacy of the server and its code.
Beyond the dependency graph, we aggregate data from a number of sources and curate those to bring you actionable security alerts. GitHub brings in security vulnerability data from a number of sources, including the National Vulnerability Database (a service of the United States National Institute of Standards and Technology), maintainer security advisories from open-source maintainers, community datasources, and our partner WhiteSource.
Once we learn about a vulnerability, it passes through an advanced machine learning model that’s trained to recognize vulnerabilities which impact developers. This model rejects anything that isn’t related to an open-source toolchain. If the model accepts the vulnerability, a bot creates a pull request in a GitHub private repository for our team of curation experts to manually review.
GitHub curates vulnerabilities because CVEs (Common Vulnerability Entries) are often ambiguous about which open-source projects are impacted. This can be particularly challenging when multiple libraries with similar names exist, or when they’re a part of a larger toolkit. Depending on the kind of vulnerability, our curation team may follow-up with outside security researchers or maintainers about the impact assessment. This follow-up helps to confirm that an alert is warranted and to identify the exact packages that are impacted.
GitHub Enterprise Server customers get a slightly different experience. If an admin has enabled security vulnerability alerts through GitHub Connect, the server will download the latest curated list of vulnerabilities from GitHub.com over the private GitHub Connect channel on its next scheduled sync (about once per hour). If a new vulnerability exists, the server determines the impacted users and repositories before generating alerts directly.
Security vulnerabilities are a matter of public good. High-profile breaches impact the trustworthiness of the entire tech industry, so we publish a curated set of vulnerabilities on our GraphQL APIs for community projects and enterprise tools to use in custom workflows as necessary. Users can also browse the known vulnerabilities from public sources on the GitHub Advisory Database.
Despite advanced technology, security alerting is a human process driven by dedicated GitHubbers. Meet Rob (@rschultheis), one of the core members of our security team, and learn about his experiences at GitHub through a friendly Q&A:
Humphrey Dogart (German Shepherd) and Rob Schultheis (Software Engineer on the GitHub Security team)
How long have you been with GitHub?
How did you get into software security?
I’ve worked with open source software for most of my 20 year career in tech, and honestly for much of that time I didn’t pay much attention to security. When I started at GitHub I was given the opportunity to work on the first iteration of security alerts. It quickly became clear that having a high quality, open dataset was going to be a critical factor in the success of the feature. I dove into the task of curating that advisory dataset and found a whole side to the industry that was open for exploration, and I’ve stayed with it ever since!
What are the trickiest parts of vulnerability curation?
The hardest problem is probably confirming that our advisory data correctly identifies which version(s) of a package are vulnerable to a given advisory, and which version(s) first address it.
What was the most difficult security vulnerability you’ve had to publish?
One memorable vulnerability was CVE-2015-9284. This one was tough in several ways because it was a part of a popular library, it was also unpatched when it became fully public, and finally, it was published four years after the initial disclosure to maintainers. Even worse, all attempts to fix it had stalled.
We ended up proceeding to publish it and the community quickly responded and finally got the security issue patched.
What’s your favorite feel-good moment working in security?
Seeing tweets and other feedback thanking us is always wonderful. We do read them! And that goes the same for those critical of the feature or the way certain advisories were disclosed or published. Please keep them coming—they’re really valuable to us as we keep evolving our security offerings.
Since you work at home, can you introduce us to your furry officemate?
I live with a seven month old shepherd named Humphrey Dogart. His primary responsibilities are making sure I don’t spend all day on the computer, and he does a great job of that. I think we make a great team!