Debian discusses vendoring—again


The following subscription-only content has been made available to you by an LWN subscriber. Thousands of subscribers depend on LWN for the best news from the Linux and free software communities. If you enjoy this article, please consider accepting the trial offer on the right. Thank you for visiting LWN.net!

By Jake Edge
January 13, 2021

The problems with "vendoring" in packages—bundling dependencies rather than getting them from other packages—seems to crop up frequently these days. We looked at Debian's concerns about packaging Kubernetes and its myriad of Go dependencies back in October. A more recent discussion in that distribution's community looks at another famously dependency-heavy ecosystem: JavaScript libraries from the npm repository. Even C-based ecosystems are not immune to the problem, as we saw with iproute2 and libbpf back in November; the discussion of vendoring seems likely to recur over the coming years.

Many application projects, particularly those written in languages like JavaScript, PHP, and Go, tend to have a rather large pile of dependencies. These projects typically simply download specific versions of the needed dependencies at build time. This works well for fast-moving projects using collections of fast-moving libraries and frameworks, but it works rather less well for traditional Linux distributions. So distribution projects have been trying to figure out how best to incorporate these types of applications.

This time around, Raphaël Hertzog raised the issue with regard to the Greenbone Security Assistant (gsa), which provides a web front-end to the OpenVAS vulnerability scanner (which is now known as Greenbone Vulnerability Management or gvm).

[...] the version currently in Debian no longer works with the latest gvm so we have to update it to the latest upstream release... but the latest upstream release has significant changes, in particular it now relies on yarn or npm from the node ecosystem to download all the node modules that it needs (and there are many of them, and there's no way that we will package them individually).

The Debian policy forbids download during the build so we can't run the upstream build system as is.

Hertzog suggested three possible solutions: collecting all of the dependencies into the Debian source package (though there would be problems creating the copyright file), moving the package to the contrib repository and adding a post-install step to download the dependencies, or removing gsa from Debian entirely. He is working on updating gsa as part of his work on Kali Linux, which is a Debian derivative that is focused on penetration testing and security auditing. Kali Linux does not have the same restrictions on downloading during builds that Debian has, so the Kali gsa package can simply use the upstream build process.

He would prefer to keep gsa in Debian, "but there's only so much busy-work that I'm willing to do to achieve this goal". He wondered if it made more sense for Debian to consider relaxing its requirements. But Jonas Smedegaard offered another possible approach: analyzing what packages are needed by gsa and then either using existing Debian packages for those dependencies or creating new ones for those that are not available. Hertzog was convinced that wouldn't be done, but Smedegaard said that the JavaScript team is already working on that process for multiple projects.

Hertzog ran the analysis script described on that page and pointed to the output from the package.json file of gsa. He said that it confirmed his belief that there are too many dependencies to package; "Even if you package everything, you will never ever have the right combination of version of the various packages."

To many, that list looks daunting at best, impossible at worst, but Smedegaard seemed unfazed, noting several reasons to believe that those dependencies can be handled. But Hertzog pointed out that the work is not of any real benefit, at least in his mind. He cannot justify spending lots of time packaging those npm modules (then maintaining them) for a single package "when said package was updated in Kali in a matter of hours". He thinks the distribution should focus its efforts elsewhere:

By trying to shoehorn node/go modules into Debian packages we are creating busy work with almost no value. We must go back to what is the value added by Debian and find ways to continue to provide this value while accepting the changed paradigm that some applications/ecosystems have embraced.

He said that Debian is failing to keep up with the paradigm change in these other ecosystems, which means that "many useful things" are not being packaged. Pirate Praveen agreed that there are useful things going unpackaged, but disagreed with Hertzog's approach of simply using the upstream download-and-build process. Praveen thinks that a mix of vendoring (bundling) for ultra-specific dependencies and creating packages for more generally useful modules is the right way forward. It comes down to distributions continuing to provide a particular service for their users:

All the current trends are making it easy for developers to ship code directly to users. Which encourages more isolation instead of collaboration between projects. It also makes it easy for shipping more proprietary code, duplication of security tracking or lack of it. Debian and other distributions have provided an important buffer between developers and users as we did not necessarily follow the priorities or choices of upstream developers exactly always.

One of the reasons Smedegaard felt that the dependencies for gsa could be handled via Debian packages is that gsa (and other large projects) tend to overspecify the versions required; in many cases, other versions (which might already be packaged for Debian) work just fine. But figuring that out is "a substantial amount of work", Josh Triplett said in a lengthy message. He cautioned against the "standard tangent" where complaints about the number of dependencies for these types of projects are aired.

[...] people will still use fine-grained packages and dependencies per the standard best-practices of those communities, no matter the number or content of mails in this thread suggesting otherwise. The extremes of "package for a one-line function" are not the primary issue here; not every fine-grained dependency is that small, and the issues raised in this mail still apply whether you have 200 dependencies or 600. So let's take it as a given that packages *will* have hundreds of library dependencies, and try to make that more feasible.

He said that disregarding a project's guidance on the versions for its dependencies is fraught, especially for dynamically typed languages where problems may only be detected at run time. For those ecosystems, the normal Debian practice of having only one version of a given library available may be getting in the way. Relaxing that requirement somewhat could be beneficial:

I'm not suggesting there should be 50 versions of a given library in the archive, but allowing 2-4 versions would greatly simplify packaging, and would allow such unification efforts to take place incrementally, via transitions *in the archive* and *in collaboration with upstream*, rather than *all at once before a new package can be uploaded*.

Triplett outlined the problems that developers encounter when trying to package a project of this sort. They can either try to make it work with the older libraries available in Debian, upgrade the libraries in Debian and fix all the resulting problems in every package that uses them, or simply bundle the required libraries. The first two are enormously difficult in most cases, so folks settle for bundling, which is undesirable but unavoidable:

Right now, Debian pushes back heavily on bundling, and *also* pushes back heavily on all of the things that would solve the problems with unbundled dependencies. That isn't sustainable. If we continue to push back on bundling, we need to improve our tools and processes and policies to make it feasible to maintain unbundled packages. Otherwise, we need to build tools and processes and policies around bundled dependencies. (Those processes could still include occasional requirements for unbundling, such as for security-sensitive libraries.)

Adrian Bunk is concerned with handling security problems in a world with multiple library versions. He said that these ecosystems seem to not be interested in supporting stable packages for three to five years, as needed by distributions such as Debian stable or Ubuntu LTS. More library proliferation (version-wise) just means more work for Debian when the inevitable CVE comes along, he said.

But Triplett said that he is not expecting there to be a lot of different library versions, but that at times it might make sense to have more than one:

I'm talking about packaging xyz 1.3.1 and 2.0.1, as separate xyz-1 and xyz-2 packages, and allowing the use of both in build dependencies. Then, a package using xyz-1 can work with upstream to migrate to xyz-2, and when we have no more packages in the archive using xyz-1 we can drop it.

That's different from requiring *exactly one* version of xyz, forcing all packages to transition immediately, and preventing people from uploading packages because they don't fork upstream and port to different versions of dependencies.

It seems safe to say that few minds were changed in the course of the discussion. Bunk and Triplett seemed to talk past each other a fair bit. And no one spoke up with some wild new solution to these problems. But the problems are not going to disappear anytime soon—or ever. Without some kind of shift, bundling will likely be the path of least resistance, at least until some hideous security problem has to be fixed in enough different packages that bundling is further restricted or prohibited. That would, of course, then require a different solution.

The approach currently being taken by Smedegaard, Praveen, and others to tease out the dependencies into their own packages has its attractions, but scalability and feasibility within a volunteer-driven organization like Debian are not among them. The size and scope of the open-source-creating community is vastly larger than Debian or any of its language-specific teams, so it should not come as a surprise that the distribution is not keeping up. Debian is hardly alone with this problem either, of course; it is a problem that the Linux distribution community will continue to grapple with.



Did you like this article? Please accept our trial subscription offer to be able to see more content like it and to participate in the discussion.

(Log in to post comments)