Any time the topic of security and software packages comes up (in my case usually Python packages), someone seems to offer the “helpful” suggestion of “Just Use X!”, where X is typically GPG but can be any of a wide range of signing technologies. Quite often the people suggesting it have latched onto signing packages as some sort of voodoo you can throw at the problem to magically get “security”.
The Boring Easy Part
Generating package signatures is actually pretty simple to implement, assuming you're using good libraries or tools to do it. There are a number of technologies, from OpenPGP to NaCl to commercially available certificates from companies like Verisign. Most people look at the ease of generating signatures, believe that's all it takes, and declare it done. You can see an example of this with PyPI/distutils. There, GPG signatures can be generated and even uploaded to PyPI, yet nothing uses them, and even if something did, they couldn't be trusted.
The Hard Part that (Almost) No One Thinks About
When you verify a signed file, you check the signature against a public key. If the signature was made by the private key matching that public key, then everything is kosher. The question then becomes which public key, and therein lies the rub. If you do not have a well defined model of trust then all you've done is throw cryptography at a problem in order to give the people involved the ability to say that their system has signature verification.
Let's Just Ask The Delivery Guy
One naive solution would be to simply upload the public key to the repository, as is done with PyPI. Then clients can simply download the public key and verify the signature against it! However, in this model, if someone is able to send you a malicious package they are also likely able to send you a malicious key. Once they've sent you the malicious package and the malicious key, your client will happily verify the package, claim all is well, and install it. If you trust the repository to deliver both the expected signing key and the package to be verified, you've gained nothing and introduced complexity.
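To make that failure concrete, here is a minimal sketch. It uses toy textbook RSA built from the standard library purely as a stand-in for a real signature scheme (it is not secure, and it is not how PyPI or GPG work), and `fetch_from_repository` along with the key and package values are names I made up for illustration.

```python
import hashlib

# TOY textbook RSA, for illustration only. It is NOT secure and merely
# stands in for a real signature scheme such as OpenPGP.
E = 65537

def make_keypair(p, q):
    """Return (public modulus, private exponent). Needs Python 3.8+ for pow(E, -1, m)."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))

def digest(data, n):
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data, n, d):
    return pow(digest(data, n), d, n)

def verify(data, signature, n):
    return pow(signature, E, n) == digest(data, n)

# Hypothetical keys: one for the real author, one for the attacker.
author_pub, author_priv = make_keypair(2**61 - 1, 2**89 - 1)
attacker_pub, attacker_priv = make_keypair(2**107 - 1, 2**127 - 1)

def fetch_from_repository(compromised):
    """Hypothetical repository response: (package, signature, public key)."""
    if compromised:
        package = b"malicious code"
        return package, sign(package, attacker_pub, attacker_priv), attacker_pub
    package = b"honest code"
    return package, sign(package, author_pub, author_priv), author_pub

package, signature, key = fetch_from_repository(compromised=True)

# Verifying against the key the repository handed us succeeds.
key_from_repo_ok = verify(package, signature, key)
# Verifying against a key we obtained some other way does not.
pinned_key_ok = verify(package, signature, author_pub)
print(key_from_repo_ok, pinned_key_ok)
```

The malicious package verifies perfectly well against the key that arrived alongside it. The check only fails when the key comes from somewhere the attacker doesn't control.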
Linux Has Package Signing, Let's Steal Theirs
Another approach is to look at prior art and steal what they've done. The most often cited example is the various Linux distributions. On the surface it looks like exactly what people want. When you install your distribution it comes with a public key baked into the image, and any time you install packages it verifies them against that public key. You're no longer implicitly trusting the repository to tell you what key you should trust because you already know what key.
However, this has a number of issues too. The first issue that a repository like PyPI would have with this system is simply one of scale. Debian or Red Hat have a small pool of developers who are able to make new packages, making it easy to properly verify each person and sign their keys or otherwise give them access. PyPI allows anyone to sign up and make a release which makes verifying authors an unmanageable problem.
The second issue is that in addition to trusting the package authors, you are trusting the entire build chain involved in producing the package. You don't have to trust the repository which hands you the package, but trusting one machine further up the chain isn't all that different. If that machine were compromised, an attacker could generate malicious packages that tools would blindly install. This system does have one major advantage in that you have mirror validation built in. Since the validation is based on the package, not on the mirror you downloaded from, as long as the signature is valid you know it came from the trusted build machine.
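Both halves of that trade-off fit in a small sketch. Again this uses toy textbook RSA from the standard library as a stand-in for a real signature scheme (not secure, and not what any distribution actually uses), and every name and value below is hypothetical.

```python
import hashlib

E = 65537  # toy textbook RSA, illustration only, NOT secure

def digest(data, n):
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

# Hypothetical distribution key: the public half ships in the install image,
# the private half lives on the build machine. Needs Python 3.8+.
P, Q = 2**61 - 1, 2**89 - 1
PINNED_PUBLIC_KEY = P * Q
BUILD_MACHINE_PRIVATE_KEY = pow(E, -1, (P - 1) * (Q - 1))

def build_and_sign(package):
    """What the build machine does: sign whatever it is asked to build."""
    return pow(digest(package, PINNED_PUBLIC_KEY),
               BUILD_MACHINE_PRIVATE_KEY, PINNED_PUBLIC_KEY)

def install_allowed(package, signature):
    """What the client does: check against the key baked into the image."""
    return pow(signature, E, PINNED_PUBLIC_KEY) == digest(package, PINNED_PUBLIC_KEY)

package = b"foo 1.0"
signature = build_and_sign(package)

# The client never asks which mirror the bytes came from.
mirrors = {
    "honest-mirror": (package, signature),
    "tampering-mirror": (b"foo 1.0 plus backdoor", signature),
}
results = {name: install_allowed(pkg, sig) for name, (pkg, sig) in mirrors.items()}

# But whoever controls the build machine can get anything installed.
backdoored = b"backdoored at build time"
compromised_build_ok = install_allowed(backdoored, build_and_sign(backdoored))
print(results, compromised_build_ok)
```

The tampering mirror is rejected without the client knowing anything about mirrors. The package backdoored on the build machine sails through, because the signature only says "the build machine produced this".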
Everyone is Connected, The Web of Trust
At this point in the argument, someone will typically bring up the web of trust available in OpenPGP and say that it doesn't require trusting a single machine, nor does it require a small team to verify each author. This gets much closer, and technically it is almost a solution, but it lacks a very critical piece.
With the OpenPGP Web of Trust you're signing the identity of the author. What you're not doing, and what there is no method of doing to my knowledge, is gaining any assurance that the person whose identity you've verified has any right to sign the package you're verifying. This means that if you trust Bob because you want to use his “foo” package, you also trust him to sign for the “bar” package, even though that belongs to Alice. This brings us back to the original problem of trying to determine if the key we have is trusted for this package.
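The gap is easy to see if you write the two checks side by side. This is a sketch of the logic only, with no real cryptography, and the key names and the `PACKAGE_OWNERS` mapping are hypothetical.

```python
# Hypothetical data: keys whose owners' identities we have verified through
# the web of trust, and a mapping of who may release which package.
TRUSTED_IDENTITIES = {"bob-key", "alice-key"}
PACKAGE_OWNERS = {"foo": {"bob-key"}, "bar": {"alice-key"}}

def identity_check(package_name, signing_key):
    """All the web of trust gives us: is this key's owner who they claim to be?

    Note that package_name is never consulted. That is the problem.
    """
    return signing_key in TRUSTED_IDENTITIES

def authorization_check(package_name, signing_key):
    """What we actually want: is this key allowed to sign this package?"""
    return (signing_key in TRUSTED_IDENTITIES
            and signing_key in PACKAGE_OWNERS.get(package_name, set()))

# Bob signs a release of Alice's "bar" package.
print(identity_check("bar", "bob-key"))       # True
print(authorization_check("bar", "bob-key"))  # False
```

The second check needs `PACKAGE_OWNERS`, and the web of trust has nothing to say about where that mapping comes from or why you should believe it.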
The other major problem with the Web of Trust is its user experience. If you start requiring every person who wants to release a package to participate in the Web of Trust, you've drastically reduced the number of people who are willing to publish packages, whether because of confusion, laziness, or lack of ability. Furthermore, it encourages people to sign keys in order to make their packages work, not because they actually have any reason to trust the key's owner.
It Says Secure Right in the Name
Another suggestion I've seen is adopting a model similar to that of SSH, that is the first time you install a package you're prompted to accept the key and from then on out it will remember that and use that as the trusted key. As you might have guessed by now, this solution also has problems besides the obvious issue that people are vulnerable during the first install.
Packages on PyPI change hands, get deleted, or even have multiple authorized releasers. This means that a package might have several different signing keys that people should trust, making the tooling even more complicated. So how does a person know, when they attempt to install something and get an invalid signature warning, whether this is one of the times it's “OK” to wipe the stored key or one of the times it's not? If they go to PyPI to find out then we are back again at trusting the repository implicitly. If they don't go to PyPI they are most likely to just hit whatever lets them install, training them to do whatever it takes to install and ignore the warnings and prompts.
Finally, this approach assumes that any particular developer will have a stable machine on which the trust database can be stored. This is often not true, especially in this day and age of ephemeral cloud servers where new machines are started with a blank slate all throughout the day.
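Here is a rough sketch of that trust-on-first-use flow and the two ways it falls over. The trust store is just a dict, keys are plain strings rather than real key material, and `check_tofu` and `KeyChanged` are names I invented for illustration.

```python
class KeyChanged(Exception):
    """Raised when a package shows up with a key other than the pinned one."""

def check_tofu(trust_store, package_name, presented_key):
    """Pin the key on first sight, then insist on the same key afterwards."""
    pinned = trust_store.get(package_name)
    if pinned is None:
        # First install: nothing to compare against, so we just believe it.
        trust_store[package_name] = presented_key
        return "pinned on first use"
    if pinned != presented_key:
        raise KeyChanged(f"{package_name}: expected {pinned}, got {presented_key}")
    return "matches pinned key"

# A long-lived developer machine.
store = {}
outcomes = [
    check_tofu(store, "foo", "author-key-1"),
    check_tofu(store, "foo", "author-key-1"),
]

# The package legitimately changes hands. The tool cannot tell this apart
# from an attack, so the user gets a warning either way.
try:
    check_tofu(store, "foo", "new-maintainer-key")
except KeyChanged as exc:
    outcomes.append(f"warning: {exc}")

# A freshly started cloud server has an empty store, so every install is a
# "first install" and the attacker's key is pinned without complaint.
fresh_store = {}
outcomes.append(check_tofu(fresh_store, "foo", "attacker-key"))

for line in outcomes:
    print(line)
```

The legitimate maintainer change and the attack on the fresh machine are the two interesting cases: the first produces a warning the user has no way to evaluate, and the second produces no warning at all.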
What the Hell Are We Trusting Anyway?
The elephant in the room when talking about package signing is what exactly we are trusting. For a repository like PyPI where everything is a free-for-all, generally the only thing we can trust is that the person who made this release is (according to PyPI) allowed to make releases. An important part of that statement, and one that is easy to ignore, is “according to PyPI”. Even if we wave our hands and give ourselves the perfect way to transmit trust, as long as PyPI is the authority over who owns a particular name we must implicitly trust PyPI to tell us who is allowed to release which packages.
All this said, we have not addressed whether it is safe to install this package. I could register a malicious package called “hackme” and sign it using any of the above methods, and if you install it, even with the valid signature, you have decided to accept the consequences of running my code. It's important to remember that the only thing any of these systems is able to verify is that the package you've fetched is the package you wanted, nothing more.
Everything is Terrible So What Do We Do?
Bluntly put, I don't know for sure. This isn't an already solved problem, nor is it an easy one to solve. I believe that whatever solution is chosen is going to have a lot of the problems listed above. My biggest hope is that we'll get a solution where the end user has the relationship with the source of trust and not the package author.