Hi all,
Below is the security review that I did of the tag2upload design.
I am not a neutral party, in the sense that I think tag2upload is a good
idea and should be deployed. However, I do these types of security
reviews professionally, and I tried to approach this review the same way
that I would approach a major work project that needed a security review
to ensure we weren't deploying something with security issues. I
encourage any Debian community member with security expertise to check my work; with security reviews, the more eyes, the better.
I will also post this review on my web site, probably later tonight if I
have time.
## Threat model
I evaluated both the existing source package upload architecture and the tag2upload architecture against the following threats:
- Someone not in the keyring uploads a malicious source package, possibly
via a sponsor.
- Someone in the keyring (either a Debian Developer or a Debian Maintainer
for a package) uploads a malicious source package but makes it appear
that the package was uploaded by someone else in the keyring.
- An attacker compromises the system a Debian uploader uses to build
source packages and uses that access to inject malicious code into a
source package.
- Someone with administrative access to the archive processing machinery
(DAK, the archive signing key, or similar infrastructure) uploads a
malicious source package.
- Someone with administrative access to the tag2upload server or its
signing key uploads a malicious source package.
- Someone with administrative access to Salsa uploads a malicious source
package.
In each case, I looked at prevention, detection, and tracing.
Neither the existing upload mechanism nor tag2upload attempts to prevent or detect (as opposed to trace) the upload of a malicious source package by someone in full possession of a key in the keyring, so this threat is not considered in this document, although tracing for this threat is
discussed briefly.
## Brief architecture summary
### tag2upload
tag2upload replaces the first step of this upload process with the
following:
1. The uploader pushes a signed tag in a specific format to Salsa. For
non-native packages, this may reference an upstream tree in the same
Git repository by commit ID, which will be used to create the `orig`
tar file if needed.
2. Salsa notifies a web hook on a secure project-maintained system that a
new tag of interest has been pushed.
3. That system (with internal privilege separation) retrieves the Git tag
and corresponding commit, verifies the signature and tag metadata, and
verifies that the signer is in the relevant keyring.
4. Inside a VM or schroot, that system retrieves the Git tree and upstream
source tree if applicable, constructs or retrieves the `orig` tar file,
and constructs the Debian source package and source package control
file. This VM or schroot in essence operates as a source package
buildd.
5. The tag2upload server adds control header fields specifying the Git
object ID and the identity string and fingerprint of the uploader,
signs the resulting source package control file, constructs an upload
changes control file, signs it, and creates and signs another Git tag
reflecting any additional Git commits that were required to put the
repository into a canonical format (the "dgit view").
6. The tag2upload server pushes the original Git tag, its referenced tree,
and the additional "dgit view" tag to the publicly-accessible
dgit-repos Git server as a permanent archive.
7. The tag2upload server uploads the signed source package to the normal
archive incoming queue.
Subsequent processing of the upload happens identically to the existing upload system.
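As a rough illustration of the checks in step 3, here is a minimal Python sketch. It is not the tag2upload implementation: the `[upload ...]` metadata line, the `debian/<version>` tag naming, and the names `check_tag` and `UploadRequest` are all hypothetical stand-ins, and OpenPGP signature verification is assumed to have already produced the signer's fingerprint.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical metadata line, e.g. "[upload source=hello version=2.10-3 upstream=<commit>]".
# The real tag2upload tag format is not reproduced here.
_META = re.compile(r"^\[upload (?P<fields>[^\]]*)\]$", re.MULTILINE)
_COMMIT = re.compile(r"^[0-9a-f]{40}$")


@dataclass(frozen=True)
class UploadRequest:
    source: str
    version: str
    upstream: Optional[str]


def check_tag(tag_name, message, signer_fpr, keyring):
    """Validate an already signature-checked tag before any build work starts."""
    # Reject early: only keys in the keyring may reach the later stages.
    if signer_fpr not in keyring:
        raise PermissionError("signer is not in the keyring")

    match = _META.search(message)
    if match is None:
        raise ValueError("no upload metadata line in tag message")

    fields = {}
    for item in match["fields"].split():
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed metadata item: {item!r}")
        fields[key] = value

    source, version = fields.get("source"), fields.get("version")
    if not source or not version:
        raise ValueError("metadata must name a source package and version")

    # The upstream tree, if any, is pinned by a full commit ID, not a tag name.
    upstream = fields.get("upstream")
    if upstream is not None and not _COMMIT.match(upstream):
        raise ValueError("upstream must be a full 40-character commit ID")

    # Simplified assumption: the tag name must agree with the signed version.
    if tag_name != f"debian/{version}":
        raise ValueError("tag name does not match the signed version")

    return UploadRequest(source, version, upstream)
```

The point of the ordering is the one the review makes later: the keyring check runs before any parsing of attacker-controlled metadata beyond what signature verification itself requires.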
## Analysis
#### tag2upload server
The new tag2upload server architecture introduces a new type of build sandboxing that is similar but not identical to buildds (source package construction requires sufficient network access to Salsa, for example,
while buildds can be cut off from the network completely) and new code
that has to parse untrusted input.
The sandboxing design of the tag2upload server does a good job of reducing that risk. Signatures are checked early, so only attackers able to create
a valid OpenPGP signature with a key in the keyring can attack the most security-sensitive part of the system. The signing key is isolated from
both the component that processes incoming requests from Salsa and the component that constructs the source package, only interacting with them
via a restricted protocol.
The best way to detect whether the tag2upload server has been compromised would be to independently verify its output via a reproducible source
package construction system that starts from the same inputs, namely a
signed Git tag on a Salsa repository. This could be as simple as an independent tag2upload server, or could involve auditing or independent reimplementation of the steps the tag2upload server performs.
We don't have reproducible source package builds today, so this is not a regression. We currently blindly trust whatever the uploader uploads, and
the tag2upload proposal does not make that risk worse, merely shifts it to central infrastructure. I therefore don't consider reproducible source
builds to be a security prerequisite for adoption of the tag2upload
proposal. It is, however, obvious follow-on work that would improve
detection of some classes of attacks.
#### Replacing the upstream tree
The attack: Construct a benign and malicious Git tree pair containing only the upstream source. Reference the benign tree in a source package and get that source package signed by a sponsor to trigger tag2upload processing. Race the tag2upload server by deleting the upstream tag and commit ID and then pushing the malicious Git repository as a new commit with the same commit ID.
The upstream tag name is present in the signed tag metadata, but since
that tag itself is not required to be signed, the attacker can move it at will. The upstream tag therefore provides no protection against this
attack apart from a small detection risk. Authentication of the upstream
tree comes only from the inclusion of its commit ID in the tag metadata.
I suspect (but am not certain) that this attack would normally be
prevented by the Salsa Git service. The benign tree already existed in the same repository with the referenced commit ID (presumed to be checked by
the sponsor during review), and even if references to that object are
deleted via branch deletion, I believe Git will reject the push of the malicious commit ID until the old objects have been garbage-collected.
This presumably will take long enough that the tag2upload process will
fail because the upstream commit is missing.
This attack could be done by someone with administrative access to Salsa,
and thus in a position to force an immediate garbage collection of the unreferenced objects so that the tree underlying the upstream commit ID
can be replaced. Administrative access to Salsa would also make it trivial
to win the race against the tag2upload server. This attack is less prone
to detection than moving the tag to a different Salsa repository.
There is a variation on this attack where the attacker deletes the Git tag and tree that it references, pushes a colliding tree, and then repushes
the Git tag. I believe this has essentially the same properties as the
above attack.
## Conclusions
I believe widespread adoption of tag2upload would represent a security improvement for Debian. The availability of a more secure source package construction system outweighs, in my opinion, the small additional risks
it would introduce. I do not believe it introduces any significant
security regressions.
Were tag2upload adopted, I would recommend some follow-on work:
- Verify that there are securely-archived backups of the dgit-repos Git
server, since they contain useful information for tracing any discovered
malicious packages.
On 2024-06-11 18:39:04, Russ Allbery wrote:
Below is the security review that I did of the tag2upload design.
Hi Russ, and thank you so much for taking the time to do this excellent
work. It's really comforting to think that we have an actual
professional look at our stuff, and I think we should do this more often
and systematically. :)
I will also post this review on my web site, probably later tonight if
I have time.
I didn't find that post, btw... No big deal of course!
Neither the existing upload mechanism nor tag2upload attempt to prevent
or detect (as opposed to trace) the upload of a malicious source
package by someone in full possession of a key in the keyring, so this
threat is not considered in this document, although tracing for this
threat is discussed briefly.
I'm actually curious as to why that is treated as a separate
possibility, because it kind of overlaps with the second model ("someone uploads a malicious package appearing to come from someone else")...
For me, that case and the "xz-utils" case are actually quite pressing matters. They don't quite keep me up at night, but they're the kind of
threat models I do worry about and that we should address head on. But *maybe* this is not the right vector to address them, that said...
The sandboxing design of the tag2upload server does a good job of
reducing that risk. Signatures are checked early, so only attackers
able to create a valid OpenPGP signature with a key in the keyring can
attack the most security-sensitive part of the system. The signing key
is isolated from both the component that processes incoming requests
from Salsa and the component that constructs the source package, only
interacting with them via a restricted protocol.
This is more a question to the dgit people, but what kind of hardening
do we have on the tag2upload server? I think dak has the cryptographic
keys in a HSM (Hardware Security Module) to prevent a threat actor from grabbing those keys for offline attacks...
Is there something similar (HSM or YubiKey) on the tag2upload server? If
not, why?
We don't have reproducible source package builds today, so this is not
a regression. We currently blindly trust whatever the uploader uploads,
and the tag2upload proposal does not make that risk worse, merely
shifts it to central infrastructure. I therefore don't consider
reproducible source builds to be a security prerequisite for adoption
of the tag2upload proposal. It is, however, obvious follow-on work that
would improve detection of some classes of attacks.
Does tag2upload make reproducible source packages harder?
I suspect (but am not certain) that this attack would normally be
prevented by the Salsa Git service. The benign tree already existed in
the same repository with the referenced commit ID (presumed to be
checked by the sponsor during review), and even if references to that
object are deleted via branch deletion, I believe Git will reject the
push of the malicious commit ID until the old objects have been
garbage-collected. This presumably will take long enough that the
tag2upload process will fail because the upstream commit is missing.
Yeah, that's a reasonable assumption, but I believe those jobs are run
on a schedule on GitLab servers, so an attacker could *time* their
attack just so, to make sure the old tree gets GC'd just in time. It's a
heck of a race to win though, especially since you need to time it on
the other side as well...
I wonder if you've considered the "we need to revoke access to a compromised/hostile developer" threat model. Right now, we have a
relatively centralized model here (modulo DAM, dak, debian-keyring), and we're introducing a new component... How does tag2upload manage keys and
does it introduce additional response time or issues when revoking
access to retiring or revoked developers?
Having uploads in Git brings a whole set of interesting properties and
tools we could leverage there as well, to ensure the integrity of the
dgit repository itself. When Tor transitioned from gitolite to GitLab,
one of the concerns was exactly that kind of problem space where we're
not sure we want to trust GitLab with our code. So I did a significant
amount of work researching Git integrity solutions, and my findings are documented here:
https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/gitlab#git-repository-integrity-solutions
The work the kernel.org folks have been doing about publishing a
transparency log for the kernel git repository might be particularly
relevant here.
Generally I reach the same conclusion, although I think there are real security problems with both the existing and the proposed tag2upload mechanism that we should all be aware of. It is acceptable to realize
that we cannot protect against all attacks with reasonable costs.
The decision on whether to adopt tag2upload should be made primarily on non-security grounds.
## Threat model
I evaluated both the existing source package upload architecture and the tag2upload architecture against the following threats:
- Someone not in the keyring uploads a malicious source package, possibly
via a sponsor.
- Someone in the keyring (either a Debian Developer or a Debian Maintainer
for a package) uploads a malicious source package but makes it appear
that the package was uploaded by someone else in the keyring.
- An attacker compromises the system a Debian uploader uses to build
source packages and uses that access to inject malicious code into a
source package.
- Someone with administrative access to the archive processing machinery
(DAK, the archive signing key, or similar infrastructure) uploads a
malicious source package.
- Someone with administrative access to the tag2upload server or its
signing key uploads a malicious source package.
- Someone with administrative access to Salsa uploads a malicious source
package.
### Git object collisions
The current Git repository format and wire protocols use SHA-1 hash
digests (and only SHA-1 hash digests) to identify objects in the Git repository. Git uses a SHA-1 hash function that has been
[hardened against the SHAttered attack on SHA-1](https://github.com/cr-marcstevens/sha1collisiondetection),
and therefore is probably not vulnerable to known collision attacks.
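To make concrete what a SHA-1 object ID covers, the following Python sketch computes a Git object ID the way Git does for loose objects: the hash is taken over a header (`<type> <size>` followed by a NUL byte) and then the object content. The function name `git_object_id` is just for illustration, and `hashlib.sha1` is plain SHA-1 rather than the collision-detecting variant Git uses; the two are expected to agree on inputs that do not trigger collision detection.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to an object of this type and content."""
    # Git hashes "<type> <decimal size>\0" followed by the raw content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The well-known ID of the empty blob.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because commits name trees and trees name blobs by these IDs, a signed tag that names a commit ID transitively pins every byte of the tree, but only to the extent that finding two different inputs with the same SHA-1 is infeasible.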
This analysis is relevant only for SHA-1-based Git repositories. Once
Salsa supports SHA-256 Git repositories, tag2upload could decline to act
on any repository that uses SHA-1 hash digests, making this entire section moot.
#### Replacing the upstream tree
The attack: Construct a benign and malicious Git tree pair containing only the upstream source. Reference the benign tree in a source package and get that source package signed by a sponsor to trigger tag2upload processing. Race the tag2upload server by deleting the upstream tag and commit ID and then pushing the malicious Git repository as a new commit with the same commit ID.
The upstream tag name is present in the signed tag metadata, but since
that tag itself is not required to be signed, the attacker can move it at will. The upstream tag therefore provides no protection against this
attack apart from a small detection risk. Authentication of the upstream
tree comes only from the inclusion of its commit ID in the tag metadata.
I suspect (but am not certain) that this attack would normally be
prevented by the Salsa Git service. The benign tree already existed in the same repository with the referenced commit ID (presumed to be checked by
the sponsor during review), and even if references to that object are
deleted via branch deletion, I believe Git will reject the push of the malicious commit ID until the old objects have been garbage-collected.
This presumably will take long enough that the tag2upload process will
fail because the upstream commit is missing.
This attack could be done by someone with administrative access to Salsa,
and thus in a position to force an immediate garbage collection of the unreferenced objects so that the tree underlying the upstream commit ID
can be replaced. Administrative access to Salsa would also make it trivial
to win the race against the tag2upload server. This attack is less prone
to detection than moving the tag to a different Salsa repository.
There is a variation on this attack where the attacker deletes the Git tag and tree that it references, pushes a colliding tree, and then repushes
the Git tag. I believe this has essentially the same properties as the
above attack.
### Second-preimage attacks
An attacker could attempt to take the signature from a tag2upload tag
pushed to an arbitrary repository on Salsa and apply that same signature
to a Git tag for a malicious package on Salsa with the same version
number, triggering the tag2upload process. However, this requires constructing a malicious repository with the same SHA-1 hash digest as the repository containing the original tag. This is a second-preimage attack
on SHA-1, which is believed to be currently infeasible. (Second-preimage attacks are believed to be currently infeasible even against MD5, which is
a much weaker hash function.) tag2upload therefore prevents this attack.
This same attack could be tried against the existing upload mechanism
by attempting to reuse the signature of an upload changes control file published in the
[debian-devel-changes list archive](https://lists.debian.org/debian-devel-changes/).
The second-preimage resistance of the hash function used by the OpenPGP signature similarly prevents this attack.
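The difference between a collision and a second preimage can be illustrated with a toy hash. The sketch below truncates SHA-1 to 16 bits (a deliberately weak stand-in, not something Git or OpenPGP does) and searches for each by brute force. In a collision search the attacker chooses both messages; in a second-preimage search one message is fixed in advance, as it is when reusing a signature over someone else's tag. For an n-bit hash, the generic costs are roughly 2^(n/2) tries for a collision versus roughly 2^n for a second preimage.

```python
import hashlib
from itertools import count


def toy_hash(data: bytes) -> int:
    """SHA-1 truncated to 16 bits: small enough to brute-force in milliseconds."""
    return int.from_bytes(hashlib.sha1(data).digest()[:2], "big")


def find_collision():
    """Attacker picks BOTH messages: any two inputs with equal hashes will do."""
    seen = {}
    for tries in count(1):
        msg = b"free-%d" % tries
        digest = toy_hash(msg)
        if digest in seen:
            return seen[digest], msg, tries
        seen[digest] = msg


def find_second_preimage(target: bytes):
    """Attacker must match the hash of ONE fixed message chosen by someone else."""
    wanted = toy_hash(target)
    for tries in count(1):
        msg = b"forced-%d" % tries
        if msg != target and toy_hash(msg) == wanted:
            return msg, tries


if __name__ == "__main__":
    a, b, collision_tries = find_collision()
    m, preimage_tries = find_second_preimage(b"signed upload")
    print("collision after", collision_tries, "tries")
    print("second preimage after", preimage_tries, "tries")
```

The attack described in this section needs the second, harder search, because the signed tag already exists. An attacker who prepares both the benign and the malicious input before anything is signed only needs the first.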
Hi,
On 6/13/24 22:27, Simon Josefsson wrote:
Generally I reach the same conclusion, although I think there are real
security problems with both the existing and the proposed tag2upload
mechanism that we should all be aware of. It is acceptable to realize
that we cannot protect against all attacks with reasonable costs.
In that case it is kind of disingenuous to highlight the necessity of
this change by pointing at the xz-utils scenario.
simon@josefsson.org wrote:
Can this be substantiated? Using SHA1CD in Git does not necessarily
mean someone cannot manually create a Git repository with a colliding
git commit somewhere in the history that gets accepted by git, and
allows someone to replace actual file contents. That may be the case,
but I haven't seen any detailed analysis answering that.
This is quite a strong assertion, and it is up to you to prove it. The current consensus among cryptography experts is that SHA-1 is still
resistant to preimage attacks.
Generally I reach the same conclusion, although I think there are real security problems with both the existing and the proposed tag2upload mechanism that we should all be aware of.
It is acceptable to realize that we cannot protect against all attacks
with reasonable costs. That's why we need the ability to transparently
audit all steps, to detect them when they occur. Conversely, it would be unfortunate to say no to new functionality because it doesn't solve all possible problems. That just stalls progress.
## Threat model
I evaluated both the existing source package upload architecture and the
tag2upload architecture against the following threats:
- Someone not in the keyring uploads a malicious source package, possibly
via a sponsor.
- Someone in the keyring (either a Debian Developer or a Debian Maintainer
for a package) uploads a malicious source package but makes it appear
that the package was uploaded by someone else in the keyring.
- An attacker compromises the system a Debian uploader uses to build
source packages and uses that access to inject malicious code into a
source package.
- Someone with administrative access to the archive processing machinery
(DAK, the archive signing key, or similar infrastructure) uploads a
malicious source package.
- Someone with administrative access to the tag2upload server or its
signing key uploads a malicious source package.
- Someone with administrative access to Salsa uploads a malicious source
package.
Having a threat model is great. I find the notion of "uploads a source package" poorly defined here though.
Which of those threat models (if any) covers the situation where someone in
the keyring uploads a (benign) source package and something on Debian's
side (e.g., design of tag2upload) enables an attacker to substitute some
part of the intended upload with something malicious?
### Git object collisions
The current Git repository format and wire protocols use SHA-1 hash
digests (and only SHA-1 hash digests) to identify objects in the Git
repository. Git uses a SHA-1 hash function that has been
[hardened against the SHAttered attack on SHA-1](https://github.com/cr-marcstevens/sha1collisiondetection),
and therefore is probably not vulnerable to known collision attacks.
Can this be substantiated? Using SHA1CD in Git does not necessarily
mean someone cannot manually create a Git repository with a colliding
git commit somewhere in the history that gets accepted by git, and
allows someone to replace actual file contents. That may be the case,
but I haven't seen any detailed analysis answering that.
This analysis is relevant only for SHA-1-based Git repositories. Once
Salsa supports SHA-256 Git repositories, tag2upload could decline to
act on any repository that uses SHA-1 hash digests, making this entire
section moot.
I don't think it will be as simple as that: the git SHA256 transition document suggests to me that even signed tags may refer to both SHA256
and SHA1 commits:
https://git-scm.com/docs/hash-function-transition#_signed_tags
Thus tag2upload would need to require 1) SHA256 Git repository support,
AND 2) that git tags refer to a SHA256 commit id, AND 3) any git
submodules used also rely on SHA256 rather than SHA1.
#### Replacing the upstream tree
The attack: Construct a benign and malicious Git tree pair containing
only the upstream source. Reference the benign tree in a source package
and get that source package signed by a sponsor to trigger tag2upload
processing. Race the tag2upload server by deleting the upstream tag
and commit ID and then pushing the malicious Git repository as a new
commit with the same commit ID.
I think this is an important and realistic attack vector that we
shouldn't be vulnerable to.
The upstream tag name is present in the signed tag metadata, but since
that tag itself is not required to be signed, the attacker can move it
at will. The upstream tag therefore provides no protection against this
attack apart from a small detection risk. Authentication of the
upstream tree comes only from the inclusion of its commit ID in the tag
metadata.
Which is SHA1 currently, and thus vulnerable to collision attacks,
which are known to be possible.
This attack could be done by someone with administrative access to
Salsa, and thus in a position to force an immediate garbage collection
of the unreferenced objects so that the tree underlying the upstream
commit ID can be replaced. Administrative access to Salsa would also
make it trivial to win the race against the tag2upload server. This
attack is less prone to detection than moving the tag to a different
Salsa repository.
There is a variation on this attack where the attacker deletes the Git
tag and tree that it references, pushes a colliding tree, and then
repushes the Git tag. I believe this has essentially the same
properties as the above attack.
Is Salsa admin access necessary? Isn't 'Maintainer' access sufficient?
To me, the protection against this attack seems weak, and I don't really
see any strong protection against it.
Couldn't we require that the signed tag2upload tag comment contains a
hash of the upstream code, somehow?
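One way such a check could look, purely as a sketch: compute a SHA-256 digest over the upstream files in a canonical order and compare it with a value carried in the signed tag message. The `upstream-sha256=` field name and both function names are hypothetical, nothing like this is part of the tag2upload design discussed here, and the sketch ignores file modes and symlinks, which a real scheme would have to cover.

```python
import hashlib


def tree_digest(files: dict) -> str:
    """SHA-256 over (path, content) pairs in sorted path order.

    `files` maps a relative path (str) to file content (bytes).
    """
    h = hashlib.sha256()
    for path in sorted(files):
        name = path.encode("utf-8")
        data = files[path]
        # Length prefixes keep ("ab", "c") distinct from ("a", "bc").
        h.update(len(name).to_bytes(8, "big"))
        h.update(name)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def upstream_matches(tag_message: str, files: dict) -> bool:
    """Check the tree against a hypothetical 'upstream-sha256=' line in the signed tag."""
    prefix = "upstream-sha256="
    for line in tag_message.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip() == tree_digest(files)
    return False  # no digest in the tag: treat as unverified
```

With a digest like this inside the signed tag message, replacing the upstream tree would require defeating SHA-256 rather than relying on the SHA-1 commit ID alone.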
### Second-preimage attacks
An attacker could attempt to take the signature from a tag2upload tag
pushed to an arbitrary repository on Salsa and apply that same
signature to a Git tag for a malicious package on Salsa with the same
version number, triggering the tag2upload process. However, this
requires constructing a malicious repository with the same SHA-1 hash
digest as the repository containing the original tag. This is a
second-preimage attack on SHA-1, which is believed to be currently
infeasible. (Second-preimage attacks are believed to be currently
infeasible even against MD5, which is a much weaker hash function.)
tag2upload therefore prevents this attack.
Is a second preimage really required to mount this attack? Consider if someone creates a collision for a good and a bad version, gets the good version signed by a sponsor, and then re-uses that signature for the bad version.
"Russ" == Russ Allbery <rra@debian.org> writes:
Can this be substantiated? Using SHA1CD in Git does not necessarily
mean someone cannot manually create a Git repository with a colliding
git commit somewhere in the history that gets accepted by git, and
allows someone to replace actual file contents. That may be the case,
but I haven't seen any detailed analysis answering that.
This was a really interesting point that I didn't catch. Thank you! Let
me try to rephrase this in the form of an attack and see if this captures what you were getting at.
The attack: Using a pre-SHA-1DC version of Git, construct a benign and malicious pair of Git trees that diverge at some point by abusing the hash
of an object vulnerable to SHAttered.
Push the benign tree to Salsa, relying on Salsa not reverifying the
hashes of new objects with a hardened hash, or alternately have
already planted the benign tree in a Git repository imported into
Salsa before SHA-1DC was in use. Get that tree signed by a sponsor,
again relying on the sponsor's git client not revalidating object
hashes, and then follow the same attack pattern in either "Moving the
tag" or "Replacing the upstream tree." Rely on the tag2upload server
not reverifying the hashes of the Git tree when it pulls it to
construct the signed source package.
In essence, this attack exploits the fact that Git is lazy about
performing hashes and usually only does so when it has to. I'm not sure
this assumption is correct for Salsa in particular, but it's at least plausible. The trees used in this attack would fail git fsck, because the critical object would hash to a different value using SHA-1DC than it does with SHA-1, but it's not clear that git fsck is called at any of the
points that would detect this attack.
I believe this attack would be prevented by setting transfer.fsckObjects
to true in the Git configuration of the tag2upload worker and failing the operation if it detects anything. (Or, equivalently, calling git fsck
after git clone and failing on any detected problems.) I believe this
forces recomputation of the hashes of all received objects. The object
used in this attack would fail that hash recomputation because the
tag2upload server would use a version of Git that uses SHA-1DC. The cost
is a performance penalty on git clone, which would be trivial for most repositories but which might be noticeable for particularly large Git
trees.
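A minimal sketch of that mitigation, assuming a worker that shells out to the `git` command line. The function names are hypothetical and nothing in this thread says the tag2upload worker is written this way. It uses `git clone -c`, which sets a configuration value in the new repository before the history is fetched, and then runs `git fsck` as a second check.

```python
import subprocess


def clone_command(url: str, dest: str) -> list:
    """Build a clone command that asks Git to check every received object."""
    if url.startswith("-") or dest.startswith("-"):
        # Refuse anything that git could mistake for an option.
        raise ValueError("URL and destination must not start with '-'")
    return ["git", "clone", "-c", "transfer.fsckObjects=true", url, dest]


def fsck_command(dest: str) -> list:
    """Build a command that re-verifies the object store of an existing clone."""
    return ["git", "-C", dest, "fsck"]


def checked_clone(url: str, dest: str) -> None:
    """Clone and verify; any failure raises CalledProcessError and aborts the job."""
    subprocess.run(clone_command(url, dest), check=True)
    subprocess.run(fsck_command(dest), check=True)
```

Treating any non-zero exit status as fatal is the important part: the worker should stop rather than continue with a repository that failed verification.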
My personal opinion is that always setting transfer.fsckObjects to true is good practice anyway to catch more banal problems such as disk corruption
and memory bit flips, so while I'm not sure I would bother just for this attack, it might be a good idea on general principles.
I have had that setting in my .gitconfig for several years, and the
number of git repositories that fail to clone with it is not negligible.
I encourage people to enable it and experiment for themselves. Try https://git.savannah.gnu.org/git/coreutils.git for example. I hack
around it by adding a 'fclone' alias, and I still need to use it once in
a while.
    [transfer]
        fsckObjects = true
    [alias]
        fclone = clone -c "fetch.fsckObjects=false"
Maybe the number of repositories on Salsa with this problem is low, but
isn't the tag2upload design vulnerable to upstream git repositories
having this problem too?
Or, of course, find a way to disable the author/committer checks, which I suspect are most of the failures, and keep the object hash checks.
The alternative would be to add some sort of support for fsck.skipList,
but that seems like annoying and arguably unnecessary complexity that potentially reintroduces the same security problem via a different
route.
Hi all,
Below is the security review that I did of the tag2upload design.
I am not a neutral party, in the sense that I think tag2upload is a good
idea and should be deployed. However, I do these types of security
reviews professionally, and I tried to approach this review the same way
that I would approach a major work project that needed a security review
to ensure we weren't deploying something with security issues. I
encourage any Debian community member with security expertise to check my work; with security reviews, the more eyes, the better.
I will also post this review on my web site, probably later tonight if I
have time.
I appreciate the thought and effort that went into this review.
If I'm following your description correctly, the tag2upload "package" flow is:
developer --> salsa --> tag2upload --> ftp.upload.debian.org
machine                            --> dgit-repos
Is that right?
While it may not matter from a post-attack detection and tracing perspective, I think there are more routine trace activities that this complicates. A couple of examples are the "signed by" listing in the tracker.d.o news section for packages and who-uploads from devscripts.
While making package signing information less visible isn't directly a security issue, it does seem like a complication that makes it harder to
keep up with what's going on.
Would you consider these kinds of indirect effects relevant from a
security analysis perspective or are they just non-security concerns
from your POV?
A tag2upload server compromise is fairly serious. A compromise of any of tag2upload, dak, or the buildds has roughly equally serious potential
impact on the archive as far as I can tell, although the details differ.
In all three cases, you need reproducible builds to reliably detect the compromise, although in the tag2upload case you only need reproducible
source builds for the specific set of source transformations that
tag2upload is willing to perform, which I believe is a much easier problem than the reproducible binary builds required to detect buildd or dak compromises. dak, uniquely, can meddle with either source *or* binary packages, but dak meddling with source packages will break the signatures
on those packages, so is somewhat easier to detect than dak meddling with binary packages.
Scott Kitterman <debian@kitterman.com> writes:
I appreciate the thought and effort that went into this review.
If I'm following your description correctly, the tag2upload "package" flow is:
developer --> salsa --> tag2upload --> ftp.upload.debian.org
machine                            --> dgit-repos
Is that right?
Yes, I think so.
While it may not matter from a post attack detection security trace
perspective, I think there are more routine trace activities that this
complicates. A couple of examples are the signed by listing in the
tracker.d.o news section for packages and who-uploads from devscripts.
While making package signing information less visible isn't directly a
security issue, it does seem like a complication that makes it harder to
keep up with what's going on.
Would you consider these kinds of indirect effects relevant from a
security analysis perspective or are they just non-security concerns
from your POV?
I made the assumption that, if tag2upload were deployed, those tools would
be modified to pick up the signer information from the *.changes fields
where tag2upload puts it. That metadata is put into both the *.dsc and
the *.changes files.
As with the other parts of this proposed design, that does require
trusting tag2upload to do the authentication check properly, so a
compromised tag2upload server could write erroneous trace information and
therefore would not be detected by either of those tools.
A tag2upload server compromise is fairly serious. A compromise of any of
tag2upload, dak, or the buildds has roughly equally serious potential
impact on the archive as far as I can tell, although the details differ.
In all three cases, you need reproducible builds to reliably detect the
compromise, although in the tag2upload case you only need reproducible
source builds for the specific set of source transformations that
tag2upload is willing to perform, which I believe is a much easier problem
than the reproducible binary builds required to detect buildd or dak
compromises. dak, uniquely, can meddle with either source *or* binary
packages, but dak meddling with source packages will break the signatures
on those packages, so is somewhat easier to detect than dak meddling with
binary packages.
(This is assuming I'm not missing some security control in dak, which is
entirely possible because I've not done a comprehensive security review of
dak and am not certain of all the details of the architecture. If I'm
missing something, please do correct me!)
Yes and no. The difference is that currently, I can download the source package and verify it myself. Not just who signed it and with what key,
but that the signature verifies. I don't need to trust assurances from
any service.
From the perspective of Debian, the project, that's presumably not significant and can be accounted for by updating our tools. From the perspective of some Debian users, I'm less certain of the significance.
You talk a number of times about whether an attack is possible against
salsa. But especially when thinking about detection and tracing, I
think that things that are verified by signatures made with keys not
held by the system in question are harder to modify than things that can
be verified only so long as a system remains trusted.
Which is to say, especially in the moment when considering an incident, people are very bad about reasoning about whether views of a system are equivalent.
So, I consider the following to be useful to an attacker--to be threats
worth mitigating:
1) Attacker uploads malicious code to the archive.
2) Attacker, possibly through a compromise of the dgit server and salsa, changes the git view to be something harmless.
Such an attack can be detected by regularly verifying the archive
contents against git versions.
Still, my initial read of your analysis is that you discount attacks
like this more than makes sense to me.
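As a rough illustration of the "regularly verifying the archive contents against git versions" check mentioned above, here is a minimal sketch. It assumes you already have an unpacked source package and a checkout of the corresponding git tree on disk; the function names `tree_digest` and `diff_trees` are made up for this example and are not part of dgit or any Debian tool.

```python
import hashlib
from pathlib import Path


def tree_digest(root, ignore=(".git",)):
    """Map each relative file path under root to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip version-control metadata, which the archive copy never has.
        if any(part in ignore for part in rel.parts):
            continue
        if path.is_file():
            digests[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(archive_tree, git_tree):
    """Return the sorted paths whose contents differ or exist on one side only."""
    a = tree_digest(archive_tree)
    g = tree_digest(git_tree)
    return sorted(p for p in a.keys() | g.keys() if a.get(p) != g.get(p))
```

An empty result only says the two trees match right now; it says nothing about who signed either of them, so it complements signature checks rather than replacing them.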
Scott Kitterman <debian@kitterman.com> writes:
Yes and no. The difference is that currently, I can download the source package and verify it myself. Not just who signed it and with what key, but that the signature verifies. I don't need to trust assurances from
any service.
No, that's not quite true. You're still trusting assurances from the uploader's system. The uploader did not, in general, directly check the artifact whose signature you're verifying; they, and you, are trusting
that the source package construction was done correctly from their working tree.
There's been a lot of discussion of the implications of the xz backdoor
for source package construction, but one of the takeaways that I took from
it is to be even less sure of the security of the uploader systems that
are generating our source packages. Imagine if xz had been backdoored to, say, inject the installation of a malicious maintainer script into source packages constructed on that system. How long would it have been before
we noticed? The malicious code would have been signed by the uploader and all the signatures would verify without difficulty.
Certainly we would have noticed eventually. Probably we would have
noticed before the next stable release. But I'm not at all sure we would have noticed before a lot of Debian uploader systems were backdoored and potentially a lot of uploader keys were stolen depending on uploader key storage practices. And there are probably sneakier attacks that I haven't thought of.
From the perspective of Debian, the project, that's presumably not significant and can be accounted for by updating our tools. From the perspective of some Debian users, I'm less certain of the significance.
I think it would be hugely valuable to have something like a "dgit verification mode" where you can ask dgit, which already has all the
source package construction logic, to take a tag2upload-generated source package, start from the tag object and signature, and reproduce that
source package and verify it. Except for the retrieval of the signed Git tag, in theory all of that could be done locally. I'm not sure how hard
that would be (this comes back to the question of how difficult it is to ensure that the tag2upload source construction algorithm is easily reproducible), but I think something like that would go a long way towards providing some really interesting security properties.
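To make the last step of such a verification mode concrete, here is a small sketch of comparing a locally rebuilt source package against the one in the archive. It is not dgit code and no such dgit mode is claimed to exist; it assumes the rebuild has already produced its own `.dsc`, and it only compares the `Checksums-Sha256` entries of the two files. The helper names are hypothetical.

```python
def sha256_entries(dsc_text):
    """Extract {filename: (sha256, size)} from the Checksums-Sha256 field of a .dsc."""
    entries = {}
    in_field = False
    for line in dsc_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
            continue
        if in_field:
            # Continuation lines of the field start with a space.
            if line.startswith(" ") and line.strip():
                digest, size, name = line.split()
                entries[name] = (digest, int(size))
            else:
                in_field = False
    return entries


def same_source(archive_dsc_text, rebuilt_dsc_text):
    """True if both .dsc files list exactly the same files with the same hashes."""
    return sha256_entries(archive_dsc_text) == sha256_entries(rebuilt_dsc_text)
```

This check is only meaningful if the source construction is reproducible byte for byte, and it still has to be combined with verifying the signature on the Git tag that the rebuild started from.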
I agree that there's a risk that what the uploader thought they were uploading and what they actually uploaded are different, but that's independent of tag2upload or not.
I also agree there are tradeoffs on all this. In the particular case of source package construction, there's a tradeoff between doing it on a centralized, managed service with a known configuration that is internet exposed versus the variety of unknowns associated with individual
developer machines.
There are different risks for the end user. Currently dget uses
dscverify by default before unpacking a source package. I'm not an
expert at all, so I don't have any appreciation for the perceived risks
that led to that being the default (IIRC, it wasn't always). I am
assuming that wasn't random. I'm not sure how that would work in this
new paradigm.
I think it would be hugely valuable to have something like a "dgit
verification mode" where you can ask dgit, which already has all the
source package construction logic, to take a tag2upload-generated source
package, start from the tag object and signature, and reproduce that
source package and verify it. Except for the retrieval of the signed
Git tag, in theory all of that could be done locally. I'm not sure how
hard that would be (this comes back to the question of how difficult it
is to ensure that the tag2upload source construction algorithm is
easily reproducible), but I think something like that would go a long
way towards providing some really interesting security properties.
I think this is much more manageable if you assume the whole world uses
git all the time for everything and git is the interface, but that's not reality.
Personally, I think the ability to interact with the archive to do the verification and not relying on things that are not the archive for the
code to verify is an important property of the existing system and I
don't think it's feasible to maintain it in a tag2upload world.
Scott Kitterman <debian@kitterman.com> writes:
I agree that there's a risk that what the uploader thought they were uploading and what they actually uploaded are different, but that's independent of tag2upload or not.
But it's not independent; tag2upload makes this story somewhat better. tag2upload is based on a signed Git tag and moves the source package construction off of the uploader's system onto more-secure project infrastructure. It therefore moves the uploader's signature closer to
their intent: it's a signature over the thing that they are more likely to have directly reviewed, not over a build artifact derived from their Git tree.
I also agree there are tradeoffs on all this. In the particular case of source package construction, there's a tradeoff between doing it on a centralized, managed service with a known configuration that is internet exposed versus the variety of unknowns associated with individual
developer machines.
Right, for some value of Internet-exposed that can be fairly restrictive.
There are different risks for the end user. Currently dget uses
dscverify by default before unpacking a source package. I'm not an
expert at all, so I don't have any appreciation for the perceived risks
that led to that being the default (IIRC, it wasn't always). I am
assuming that wasn't random. I'm not sure how that would work in this
new paradigm.
Well, it obviously still works (once it's aware of the tag2upload key) but the signature is by the entity that constructed the source package, so dscverify will trace the signature back to the tag2upload server and not
to the uploader's system.
I think it would be hugely valuable to have something like a "dgit
verification mode" where you can ask dgit, which already has all the
source package construction logic, to take a tag2upload-generated source
package, start from the tag object and signature, and reproduce that
source package and verify it. Except for the retrieval of the signed
Git tag, in theory all of that could be done locally. I'm not sure how
hard that would be (this comes back to the question of how difficult it
is to ensure that the tag2upload source construction algorithm is
easily reproducible), but I think something like that would go a long
way towards providing some really interesting security properties.
I think this is much more manageable if you assume the whole world uses
git all the time for everything and git is the interface, but that's not reality.
I'm extremely confused. Of course you can assume that for any package
signed with tag2upload. tag2upload will only act on Git repositories and therefore anything that it has worked on necessarily had to use Git as the interface.
Maybe you thought I was implying that this dgit verification mode would
work with general source packages and not just tag2upload packages? No,
it cannot, because in the general case we have absolutely no idea how to
map a source package in the archive back to a Git tree. That's exactly
the problem that tag2upload is trying to solve. For non-tag2upload
packages, we still have to rely on the source package as the farthest back that we can trace the code without diverging into package-specific
analysis and diverging maintainer workflows.
Personally, I think the ability to interact with the archive to do the verification and not relying on things that are not the archive for the code to verify is an important property of the existing system and I
don't think it's feasible to maintain it in a tag2upload world.
Here too, I don't understand exactly what you are saying. All of the
source packages uploaded to the archive via tag2upload will verify. They have valid OpenPGP signatures. That OpenPGP signature traces the
provenance of the source package back to the entity that constructed the source package just as the source package signatures in the archive do
today.
The entity that does the source package construction has changed from the uploader's system to the tag2upload server, so tracing the package to a specific maintainer (absent a dgit verification mode) requires relying on
that server, *but* you also now have the option of doing additional work
to trace the provenance back further to a signed Git tree, something
that's not possible in the general case with source packages today. This
has various security trade offs as discussed above. But I do not agree
that it breaks the property that you claim it breaks; in both cases, you
can trace the source package back to the entity responsible for its construction.
There are different risks for the end user. Currently dget uses dscverify by default before unpacking a source package. I'm not an expert at all, so I don't have any appreciation for the perceived risks that led to that being the default (IIRC, it wasn't always). I am assuming that wasn't random. I'm not sure how that would work in this new paradigm.
I think it's just that I view a signature by a mechanized service as something different than a signature made by an actual person.
Technically you are correct, but I think it's fundamentally different.
I don't think the computer is responsible for anything. I think it has
to trace to a person if you want to talk about responsibility.
Scott Kitterman <debian@kitterman.com> writes:
I think it's just that I view a signature by a mechanized service as
something different than a signature made by an actual person.
Technically you are correct, but I think it's fundamentally different.
I don't think the computer is responsible for anything. I think it has
to trace to a person if you want to talk about responsibility.
Okay, thanks, I think this is the core of our disagreement. Let me sum up
my side, just to be very clear about what I think the disagreement is.
I don't believe that "a signature made by an actual person" is something
that exists in the real world. Humans do not sign things. We do not have
an OpenPGP implementation in our heads. Signatures are always made by
software, running on a possibly compromised computer, directed by humans.
Any link between the human and the signature is a point of possible
attack.
For the existing source package signatures, a simplified sequence looks
like this:
human --> (1) dpkg-buildpackage --> (2) debsign --> (3) archive
For tag2upload, a simplified sequence looks like:
human --> (1) Git --> (2) tag2upload --> (3) debsign --> (4) archive
In our current system, the source package signature can be traced back to
(2). In the tag2upload case, the source package signature can be traced
back to (3) using the existing techniques and, with more work and new
techniques, all the way back to (1).
In neither case can the source package signature be traced back to a
human, which is what I am arguing makes them similar. What we're arguing
about is which system has the better design (both security and otherwise)
for the pieces prior to (2) in the first case and (3)/(1) in the second
case.
Yes. I think that's the core of the disagreement. In my view, when I
type the passphrase for my key, I'm asserting responsibility for the
contents of what I'm signing. It doesn't mean it is correct or uncompromised, but I am taking responsibility for it.
"Russ" == Russ Allbery <rra@debian.org> writes:
Scott Kitterman <debian@kitterman.com> writes:
Yes. I think that's the core of the disagreement. In my view, when I
type the passphrase for my key, I'm asserting responsibility for the contents of what I'm signing. It doesn't mean it is correct or uncompromised, but I am taking responsibility for it.
Right. And I come from a culture that emphasized blameless postmortems
and systems design and a way of thinking about security review from a
similar perspective, which is that assigning responsibility is not in and
of itself a useful thing to do. Just because someone is responsible
doesn't mean that we're more secure. It may mean that you have someone
you can punish afterwards, but it's very questionable how much that helps with security, really.
Assigning responsibility is, in that model, only important to the degree
to which it will change people's actual behavior towards behavior that is more secure, either before or after the fact. If one assigns
responsibility for something that isn't realistically under their control,
or in a way that doesn't cause their behavior to change, the argument is
that nothing is truly accomplished from a security standpoint. It's an illusion of security without actual security.
One of my goals in doing security design is to try to reduce the degree to which humans are performing repetitive validation tasks because humans are not good at maintaining constant vigilance. We know this from a bunch of empirical studies on, for example, airport screening. If a human does a repetitive task with a very low rate of true positives, their attention
will fade and there will be a lot of false negatives. Asking humans to do this is a recipe for failure, and making the humans responsible for doing this correctly and threatening them with consequences for not doing it correctly only slightly decreases the risk of failure.
This is exactly why reproducible builds are so important: that involves finding a way for computers to do the sorts of repetitive validation tasks that computers are good at and that humans are very bad at.
I don't equate responsibility and blame. If I'm responsible for
something and it blows up, then that means I'm responsible to help clean
up the mess, regardless of if the thing that went wrong is my fault or
not.
Scott Kitterman <debian@kitterman.com> writes:
I don't equate responsibility and blame. If I'm responsible for
something and it blows up, then that means I'm responsible to help clean
up the mess, regardless of if the thing that went wrong is my fault or
not.
How is that type of responsibility not correctly represented by
tag2upload? tag2upload is taking responsibility for construction of the
source package from a Git tree. If that blows up, it's the responsibility
of the tag2upload maintainers to help clean up the mess. The maintainer
is declaring responsibility for the Git tree that they signed. If that
blows up, it's their responsibility to help clean up the mess.
On 2024-06-16 2 h 23 p.m., Russ Allbery wrote:
For the existing source package signatures, a simplified sequence looks like this:
human --> (1) dpkg-buildpackage --> (2) debsign --> (3) archive
For tag2upload, a simplified sequence looks like:
human --> (1) Git --> (2) tag2upload --> (3) debsign --> (4) archive
Please excuse my naiveté, but how do you actually know that your package "works" with the tag2upload workflow if you're not building anything
locally before pushing?
By "works", I mean, how have you tested it will build and will pass all
the proper pre-upload tests?
On my side, I tend to work on a Git tree and when I'm happy with it I
use sbuild to:
1. build the source and the binary packages (and thus run build tests)
2. run Lintian
3. run autopkgtests
Only if all of these steps seem OK will I consider signing and uploading
the resulting source package (and yes, in reality what I actually intend
to sign is the Git tree I worked on).
Implementation notwithstanding, I'd be more than happy to have a "git $something" replace my use of debsign and dput, but I am genuinely
curious to know why we would make it easier to upload something that
hasn't passed what I believe are important QA steps before uploading?
Andreas Tille already raised that point in another thread, but the
answer seems to have been that it's already possible. Incentivising such
a behavior doesn't sound positive to me.
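The build, Lintian, autopkgtest sequence described above can be sketched as a simple gate that refuses to go on to signing if any step fails. This is only an illustration: `run_checks` is a hypothetical helper, not part of sbuild, dgit, or tag2upload, and the command lines in the comment are placeholders to adapt to your own setup.

```python
import subprocess


def run_checks(checks):
    """Run (name, argv) pairs in order and stop at the first failure.

    Returns the name of the failing check, or None if every check passed.
    """
    for name, argv in checks:
        result = subprocess.run(argv)
        if result.returncode != 0:
            return name
    return None


# Example wiring (placeholder commands; adjust arguments to your environment):
#
#   failed = run_checks([
#       ("build", ["sbuild"]),
#       ("lintian", ["lintian"]),
#   ])
#   if failed is None:
#       ...  # only now create and push the signed tag
```

Nothing in a tag-based upload prevents running such a gate locally first; the question raised in this message is whether people will still do so once it is no longer a side effect of producing the upload.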
Scott Kitterman <debian@kitterman.com> writes:
I agree that there's a risk that what the uploader thought they were
uploading and what they actually uploaded are different, but that's
independent of tag2upload or not.
But it's not independent; tag2upload makes this story somewhat better. tag2upload is based on a signed Git tag and moves the source package construction off of the uploader's system onto more-secure project infrastructure.
Successfully attacking ALL individual developers, each with their own
individual security weaknesses, seems to me more costly than attacking a single known publicly run instance like tag2upload or Salsa.
I can see that, but that leads to what I view as a problem. The thing in the archive is signed by a machine, not the human who decided it should be uploaded.
Simon Josefsson <simon@josefsson.org> writes:
Successfully attacking ALL individual developers, each with their own
individual security weaknesses, seems to me more costly than attacking a
single known publicly run instance like tag2upload or Salsa.
You only need to be able to successfully attack *one* developer in order
to cause significant damage.
The more popular that developer's packages are, the more damage you can
do.
So the developer with the weakest security practises and most popular packages is probably a prime candidate.
I think it would be hugely valuable to have something like a "dgit verification mode" where you can ask dgit, which already has all the
source package construction logic, to take a tag2upload-generated source package, start from the tag object and signature, and reproduce that
source package and verify it. Except for the retrieval of the signed Git tag, in theory all of that could be done locally. I'm not sure how hard
that would be (this comes back to the question of how difficult it is to ensure that the tag2upload source construction algorithm is easily reproducible), but I think something like that would go a long way towards providing some really interesting security properties.
[...] in the general case we have absolutely no idea how to
map a source package in the archive back to a Git tree. That's exactly
the problem that tag2upload is trying to solve. For non-tag2upload
packages, we still have to rely on the source package as the farthest back that we can trace the code without diverging into package-specific
analysis and diverging maintainer workflows.
On 2024-06-16 2 h 23 p.m., Russ Allbery wrote:
For the existing source package signatures, a simplified sequence looks
like this:
human --> (1) dpkg-buildpackage --> (2) debsign --> (3) archive
For tag2upload, a simplified sequence looks like:
human --> (1) Git --> (2) tag2upload --> (3) debsign --> (4) archive
Please excuse my naiveté, but how do you actually know that your package "works" with the tag2upload workflow if you're not building anything
locally before pushing?
Russ Allbery <rra@debian.org> writes:
But it's not independent; tag2upload makes this story somewhat better.
tag2upload is based on a signed Git tag and moves the source package
construction off of the uploader's system onto more-secure project
infrastructure.
I've seen this notion repeated a couple of times now, and I don't think
it is a good argument nor that it is particularly relevant to tag2upload.
One problem is that the tag2upload design aggregates and amplifies the consequences of one successful compromise. Thus the standards of
security of that single centralized machine have to be higher than the standards for security of developer machines, since there is a more
limited number of things each individual developer machine can do.
A more reasonable security comparison here is to compare the security of
the tag2upload design with the security of compromising ALL of the
Debian developers. Both goals lead to similar abilities for the
attacker.
You have to look at security consequences from the attacker point of
view, not the point of view of each (hopefully benign) user of the
system.
Successfully attacking ALL individual developers, each with their own
individual security weaknesses, seems to me more costly than attacking a single known publicly run instance like tag2upload or Salsa.
Debian has, to my knowledge, not published any information on how private
keys on centralized machines are protected and managed.
My perspective is that for all projects that rely on non-free software/hardware we have to assume that the output is compromised,
since it is impossible to do a complete audit of the components.
Sorry I was assuming that the web ui and the git repository were still consistent, but were inconsistent with what was uploaded to the
archive.
I.E.
t=0 attacker uploads something undesirable through tag2upload.
t=1 attacker convinces salsa and dgit to replace that git tree with a
tree that is not undesirable and that has the same hash (so signatures
still verify).
If you look at git, everything looks fine.
If you clone git or dgit, everything looks fine.
If you look at the archive you still see the undesirable code.
However, I think your comment about tag2upload not making this worse may still apply.
In particular, today it's obvious that I don't need to upload sources
that differ from what I push to salsa (if I do not use dgit).
The only way in which tag2upload really comes into play is that it
increases dgit use and increases the probability that people will assume looking at salsa|dgit is the same as looking at the archive.
And as we've discussed here, that may or may not be true.
Below is the security review that I did of the tag2upload design.
[...]
The existing upload architecture requires trusting the host used by the uploader to build the source package. If that host is compromised, an attacker could inject malicious code into the source package, either by modifying the upstream tar file (if signed upstream tar files are not
used) or by injecting it into the Debian package build system, maintainer scripts, or patches.
This attack is not equivalent to compromise of the uploader's OpenPGP key, which neither upload architecture defends against. Many Debian uploaders build source packages on less-trusted systems where they also build and
test binary packages, and then sign the source package from a more-trusted system or use a hardware key.
Russ Allbery:
This attack is not equivalent to compromise of the uploader's OpenPGP
key, which neither upload architecture defends against. Many Debian
uploaders build source packages on less-trusted systems where they also
build and test binary packages, and then sign the source package from a
more-trusted system or use a hardware key.
Is this really common practice that Debian uploaders sign (source)
packages they built on less-trusted systems?
And, if yes: Why wouldn't they do the equivalent with the sources in git (work on the less trusted system, transfer commits (git push/pull) to
the system with signing access and sign there, without review)?
HW42 <hw42@ipsumj.de> writes:
And, if yes: Why wouldn't they do the equivalent with the sources in
git (work on the less trusted system, transfer commits (git push/pull)
to the system with signing access and sign there, without review)?
They will, I assume. But tag2upload requires that the malicious code
that could be added during that process be pushed to Salsa or the
signature will not validate and tag2upload will fail. My contention is
that this makes detection of the attack easier.
Is this really common practice that Debian uploaders sign (source)
packages they built on less-trusted systems?
Building a source package is a lot more opaque and gives the attacker a
lot more room to hide. Adding malicious code to tar to inject something
into source packages is a lot quieter
Building a source package is a lot more opaque and gives the attacker a
lot more room to hide. Adding malicious code to tar to inject something into source packages is a lot quieter
How many packages have a pubkey for the orig file?
Perhaps we should encourage upstreams to sign more?
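One rough way to put a number on that question, given a set of unpacked source trees, is to count how many ship an upstream key at `debian/upstream/signing-key.asc`, the location uscan conventionally reads. This is a sketch under that assumption only; the function names are invented for the example, and it does not check whether `debian/watch` actually enables signature verification.

```python
from pathlib import Path


def has_upstream_key(source_tree):
    """True if the unpacked source tree ships an upstream signing key."""
    return (Path(source_tree) / "debian" / "upstream" / "signing-key.asc").is_file()


def count_with_key(source_trees):
    """Return (packages with an upstream key, total packages examined)."""
    trees = list(source_trees)
    with_key = sum(1 for tree in trees if has_upstream_key(tree))
    return with_key, len(trees)
```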
I am sure that's also what you meant, Salvo, I just find it quite
relevant to be explicit that it is the care that needs a boost, not the
amount of signatures.
I am sure that's also what you meant, Salvo, I just find it quite
relevant to be explicit that it is the care that needs a boost, not the amount of signatures.
Well if you manually check very carefully every line and then don't sign… it's
harder to discover it got modified.