XPost: linux.debian.bugs.dist, linux.debian.policy
From:
guillem@debian.org
Hi!
On Thu, 2024-02-22 at 12:10:06 +0100, Niels Thykier wrote:
Package: debian-policy
Severity: wishlist
While the parser is technically correct given the Policy description above,
I cannot see any case where this is a desirable outcome. Instead, I think we should classify this as a syntax error such that users are instructed to indent `${shlib:Depends}` token.
Currently both the dpkg C and Perl parsers are extremely lax (even more
than what the deb822(5) manual page or the Debian Policy permit! No matter
the outcome of this discussion, I'm going to make both these parsers at
least as strict as the current dpkg documentation.
I've played with this a bit and got the attached patch, which would
restrict field names to:
^[_]*[A-Za-z]+[A-Za-z0-9-]*[A-Za-z0-9]+$
Which I think would at least cover all current fields in the Debian
archive, including debconf templates. And requires somewhat well formed
fields, such as not allowing:
- starting or ending with «-» (for OpenPGP ASCII Armor protection
at the start, and oddness at the end),
- any special character at all,
- short field names (of 1 character),
- starting with a number (although perhaps this one is questionable,
and it creates asymmetry)
- using an underscore in anything not on the front of the field name.
Perhaps the allowed syntax should even be defined by context, so that
the initial underscores are only allowed for debconf template files.
(This would be made strict for the Perl parsers for the build tools,
strict for the C parser in charge of building packages or installing
them anew, and lax with warnings for existing installations to avoid
bricking potential current users with such field names.)
But see below.
A simple fix would be to forbid `$` at the start of the field. Though, I think at this point it would make sense to prune more special characters
from the list like `%&{}[]()/\\<>|$` from anywhere in the field. Note that
we definitely need to keep `_` as it is used in debconf template files for translations. Both single and double underscore is used here, though they always come first in the field name.
The reason why I want a tighter bound is that currently, the following
things are also considered valid field names:
|foo(>=1:2.0),
foo(>=2:2.0),
,foo(>=3:2.0)
,foo(>=4:2.0)[amd64]<!nodeb>
)': We can make inverted crying smileys as field names!
`Foo`: Now with markdown formatting of the field name!
$(foo): Can we trick something into running shell commands?
/etc/passwd: Maybe path traversal
In all cases, I see no value to allowing these absurd field names and only potential downsides.
I was pondering about this and while I'm in general a strong proponent
of rather strict parsing, to avoid garbage in garbage out, that
external users would otherwise have to then protect against and deal with.
In this case I could see how making the parsing overly strict could stymie potential innovation, like when debconf started using «_» to mark some
fields in a special way. I mean we can always go around and carve out new exceptions, but that requires way more effort, and then one needs to
justify those new uses, and convince others, which might be hard (once
the allowed format is very strict)! We also do not have visibility
over privately used field names. And there's the problem that allowing
new syntax might require one release cycle to be able to reintroduce
it, if the previous dpkg/apt need to be able to support it.
Perhaps, as you mention, a more lenient parser that just restricts field
start (from at least anything that can easily appear in a dependency field,
for example) would accomplish most of the benefit, w/o blocking potential future directions? Say something like the following untested regex:
^[_]*[A-Za-z0-9]+[!-9;-~]+$
(Or perhaps if we are allowing bracket characters, then require them to
be balanced, but that might be too much effort, if we are never going to
make use of them.)
Alternatively we could have hard and soft restriction, hard with
something like the above regex producing errors, and soft like the
first regex, which would only produce warnings (although that might
still be enough of a deterrent that one would still need to update
parsers anyway for new syntax).
But in any case, this latter regex would not allow for the exact same extensibility that debconf used at the time to prefix the field with a
special character (although it very much preserves syntax freedom at
the end), which perhaps is an indicator, that we should simply restrict
it all, and if a new syntax is required we can always introduce it later
on?
New syntax that comes to mind could be stuff like:
Field-Name(something): value
!Field: value
Or things along these lines.
But someone will eventually try to do this. We already received a bug report against debhelper long ago about being able to do path traversal via questionable package names. Largely, this was considered a non-issue because dpkg in its default configuration would abort and debhelper has a "run arbitrary code via the upstream build system" feature anyway.
Do you have a reference on hand to that report? Because if dpkg fails
I'd assume that might be a side effect from something else, and it
might make sense to make sure this is handled explicitly.
Thanks,
Guillem
From 9b284029b29b3f27cc478198f7de73dfed61cf6d Mon Sep 17 00:00:00 2001
From: Guillem Jover <
guillem@debian.org>
Date: Fri, 16 Aug 2024 19:59:51 +0200
[continued in next message]
--- SoupGate-Win32 v1.05
* Origin: you cannot sedate... all the things you hate (1:229/2)