• Bug#1064454: debian-policy: Restrict deb822 field names more (1/2)

    From Guillem Jover@1:229/2 to Niels Thykier on Sat Aug 17 01:10:01 2024
    XPost: linux.debian.bugs.dist, linux.debian.policy
    From: guillem@debian.org

    Hi!

    On Thu, 2024-02-22 at 12:10:06 +0100, Niels Thykier wrote:
    Package: debian-policy
    Severity: wishlist

    While the parser is technically correct given the Policy description above,
    I cannot see any case where this is a desirable outcome. Instead, I think we should classify this as a syntax error such that users are instructed to indent `${shlib:Depends}` token.

    Currently both the dpkg C and Perl parsers are extremely lax (even more
    than what the deb822(5) manual page or the Debian Policy permit! No matter
    the outcome of this discussion, I'm going to make both these parsers at
    least as strict as the current dpkg documentation.

    I've played with this a bit and got the attached patch, which would
    restrict field names to:

    ^[_]*[A-Za-z]+[A-Za-z0-9-]*[A-Za-z0-9]+$

    Which I think would at least cover all current fields in the Debian
    archive, including debconf templates. And requires somewhat well formed
    fields, such as not allowing:

    - starting or ending with «-» (for OpenPGP ASCII Armor protection
    at the start, and oddness at the end),
    - any special character at all,
    - short field names (of 1 character),
    - starting with a number (although perhaps this one is questionable,
    and it creates asymmetry)
    - using an underscore in anything not on the front of the field name.

    Perhaps the allowed syntax should even be defined by context, so that
    the initial underscores are only allowed for debconf template files.

    (This would be made strict for the Perl parsers for the build tools,
    strict for the C parser in charge of building packages or installing
    them anew, and lax with warnings for existing installations to avoid
    bricking potential current users with such field names.)

    But see below.

    A simple fix would be to forbid `$` at the start of the field. Though, I think at this point it would make sense to prune more special characters
    from the list like `%&{}[]()/\\<>|$` from anywhere in the field. Note that
    we definitely need to keep `_` as it is used in debconf template files for translations. Both single and double underscore is used here, though they always come first in the field name.

    The reason why I want a tighter bound is that currently, the following
    things are also considered valid field names:


    |foo(>=1:2.0),
    foo(>=2:2.0),
    ,foo(>=3:2.0)
    ,foo(>=4:2.0)[amd64]<!nodeb>
    )': We can make inverted crying smileys as field names!
    `Foo`: Now with markdown formatting of the field name!
    $(foo): Can we trick something into running shell commands?
    /etc/passwd: Maybe path traversal


    In all cases, I see no value to allowing these absurd field names and only potential downsides.

    I was pondering about this and while I'm in general a strong proponent
    of rather strict parsing, to avoid garbage in garbage out, that
    external users would otherwise have to then protect against and deal with.
    In this case I could see how making the parsing overly strict could stymie potential innovation, like when debconf started using «_» to mark some
    fields in a special way. I mean we can always go around and carve out new exceptions, but that requires way more effort, and then one needs to
    justify those new uses, and convince others, which might be hard (once
    the allowed format is very strict)! We also do not have visibility
    over privately used field names. And there's the problem that allowing
    new syntax might require one release cycle to be able to reintroduce
    it, if the previous dpkg/apt need to be able to support it.

    Perhaps, as you mention, a more lenient parser that just restricts field
    start (from at least anything that can easily appear in a dependency field,
    for example) would accomplish most of the benefit, w/o blocking potential future directions? Say something like the following untested regex:

    ^[_]*[A-Za-z0-9]+[!-9;-~]+$

    (Or perhaps if we are allowing bracket characters, then require them to
    be balanced, but that might be too much effort, if we are never going to
    make use of them.)

    Alternatively we could have hard and soft restriction, hard with
    something like the above regex producing errors, and soft like the
    first regex, which would only produce warnings (although that might
    still be enough of a deterrent that one would still need to update
    parsers anyway for new syntax).

    But in any case, this latter regex would not allow for the exact same extensibility that debconf used at the time to prefix the field with a
    special character (although it very much preserves syntax freedom at
    the end), which perhaps is an indicator, that we should simply restrict
    it all, and if a new syntax is required we can always introduce it later
    on?

    New syntax that comes to mind could be stuff like:

    Field-Name(something): value
    !Field: value

    Or things along these lines.

    But someone will eventually try to do this. We already received a bug report against debhelper long ago about being able to do path traversal via questionable package names. Largely, this was considered a non-issue because dpkg in its default configuration would abort and debhelper has a "run arbitrary code via the upstream build system" feature anyway.

    Do you have a reference on hand to that report? Because if dpkg fails
    I'd assume that might be a side effect from something else, and it
    might make sense to make sure this is handled explicitly.

    Thanks,
    Guillem

    From 9b284029b29b3f27cc478198f7de73dfed61cf6d Mon Sep 17 00:00:00 2001
    From: Guillem Jover <guillem@debian.org>
    Date: Fri, 16 Aug 2024 19:59:51 +0200

    [continued in next message]

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)