XPost: linux.debian.bugs.dist
From: guillem@debian.org
Hi!
On Tue, 2024-12-10 at 15:13:03 +0100, Raphaël Hertzog wrote:
Package: dpkg-dev
Version: 1.22.11
Severity: normal
X-Debbugs-Cc: hertzog@debian.org
While maintaining tracker.debian.org, I started to get failures about
invalid "Maintainer" fields.
There's a clear violation with a Maintainer field with two maintainers: https://bugs.debian.org/1076048
Maintainer: Steve Langasek <vorlon@debian.org>, Michael Vogt <michael.vogt@ubuntu.com>
But we also have many cases where there's a trailing comma:
Maintainer: Debian Security Team <team@security.debian.org>,
Maintainer: Daniel Baumann <daniel.baumann@progress-linux.org>,
And yet nothing complained about this (neither dpkg, nor lintian, nor
dak). dpkg-source and dpkg-gencontrol happily copied the invalid data.
Right, this is one of several fields where dpkg tools do not really
parse or normalize the values, but I've been considering that a
misfeature, as then we get this kind of output, where all consumers
then need to handle such unexpected/wrong values.
After discussion on #debian-qa, we believe that the toolchain should strip the trailing comma to bring the field back into compliance. Much like it
will clean up commas in dependencies.
I think that would be fine, but for Maintainers only during a
transitional period and not as a long-term supported feature, because
this is not like a dependency field where you have multiple values,
and a trailing comma makes sense when placing them each on their own
line, or to handle empty substvars, or similar. For Uploaders it makes
sense to always strip them given that it is a comma-separated list
(and where «wrap-and-sort -sat» seems the best format).
For commas in the middle, I think that will currently need some more consideration (see below).
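As a rough illustration of the trailing-comma cleanup discussed above, here is a minimal Python sketch. It is not dpkg code; the function name and the "changed" flag are assumptions made for the example, the idea being that a caller could warn for Maintainer during the transitional period and stay silent for Uploaders.

```python
import re

# Matches any run of whitespace and commas at the very end of the value.
_TRAILING = re.compile(r"[\s,]+\Z")


def strip_trailing_comma(value):
    """Return (cleaned, changed) for a single field value.

    `changed` is True only when at least one trailing comma was removed,
    not when just trailing whitespace was trimmed. Commas in the middle
    of the value are deliberately left alone.
    """
    cleaned = _TRAILING.sub("", value)
    return cleaned, cleaned != value.rstrip()
```

Commas in the middle are left untouched on purpose, since those need the syntax question settled first.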
But when that is not sufficient, it probably makes sense to fail and
report the problem? There's a single case that would be broken right now,
but a dozen packages with trailing commas.
I was checking the state of the archive, slightly after this got
filed, and the problem seems worse to me:
,---
$ grep-deb-sources -e -sPackage,Maintainer,Extra-Source-Only \
-FMaintainer '.*,' | grep ^Package: | wc -l
49
`---
,---
$ grep-deb-sources -e -sPackage,Maintainer,Uploaders,Extra-Source-Only \
-FUploaders '.*,$' | grep ^Package: | wc -l
6574
`---
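For anyone without grep-dctrl at hand, roughly the same counts can be sketched in Python over the text of a Sources index. This is only an approximation under stated assumptions: the tiny stanza parser and the helper names are made up for the example, and its regex handling of multi-line values is not guaranteed to match grep-deb-sources exactly.

```python
import re


def parse_stanzas(text):
    """Parse deb822-style text into a list of {field: value} dicts."""
    stanzas, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
            current, last = {}, None
        elif line[0] in " \t" and last is not None:
            # Continuation line: append to the previous field's value.
            current[last] += "\n" + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip()
            current[last] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def count_matching(stanzas, field, pattern):
    """Count stanzas whose `field` value matches the regex `pattern`."""
    regex = re.compile(pattern)
    return sum(1 for s in stanzas if regex.search(s.get(field, "")))
```

With the stanzas loaded, `count_matching(stanzas, "Maintainer", r",")` corresponds to the first query and `count_matching(stanzas, "Uploaders", r",$")` to the second.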
In addition to the already reported golang-github-mvo5-goconfigparser,
another one of those has a comma in the middle in the Maintainer field,
but is an Extra-Source-Only:yes source package:
Package: darts
Maintainer: Natural Language Processing, Japanese <pkg-nlp-ja-devel@lists.alioth.debian.org>
Extra-Source-Only: yes
Where the version in unstable looks fine.
I then started writing a parser for these fields, went checking for
what the allowed documented syntax would be, and ended up in the same
rabbit hole as the Debian Policy bug reports #401452, #509935
and #962277.
I do think we should clarify the syntax first, and IMO we should go
for the simplest possible syntax, probably based on the RFCs but
avoiding all obsolete constructs, while allowing for the currently
used names which include a comma. To me that means supporting quoted
names, which we already have in debian/changelog trailers; we would
need to keep supporting those to be able to parse old entries anyway,
and they need to match against the Maintainer fields.
I'll try to write a parser for the above, throw it against the
archive Sources indices, and see what can be warned on and what
should error out directly, then probably update the Debian Policy bug
reports.
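To make the quoted-name idea concrete, here is a minimal Python sketch of splitting an address list on commas that fall outside double quotes and angle brackets. It is not the parser mentioned above and not a full RFC address grammar; both function names and the ok/warning/error policy are assumptions for illustration only.

```python
def split_addresses(value):
    """Split on commas outside double quotes and angle brackets."""
    items, buf = [], []
    in_quotes = in_angle = escaped = False
    for ch in value:
        if escaped:
            escaped = False          # escaped char is kept literally
        elif ch == "\\" and in_quotes:
            escaped = True
        elif ch == '"' and not in_angle:
            in_quotes = not in_quotes
        elif ch == "<" and not in_quotes:
            in_angle = True
        elif ch == ">" and not in_quotes:
            in_angle = False
        elif ch == "," and not in_quotes and not in_angle:
            items.append("".join(buf).strip())
            buf = []
            continue                 # the separator itself is dropped
        buf.append(ch)
    items.append("".join(buf).strip())
    # Empty items come from trailing or doubled commas.
    return [item for item in items if item]


def classify_maintainer(value):
    """Hypothetical policy: exactly one entry, trailing comma warns."""
    if len(split_addresses(value)) != 1:
        return "error"
    if value.rstrip().endswith(","):
        return "warning"
    return "ok"
```

Under this sketch the darts value is an error as currently written, but would pass once the name is quoted as "Natural Language Processing, Japanese".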
Thanks,
Guillem