• Bug#1060320: dpkg-buildpackage: PERL_UNICODE variable causes bad encodi

    From Peter Krefting@1:229/2 to All on Tue Jan 9 15:50:02 2024
    XPost: linux.debian.bugs.dist
    From: peter@softwolves.pp.se

    Package: dpkg-dev
    Version: 1.21.22
    Severity: wishlist
    Tags: l10n

    With PERL5OPTS=-Mutf8 and PERL_UNICODE=SDL set in environment [1], output from dpkg-buildpackage (and others) is garbled ("double" UTF-8 encoding):

    $ dpkg-buildpackage --version
    Debian dpkg-buildpackage version 1.21.22.

    Detta program är fri programvara. Se GNU General Public License version 2
    eller senare för kopieringsvillkor. Det finns INGEN garanti.

    Unsetting PERL_UNICODE fixes it:

    $ bash -c "unset PERL_UNICODE; dpkg-buildpackage --version"
    Debian dpkg-buildpackage version 1.21.22.

    Detta program är fri programvara. Se GNU General Public License version 2
    eller senare för kopieringsvillkor. Det finns INGEN garanti.


    [1] As per https://stackoverflow.com/a/6163129

    -- Package-specific info:
    This system uses merged-usr-via-aliased-dirs, going behind dpkg's
    back, breaking its core assumptions. This can cause silent file
    overwrites and disappearances, and its general tools misbehavior.
    See <https://wiki.debian.org/Teams/Dpkg/FAQ#broken-usrmerge>.

    -- System Information:
    Debian Release: 12.4
    APT prefers stable-updates
    APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'oldoldstable'), (500, 'stable')
    Architecture: amd64 (x86_64)
    Foreign Architectures: i386

    Kernel: Linux 6.1.0-17-amd64 (SMP w/20 CPU threads; PREEMPT)
    Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
    Locale: LANG=sv, LC_CTYPE=sv (charmap=UTF-8) (ignored: LC_ALL set to sv_SE.utf8), LANGUAGE=sv_SE:sv:nb_NO:nb:da_DK:da:nn_NO:nn:en_GB:en_US:en
    Shell: /bin/sh linked to /usr/bin/dash
    Init: systemd (via /run/systemd/system)
    LSM: AppArmor: enabled

    Versions of packages dpkg-dev depends on:
    ii binutils 2.40-2
    ii bzip2 1.0.8-5+b1
    ii libdpkg-perl 1.21.22
    ii make 4.3-4.1
    ii patch 2.7.6-7
    ii perl 5.36.0-7+deb12u1
    ii tar 1.34+dfsg-1.2
    ii xz-utils 5.4.1-0.2

    Versions of packages dpkg-dev recommends:
    ii build-essential 12.9
    ii fakeroot 1.31-1.2
    ii gcc [c-compiler] 4:12.2.0-3
    ii gcc-10 [c-compiler] 10.2.1-6
    ii gcc-12 [c-compiler] 12.2.0-14
    ii gnupg 2.2.40-1.1
    ii gpgv 2.2.40-1.1
    ii libalgorithm-merge-perl 0.08-5

    Versions of packages dpkg-dev suggests:
    ii debian-keyring 2022.12.24

    -- no debconf information

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)
  • From Guillem Jover@1:229/2 to Peter Krefting on Fri Jan 19 01:30:01 2024
    XPost: linux.debian.bugs.dist
    From: guillem@debian.org

    Hi!

    On Tue, 2024-01-09 at 15:39:33 +0100, Peter Krefting wrote:
    > Package: dpkg-dev
    > Version: 1.21.22
    > Severity: wishlist
    > Tags: l10n
    >
    > With PERL5OPTS=-Mutf8 and PERL_UNICODE=SDL set in environment [1], output from
    > dpkg-buildpackage (and others) is garbled ("double" UTF-8 encoding):
    >
    > $ dpkg-buildpackage --version
    > Debian dpkg-buildpackage version 1.21.22.
    >
    > Detta program är fri programvara. Se GNU General Public License version 2
    > eller senare för kopieringsvillkor. Det finns INGEN garanti.
    >
    > Unsetting PERL_UNICODE fixes it:
    >
    > $ bash -c "unset PERL_UNICODE; dpkg-buildpackage --version"
    > Debian dpkg-buildpackage version 1.21.22.
    >
    > Detta program är fri programvara. Se GNU General Public License version 2
    > eller senare för kopieringsvillkor. Det finns INGEN garanti.

    Right, dpkg does not currently set its streams to be UTF-8. But I
    agree it probably should.

    > [1] As per https://stackoverflow.com/a/6163129

    Ah, yes, that page is great, I've had it in my bookmarks for a long
    time. :D

    In any case I started looking into this the other day, and the first
    blocker is that adding «use open qw(:encoding(UTF-8) :std);» is not
    enough: the gettext code also needs to be switched to its
    object-oriented methods, which handle encoding according to the
    locale automatically, otherwise we still get doubly encoded output.
    I've got some of this in a branch but…

    …my concern is whether those two things alone will be enough, or
    if we'll get botched input/output, as we did around 2008, when a
    change similar in spirit was made for dpkg-genchanges, dpkg-gencontrol
    and dpkg-source. So I'll need to check all this thoroughly, add
    new test cases, etc.

    Thanks,
    Guillem
