• Bug#1104789: libhtml-gumbo-perl: erratic behavior on the unsupported te

    From Vincent Lefevre@21:1/5 to All on Tue May 6 16:00:01 2025
    Package: libhtml-gumbo-perl
    Version: 0.18-4+b1
    Severity: serious
    Tags: security upstream
    Justification: security
    Forwarded: https://github.com/ruz/HTML-Gumbo/issues/6
    X-Debbugs-Cc: Debian Security Team <team@security.debian.org>

    I get erratic behavior on the template HTML element, e.g. on
    the HTML file "<template>". For instance:

    $ perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
    <html><head>\217¥�¾U</head><body></body></html>
    $ perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
    <html><head>)�>\220U</head><body></body></html>
    $ perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
    <html><head>q'N$uU</head><body></body></html>

    One can see random output, which may include control characters
    (above, I have changed them to \217 and \220 as Emacs shows them,
    to avoid such control characters in the mail message).

    With valgrind:

    $ valgrind perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
    ==64955== Memcheck, a memory error detector
    ==64955== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
    ==64955== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
    ==64955== Command: perl -C -MHTML::Gumbo -e print\ HTML::Gumbo-\>new-\>parse('\<template\>',\ format\ =\>\ 'string');
    ==64955==
    ==64955== Conditional jump or move depends on uninitialised value(s)
    ==64955== at 0x484DC89: strlen (vg_replace_strmem.c:505)
    ==64955== by 0x2AD7DF: ??? (in /usr/bin/perl)
    ==64955== by 0x486D6CE: tree_to_string (Gumbo.xs:189)
    ==64955== by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
    ==64955== by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
    ==64955== by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
    ==64955== by 0x486E41B: parse_to_string_cb (Gumbo.xs:505)
    ==64955== by 0x486ED4B: common_parse.isra.0 (Gumbo.xs:545)
    ==64955== by 0x486F09C: XS_HTML__Gumbo_parse_to_string (Gumbo.xs:559)
    ==64955== by 0x20B3E7: ??? (in /usr/bin/perl)
    ==64955== by 0x290C95: Perl_runops_standard (in /usr/bin/perl)
    ==64955== by 0x179E51: perl_run (in /usr/bin/perl)
    ==64955==
    <html><head></head><body></body></html>
    ==64955==
    ==64955== HEAP SUMMARY:
    ==64955== in use at exit: 592,160 bytes in 2,369 blocks
    ==64955== total heap usage: 7,166 allocs, 4,797 frees, 1,159,576 bytes allocated
    ==64955==
    ==64955== LEAK SUMMARY:
    ==64955== definitely lost: 18,102 bytes in 19 blocks
    ==64955== indirectly lost: 50,698 bytes in 23 blocks
    ==64955== possibly lost: 514,100 bytes in 2,318 blocks
    ==64955== still reachable: 9,260 bytes in 9 blocks
    ==64955== of which reachable via heuristic:
    ==64955== newarray : 1,056 bytes in 33 blocks
    ==64955== suppressed: 0 bytes in 0 blocks
    ==64955== Rerun with --leak-check=full to see details of leaked memory
    ==64955==
    ==64955== Use --track-origins=yes to see where uninitialised values come from
    ==64955== For lists of detected and suppressed errors, rerun with: -s
    ==64955== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

    So, uninitialized data are used for the output.

    If I use "format => 'callback'" (with a callback) instead of
    "format => 'string'", then I get the following error:

    Unknown node type at /usr/lib/x86_64-linux-gnu/perl5/5.40/HTML/Gumbo.pm line 298, <> line 1.

    (which is better from the security point of view, but prevents one
    from parsing some modern HTML documents).

    It apparently comes from Gumbo.xs, where there are two occurrences of

    croak("Unknown node type");

    I suspect that this is the first one as the second one corresponds to
    text node types.

    The cause is probably the most recent node type GUMBO_NODE_TEMPLATE
    from the Gumbo library (libgumbo):

    typedef enum {
    /** Document node. v will be a GumboDocument. */
    GUMBO_NODE_DOCUMENT,
    /** Element node. v will be a GumboElement. */
    GUMBO_NODE_ELEMENT,
    /** Text node. v will be a GumboText. */
    GUMBO_NODE_TEXT,
    /** CDATA node. v will be a GumboText. */
    GUMBO_NODE_CDATA,
    /** Comment node. v will be a GumboText, excluding comment delimiters. */
    GUMBO_NODE_COMMENT,
    /** Text node, where all contents is whitespace. v will be a GumboText. */
    GUMBO_NODE_WHITESPACE,
    /** Template node. This is separate from GUMBO_NODE_ELEMENT because many
    * client libraries will want to ignore the contents of template nodes, as
    * the spec suggests. Recursing on GUMBO_NODE_ELEMENT will do the right thing
    * here, while clients that want to include template contents should also
    * check for GUMBO_NODE_TEMPLATE. v will be a GumboElement. */
    GUMBO_NODE_TEMPLATE
    } GumboNodeType;

    This node type was added in 2015:

    https://github.com/google/gumbo-parser/commit/4383a40605ee7872a8e2de58553383a13d919153

    but most of the HTML::Gumbo code predates this change.
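A minimal sketch of how code written against the six older node types can misread the seventh one. It does not use the real libgumbo headers: `NodeType` is a stand-in that mirrors the enum quoted above, and `PayloadKind`, `payload_kind_legacy` and `payload_kind` are hypothetical names introduced only for illustration. The legacy function assumes that anything which is neither a document nor an element carries text, which is the kind of assumption that would send a template node down the text path.

```c
#include <stddef.h>

/* Stand-in for GumboNodeType, same order as the enum quoted above. */
typedef enum {
    NODE_DOCUMENT,
    NODE_ELEMENT,
    NODE_TEXT,
    NODE_CDATA,
    NODE_COMMENT,
    NODE_WHITESPACE,
    NODE_TEMPLATE
} NodeType;

/* Hypothetical: which member of the node's value is valid for a type. */
typedef enum {
    PAYLOAD_DOCUMENT,
    PAYLOAD_ELEMENT,
    PAYLOAD_TEXT,
    PAYLOAD_UNKNOWN
} PayloadKind;

/* Two-way check: everything that is not a document or an element is
 * assumed to be text, so NODE_TEMPLATE is classified as text. */
PayloadKind payload_kind_legacy(NodeType t) {
    if (t == NODE_DOCUMENT) return PAYLOAD_DOCUMENT;
    if (t == NODE_ELEMENT)  return PAYLOAD_ELEMENT;
    return PAYLOAD_TEXT;
}

/* Exhaustive switch without a default label: every known type is
 * listed, and a template is treated as carrying an element. */
PayloadKind payload_kind(NodeType t) {
    switch (t) {
    case NODE_DOCUMENT:
        return PAYLOAD_DOCUMENT;
    case NODE_ELEMENT:
    case NODE_TEMPLATE:
        return PAYLOAD_ELEMENT;
    case NODE_TEXT:
    case NODE_CDATA:
    case NODE_COMMENT:
    case NODE_WHITESPACE:
        return PAYLOAD_TEXT;
    }
    /* Reached only for values outside the enum above. */
    return PAYLOAD_UNKNOWN;
}
```

With the switch form, building with -Wall (which enables -Wswitch in GCC and Clang) should produce a warning when a later enumerator is added but not handled, instead of silently falling into the text case.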

    -- System Information:
    Debian Release: trixie/sid
    APT prefers unstable-debug
    APT policy: (500, 'unstable-debug'), (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable-debug'), (500, 'proposed-updates-debug'), (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
    Architecture: amd64 (x86_64)
    Foreign Architectures: i386

    Kernel: Linux 6.11.10-amd64 (SMP w/12 CPU threads; PREEMPT)
    Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
    Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
    Shell: /bin/sh linked to /usr/bin/dash
    Init: systemd (via /run/systemd/system)
    LSM: AppArmor: enabled

    Versions of packages libhtml-gumbo-perl depends on:
    ii libc6 2.41-7
    ii libgumbo3 0.13.0+dfsg-2
    ii libhtml-tree-perl 5.07-3
    ii perl 5.40.1-3
    ii perl-base [perlapi-5.40.0] 5.40.1-3

    libhtml-gumbo-perl recommends no packages.

    libhtml-gumbo-perl suggests no packages.

    -- no debconf information

    --
    Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
    100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
    Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Debian Bug Tracking System@21:1/5 to All on Sat May 17 12:10:01 2025
    Processing control commands:

    tag -1 patch
    Bug #1104789 [libhtml-gumbo-perl] libhtml-gumbo-perl: erratic behavior on the unsupported template HTML element - GUMBO_NODE_TEMPLATE node type
    Added tag(s) patch.

    --
    1104789: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104789
    Debian Bug Tracking System
    Contact owner@bugs.debian.org with problems

  • From Niko Tyni@21:1/5 to Vincent Lefevre on Sat May 17 12:10:01 2025
    Control: tag -1 patch

    On Tue, May 06, 2025 at 03:48:52PM +0200, Vincent Lefevre wrote:
    Package: libhtml-gumbo-perl
    Version: 0.18-4+b1
    Severity: serious
    Tags: security upstream
    Justification: security
    Forwarded: https://github.com/ruz/HTML-Gumbo/issues/6
    X-Debbugs-Cc: Debian Security Team <team@security.debian.org>

    I get erratic behavior on the template HTML element, e.g. on
    the HTML file "<template>". For instance:

    ==64955== Command: perl -C -MHTML::Gumbo -e print\ HTML::Gumbo-\>new-\>parse('\<template\>',\ format\ =\>\ 'string');
    ==64955==
    ==64955== Conditional jump or move depends on uninitialised value(s)
    ==64955== at 0x484DC89: strlen (vg_replace_strmem.c:505)
    ==64955== by 0x2AD7DF: ??? (in /usr/bin/perl)
    ==64955== by 0x486D6CE: tree_to_string (Gumbo.xs:189)
    ==64955== by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
    ==64955== by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
    ==64955== by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
    ==64955== by 0x486E41B: parse_to_string_cb (Gumbo.xs:505)

    The attached change does not make HTML::Gumbo support <template>
    properly but seems to plug this specific hole, and hence the
    known security aspects.

    I've checked that this doesn't break the (not very extensive) test
    suite, and that the only reverse dependency in trixie, request-tracker5,
    still builds with this.

    Tentatively tagging 'patch', but eyeballs would be good.

    I think full support for <template> should be a separate wishlist bug.
    --
    Niko Tyni ntyni@debian.org

    From 549609cd80784012c274c11731e6a31787d3555e Mon Sep 17 00:00:00 2001
    From: Niko Tyni <ntyni@debian.org>
    Date: Sat, 17 May 2025 09:32:06 +0100
    Subject: [PATCH] Fix wrong code path with GUMBO_NODE_TEMPLATE

    GUMBO_NODE_TEMPLATE was introduced in Gumbo 0.10.0 but HTML-Gumbo has
    not been updated to support that.

    This makes walk_tree() take the text node branch for templates
    and access uninitialized memory.

    The gumbo C library seems to treat GUMBO_NODE_TEMPLATE very
    similarly to GUMBO_NODE_ELEMENT. From

    https://sources.debian.org/src/gumbo-parser/0.13.0%2Bdfsg-2/src/gumbo.h/#L304

    /** Template node. This is separate from GUMBO_NODE_ELEMENT because many
    * client libraries will want to ignore the contents of template nodes, as
    * the spec suggests. Recursing on GUMBO_NODE_ELEMENT will do the right thing
    * here, while clients that want to include template contents should also
    * check for GUMBO_NODE_TEMPLATE. v will be a GumboElement. */

    So we add it to the list of "special" container types in walk_tree()
    that attach a GumboElement value rather than a GumboText.

    Bug-Debian: https://bugs.debian.org/1104789
    Bug: https://github.com/ruz/HTML-Gumbo/issues/6
    ---
    lib/HTML/Gumbo.xs | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

    diff --git a/lib/HTML/Gumbo.xs b/lib/HTML/Gumbo.xs
    index 97dfc43..32427d7 100644
    --- a/lib/HTML/Gumbo.xs
    +++ b/lib/HTML/Gumbo.xs
    @@ -38,7 +38,7 @@ typedef enum {
    STATIC
    void
    walk_tree(pTHX_ GumboNode* node, int flags, void (*cb)(pTHX_ PerlHtmlGumboType, GumboNode*, void*), void* ctx ) {
    - if ( node->type == GUMBO_NODE_DOCUMENT || node->type == GUMBO_NODE_ELEMENT ) {
    + if ( node->type == GUMBO_NODE_DOCUMENT || node->type == GUMBO_NODE_ELEMENT || node->type == GUMBO_NODE_TEMPLATE) {
    GumboVector* children;
    int skip = flags&PHG_FLAG_SKIP_ROOT_ELEMENT && node->type == GUMBO_NODE_ELEMENT && node->parent && node->parent->type == GUMBO_NODE_DOCUMENT;
    if ( !skip ) {
    --
    2.49.0
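The idea behind the one-line change can be sketched with stand-in types. This is not the HTML::Gumbo source: `Node`, `is_container` and `count_text_nodes` are hypothetical names, and the union only imitates the way a node's value is either an element (with children) or a text payload. The point is that the container test decides which union member gets read, so a template node has to pass it.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for GumboNodeType, same order as in gumbo.h. */
typedef enum {
    NODE_DOCUMENT,
    NODE_ELEMENT,
    NODE_TEXT,
    NODE_CDATA,
    NODE_COMMENT,
    NODE_WHITESPACE,
    NODE_TEMPLATE
} NodeType;

/* Hypothetical node: only one union member is valid, chosen by type. */
typedef struct Node Node;
struct Node {
    NodeType type;
    union {
        struct { Node **children; size_t count; } element;
        struct { const char *text; } text;
    } v;
};

/* Container types carry children; a template is one of them. */
int is_container(NodeType t) {
    return t == NODE_DOCUMENT || t == NODE_ELEMENT || t == NODE_TEMPLATE;
}

/* Recursive walk: descend into containers, count everything else as
 * a text-like leaf. Without NODE_TEMPLATE in is_container(), a
 * template would be counted as a leaf and its text member, which was
 * never set, would be the one a real serializer goes on to read. */
size_t count_text_nodes(const Node *node) {
    assert(node != NULL);
    if (is_container(node->type)) {
        size_t total = 0;
        for (size_t i = 0; i < node->v.element.count; i++)
            total += count_text_nodes(node->v.element.children[i]);
        return total;
    }
    return 1;
}
```

For a tree shaped like the parse of "<template>" (document, head, empty template), this walk finds no text-like leaves, which matches the empty head seen in the valgrind run.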

  • From Debian Bug Tracking System@21:1/5 to All on Sat May 17 15:00:01 2025
    Processing control commands:

    tag -1 pending
    Bug #1104789 [libhtml-gumbo-perl] libhtml-gumbo-perl: erratic behavior on the unsupported template HTML element - GUMBO_NODE_TEMPLATE node type
    Added tag(s) pending.

    --
    1104789: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104789
    Debian Bug Tracking System
    Contact owner@bugs.debian.org with problems

  • From gregor herrmann@21:1/5 to All on Sat May 17 15:00:01 2025
    Control: tag -1 pending

    Hello,

    Bug #1104789 in libhtml-gumbo-perl reported by you has been fixed in the
    Git repository and is awaiting an upload. You can see the commit
    message below and you can check the diff of the fix at:

    https://salsa.debian.org/perl-team/modules/packages/libhtml-gumbo-perl/-/commit/f9de66e265bce8c607d7f4a80819725b0b44d661

    ------------------------------------------------------------------------
    Add patch to fix wrong code path with GUMBO_NODE_TEMPLATE.

    Thanks: Vincent Lefevre for the bug report and Niko Tyni for the patch.
    Closes: #1104789
    ------------------------------------------------------------------------

    (this message was generated automatically)
    --
    Greetings

    https://bugs.debian.org/1104789

  • From Debian Bug Tracking System@21:1/5 to All on Sat May 17 15:10:01 2025
    This is a multi-part message in MIME format...

    Your message dated Sat, 17 May 2025 13:04:19 +0000
    with message-id <E1uGHD9-00BZ3K-Al@fasolo.debian.org>
    and subject line Bug#1104789: fixed in libhtml-gumbo-perl 0.18-5
    has caused the Debian Bug report #1104789,
    regarding libhtml-gumbo-perl: erratic behavior on the unsupported template HTML element - GUMBO_NODE_TEMPLATE node type
    to be marked as done.

    This means that you claim that the problem has been dealt with.
    If this is not the case it is now your responsibility to reopen the
    Bug report if necessary, and/or fix the problem forthwith.

    (NB: If you are a system administrator and have no idea what this
    message is talking about, this may indicate a serious mail system
    misconfiguration somewhere. Please contact owner@bugs.debian.org
    immediately.)


    --
    1104789: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104789
    Debian Bug Tracking System
    Contact owner@bugs.debian.org with problems
