• Bug#1086028: I've reproduced the bug in QEMU

    From Sergei Golovan@21:1/5 to All on Thu Feb 13 11:40:01 2025
    tag 1086028 + patch
    tag 1087809 + patch
    tag 1093200 + patch
    thanks

    Hi!

    I've finally managed to reproduce this EFAULT in QEMU (using an
    Erlang-based script which is shipped in the wings3d source package):

    1) I've installed Debian bookworm for mips64el in a qemu-system-mips64el
    virtual machine (version from unstable), and upgraded it to the
    current unstable (machine is loongson3-virt, cpu is Loongson-3A4000).
    2) I have to enable SMP in QEMU and use -rtc clock=rt (otherwise the
    virtual machine won't boot; with clock=rt it sometimes boots and
    sometimes hangs). The full QEMU command line is:

    qemu-system-mips64el -machine loongson3-virt -m 4g -cpu Loongson-3A4000 \
    -smp 2,sockets=2,cores=1,threads=1,maxcpus=2 \
    -kernel vmlinuz-loongson-3 \
    -rtc clock=rt \
    -initrd initrd.img-loongson-3 -drive if=none,file=hda1.bin,id=hd,format=raw \
    -net nic -net tap,ifname=tap0,script=/bin/true \
    -device virtio-blk-pci,drive=hd -append "root=/dev/vda1 console=ttyS0" \
    -nographic

    Here the kernel and initrd can be either the stock 6.1.123-1 version or
    6.1.123-1 with the attached patch. Unfortunately, I can't get QEMU to
    boot the newest 6.12.12-1 kernel (it complains that it can't
    uncompress the initrd; I don't know why).

    4) I've installed the build dependencies of wings3d (basically, only
    erlang-base is necessary)
    5) I've extracted the wings3d source package (from stable: https://packages.debian.org/source/stable/wings3d)
    6) I've added the following line as the second line to wings3d-2.2.9/intl_tools/gen_char_hrl

    %%! +S 4:4 +SDcpu 4:4 +c false

    (The first two options enable multiple threads; the last one is a
    workaround for the monotonic clock jumping backwards, which appears
    to happen in QEMU with SMP enabled.)
    7) I've run this gen_char_hrl in a loop until it fails.
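
    The loop in step 7 can be sketched as a small shell function. This is
    only an illustration, not the exact loop used here: the function name
    run_until_failure and the run limit are made up for the example.

```shell
# Run a command repeatedly until it exits non-zero or max_runs is reached.
# Prints how far it got; returns 1 if some run failed, 0 otherwise.
# (run_until_failure is a hypothetical helper, not part of wings3d or Erlang.)
run_until_failure() {
    max_runs=$1
    shift
    runs=0
    while [ "$runs" -lt "$max_runs" ]; do
        runs=$((runs + 1))
        # "$@" is the command under test, e.g. the modified gen_char_hrl script.
        if ! "$@" >/dev/null; then
            echo "failed on run $runs"
            return 1
        fi
    done
    echo "no failure in $runs runs"
    return 0
}
```

    Something like `run_until_failure 100000 ./gen_char_hrl` would then stop
    at the first aborting run, assuming the script is executable and needs
    no arguments (both are assumptions; adjust the command to however the
    script is actually invoked).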

    The result is that with the stock 6.1.123-1 kernel, the script aborts
    in approximately 1% of cases with the message:

    signal-dispatcher thread got unexpected error: efault (14)

    which is exactly the error that prevents Erlang (and many Erlang-based packages) from building on mips64el.

    On the other hand, with the patched kernel the script loop has been
    running for more than 24 hours (a few thousand runs) without
    aborting. So I'm now fairly confident that the patch fixes the bug.
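
    As a rough sanity check of that confidence: if each run failed
    independently with probability about 1%, the chance of N runs in a row
    all passing would be (1 - 0.01)^N. The sketch below evaluates this with
    awk. N = 2000 is an assumed stand-in for "a few thousand", and
    independence between runs is an assumption as well.

```shell
# Probability that n independent runs all pass when each run fails
# with probability p. Both arguments are illustrative assumptions,
# not measured values.
all_pass_probability() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3g\n", exp(n * log(1 - p)) }'
}

all_pass_probability 0.01 2000
```

    The result is on the order of 1e-9, so a clean loop of that length
    would be very unlikely if the failure rate were unchanged.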

    I'm not sure whether the patch has any adverse effects, so
    it'd be better to try it on real hardware as well.

    The patch is derived from the thread [1]. It reverts commit [2], with
    an additional change made necessary by the changes to
    expand_stack() introduced in commit [3].

    [1] https://lore.kernel.org/all/mvmplxraqmd.fsf@suse.de/T/
    [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4bce37a68ff884e821a02a731897a8119e0c37b7
    [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d7071af890768438c14db6172cc8f9f4d04e184

    Cheers!
    --
    Sergei Golovan

    From: Sergei Golovan <sgolovan@debian.org>
    Date: Wed, 05 Feb 2025 15:47:06 +0300
    Subject: [PATCH] mips/mm: Revert converting to using lock_mm_and_find_vma()
     The patch reverts 4bce37a68ff884e821a02a731897a8119e0c37b7 and
     adapts the code to the changes in the expand_stack() prototype
     using examples from 8d7071af890768438c14db6172cc8f9f4d04e184
     .
     Hopefully, this should fix #1093200, #1093859, #1087809, #1086028
    Bug: https://lore.kernel.org/all/mvmplxraqmd.fsf@suse.de/T/

    --- a/arch/mips/Kconfig
    +++ b/arch/mips/Kconfig
    @@ -97,7 +97,6 @@
     	select HAVE_VIRT_CPU_ACCOUNTING_GEN if 64BIT || !SMP
     	select IRQ_FORCED_THREADING
     	select ISA if EISA
    -	select LOCK_MM_AND_FIND_VMA
     	select MODULES_USE_ELF_REL if MODULES
     	select MODULES_USE_ELF_RELA if MODULES && 64BIT
     	select PERF_USE_VMALLOC
    --- a/arch/mips/mm/fault.c
    +++ b/arch/mips/mm/fault.c
    @@ -100,13 +100,22 @@
     
     	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
     retry:
    -	vma = lock_mm_and_find_vma(mm, address, regs);
    +	mmap_read_lock(mm);
    +	vma = find_vma(mm, address);
    +	if (!vma)
    +		goto bad_area;
    +	if (vma->vm_start <= address)
    +		goto good_area;
    +	if (!(vma->vm_flags & VM_GROWSDOWN))
    +		goto bad_area;
    +	vma = expand_stack(mm, address);
     	if (!vma)
     		goto bad_area_nosemaphore;
     /*
      * Ok, we have a good vm_area for this memory access, so
      * we can handle it..
      */
    +good_area:
     	si_code = SEGV_ACCERR;
     
     	if (write) {

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Salvatore Bonaccorso@21:1/5 to Sergei Golovan on Thu Feb 20 00:00:01 2025
    Hi,

    On Thu, Feb 13, 2025 at 01:35:13PM +0300, Sergei Golovan wrote:
    The patch is derived from the thread [1]. It reverts commit [2], with
    an additional change made necessary by the changes to
    expand_stack() introduced in commit [3].

    [1] https://lore.kernel.org/all/mvmplxraqmd.fsf@suse.de/T/
    [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4bce37a68ff884e821a02a731897a8119e0c37b7
    [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d7071af890768438c14db6172cc8f9f4d04e184

    Just one observation, as we talked about this issue in our weekly
    Kernel team meeting: Reverting
    4bce37a68ff884e821a02a731897a8119e0c37b7 might not be an option as
    this is part of the upstream fixes for addressing CVE-2023-3269.

    Some information about the CVE:
    https://www.openwall.com/lists/oss-security/2023/07/05/1
    https://github.com/lrh2000/StackRot
    https://www.openwall.com/lists/oss-security/2023/07/28/1

    This means that this needs to be addressed (upstream, for 6.1.y) in a
    way that does not break the CVE fix but unbreaks the mips64el
    situation.

    Ben aims to look into it.

    Regards,
    Salvatore

  • From Salvatore Bonaccorso@21:1/5 to Salvatore Bonaccorso on Thu Feb 20 08:10:01 2025
    Hi

    [Adding Thomas Bogendoerfer and Jiaxun Yang to the recipients; this is
    about a Debian issue on MIPS:
    https://bugs.debian.org/1086028
    https://bugs.debian.org/1087809
    https://bugs.debian.org/1093200]

    On Wed, Feb 19, 2025 at 11:52:03PM +0100, Salvatore Bonaccorso wrote:
    Just one observation, as we talked about this issue in our weekly
    Kernel team meeting: Reverting
    4bce37a68ff884e821a02a731897a8119e0c37b7 might not be an option as
    this is part of the upstream fixes for addressing CVE-2023-3269.

    Some information about the CVE:
    https://www.openwall.com/lists/oss-security/2023/07/05/1
    https://github.com/lrh2000/StackRot
    https://www.openwall.com/lists/oss-security/2023/07/28/1

    This means that this needs to be addressed (upstream, for 6.1.y) in a
    way that does not break the CVE fix but unbreaks the mips64el
    situation.

    Ben aims to look into it.

    There is 8fa507083388 ("mm/memory: Use exception ip to search
    exception tables") upstream which fixes 4bce37a68ff8 ("mips/mm:
    Convert to using lock_mm_and_find_vma()") and relates to the thread
    https://lore.kernel.org/r/75e9fd7b08562ad9b456a5bdaacb7cc220311cc9.camel@xry111.site/

    In fact the commit was backported to:
    v6.6.18: 94d34a6861a2807356b653fc12f958196ebbc043 mm/memory: Use exception ip to search exception tables
    v6.7.6: c3a7dbff8d0d4d7174d2162e4db7bdcfd3cb8886 mm/memory: Use exception ip to search exception tables
    v6.8-rc5: 8fa5070833886268e4fb646daaca99f725b378e9 mm/memory: Use exception ip to search exception tables

    but it cannot be applied as is to 6.1.y. It depends at least on
    11ba1728be3e ("ptrace: Introduce exception_ip arch hook").
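
    One way to check for such a prerequisite is to search a stable branch
    for its commit message. A minimal sketch, assuming a clone of the
    stable kernel tree; the function name has_commit_message and the ref
    used in the usage note are placeholders, not taken from this thread.

```shell
# Print "present" if a commit whose message contains the literal string $2
# is reachable from ref $1, and "missing" otherwise.
# (has_commit_message is a hypothetical helper; it only wraps git log.)
has_commit_message() {
    ref=$1
    text=$2
    if git log --format=%s --fixed-strings --grep="$text" "$ref" | grep -q .; then
        echo present
    else
        echo missing
    fi
}
```

    For example, `has_commit_message origin/linux-6.1.y 'ptrace: Introduce
    exception_ip arch hook'` would print "missing" if no such commit is
    reachable from that ref (the remote name origin is an assumption about
    the local clone).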

    As Ben Hutchings said he will look at this issue, I will
    not further "hijack" the thread, but I thought it was worth mentioning
    the relation to CVE-2023-3269/StackRot and the potentially missing bits
    from the newer stable series.

    Regards,
    Salvatore
