• Bug#1106083: scilab: FTBFS: autobuilder hangs when using the kernel of

    From Santiago Vila@21:1/5 to All on Fri Jul 4 19:50:01 2025
    Option --max-parallel=1 did not help (same failure rate as before).

    I'm open for ideas.

    Thanks.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Étienne Mollier@21:1/5 to All on Wed Jul 9 21:20:01 2025
    Hi Santiago,

    Thanks for the details on the question of memory usage. Given
    the failure mode and your own diligent implementation details, I
    believe memory usage can reasonably be ruled out as the cause.

    Santiago Vila, on 2025-07-09:
    Étienne, on 2025-07-09:
    I also notice that the kernel run is the cloud variant. Maybe
    having a look at differences with the plain kernel could reveal
    some clues? (assuming that the problem has not disappeared from
    6.12.32 to 6.12.33…)

    That would also be a possibility in theory, but I think it's unlikely.

    Right, I tried to isolate the configuration options differences,
    and they seem to boil down to:

    * very many hardware specific options being disabled, because
    we're sure that the kernel will run on top of a hypervisor
    with somewhat homogeneous context;
    * a couple of hypervisor and cloud specific options are
    enabled, or inlined instead of modular, to survive early
    boot sequence in cloud context.

    I admit that I don't see any good reason for such changes to
    trigger a chain reaction resulting in a hangup.
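
    A minimal sketch of how such a comparison of configuration
    options could be automated, assuming both inputs use the usual
    kernel config file format (as in the /boot/config-<version>
    files). The function names are hypothetical and are not taken
    from this thread.

```python
def parse_kernel_config(text):
    """Map each CONFIG_ option to its value; explicitly unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # "# CONFIG_FOO is not set" means the option is disabled.
            name = line[2:-len(" is not set")]
            options[name] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kernel_configs(plain_text, cloud_text):
    """Return {option: (plain_value, cloud_value)} for options that differ.

    An option missing from one file is treated as disabled ('n').
    """
    plain = parse_kernel_config(plain_text)
    cloud = parse_kernel_config(cloud_text)
    changes = {}
    for name in sorted(set(plain) | set(cloud)):
        before = plain.get(name, "n")
        after = cloud.get(name, "n")
        if before != after:
            changes[name] = (before, after)
    return changes
```

    Passing the contents of the plain and cloud config files to
    diff_kernel_configs() separates the options switched from "m"
    to "y" (built in instead of modular) from those disabled
    outright.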

    In case it matters, the failure rates that I got recently were:

    10% (5 out of 50) on systems with 1 CPU
    78% (39 out of 50) on systems with 2 CPUs.

    I suspect a race condition of some kind, so if you are still
    willing to try different things (as opposed to directly trying
    in my VM after I finish my last test build), I would try
    building the package on a self-hosted qemu/kvm machine
    with exactly 2 CPUs. You can probably achieve the same
    effect by using GRUB_CMDLINE_LINUX="nr_cpus=2" (i.e.
    modify /etc/default/grub, run update-grub and reboot).
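
    As a rough check that the difference between the two failure
    rates quoted above (5 out of 50 and 39 out of 50) is unlikely
    to be noise, here is a minimal sketch of a one-sided Fisher
    exact test using only the standard library. The function names
    are hypothetical, and the test is an illustration rather than
    something that was run in this thread.

```python
from math import comb


def failure_rate(failures, runs):
    """Fraction of runs that failed."""
    return failures / runs


def fisher_one_sided(fail_a, runs_a, fail_b, runs_b):
    """Probability that group B sees at least fail_b failures by chance.

    Assumes the total number of failures is fixed and that failures
    are spread over both groups independently of the group
    (hypergeometric null hypothesis).
    """
    total_fail = fail_a + fail_b
    total_runs = runs_a + runs_b
    denom = comb(total_runs, total_fail)
    upper = min(total_fail, runs_b)
    tail = sum(
        comb(runs_b, k) * comb(runs_a, total_fail - k)
        for k in range(fail_b, upper + 1)
    )
    return tail / denom


# Figures from the message above: 1 CPU versus 2 CPUs.
rate_1cpu = failure_rate(5, 50)
rate_2cpu = failure_rate(39, 50)
p_value = fisher_one_sided(5, 50, 39, 50)
```

    A very small p_value supports the reading that the CPU count
    really matters, which is consistent with (but does not prove)
    the race condition hypothesis.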

    (btw: Building scilab in unstable 100 times as we speak, I believe
    that by night I will be able to tell the outcome).

    Acknowledged. I am also running a couple of builds locally (as
    long as battery capacity allows) to see whether I can observe
    something interesting if I manage to reproduce a hangup. If I
    also get a 78% failure rate, I have a reasonable chance of
    seeing something tonight.
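
    A minimal sketch of a harness for such repeated builds, which
    counts a run as a hang when it exceeds a time limit. The
    function name, the time limit and the build command are
    assumptions and would need adjusting to the actual build setup.

```python
import subprocess


def count_hangs(command, runs, timeout_seconds):
    """Run command repeatedly and count timeouts and non-zero exits.

    A run that exceeds timeout_seconds is counted as a hang.
    subprocess.run() kills the direct child on timeout, but not
    necessarily its descendants, so leftover processes may need a
    separate cleanup.
    """
    hangs = 0
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                command,
                timeout=timeout_seconds,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            hangs += 1
            continue
        if result.returncode != 0:
            failures += 1
    return {"runs": runs, "hangs": hangs, "failures": failures}
```

    For instance, count_hangs(["dpkg-buildpackage", "-us", "-uc",
    "-b"], 50, 4 * 3600) run from the unpacked source tree would
    give a hang count comparable to the figures above; the
    four-hour limit is an arbitrary placeholder.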

    Have a nice day, :)
    --
    Étienne Mollier <emollier@emlwks999.eu>
    Fingerprint: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
    Sent from my alarm clock.

  • From Étienne Mollier@21:1/5 to All on Fri Jul 11 09:20:01 2025
    Hi Santiago,

    Santiago Vila, on 2025-07-11:
    On Fri, Jul 11, 2025 at 12:09:48AM +0200, Étienne Mollier wrote:
    0. if I trust Helmut Grohne, I cannot empty the entropy pool;
    maybe I can just make it more strained, so the behavior may
    be unreproducible to some extent: for instance I have not
    managed to reproduce the behavior straight on my laptop;

    I would be quite surprised if the entropy pool was the problem.

    Right, I have no third occurrence in a row: my build went
    through this morning.

    I guess that at some point we should probably tell upstream about this.

    We mostly know what the problem is not, but we still don't
    know precisely what it is. I agree that informing upstream
    shouldn't hurt. Unfortunately, with the elements we have, I
    don't expect the report to be very actionable, but one never
    knows.

    Have a nice day, :)
    --
    .''`. Étienne Mollier <emollier@debian.org>
    : :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
    `. `' sent from my alarm clock
    `-

  • From Étienne Mollier@21:1/5 to All on Sat Jul 12 18:10:01 2025
    Control: forwarded -1 https://gitlab.com/scilab/scilab/-/issues/17461
    Control: found -1 2024.1.0+dfsg-6

    Hi Santiago,

    Étienne Mollier, on 2025-07-11:
    Santiago Vila, on 2025-07-11:
    I guess that at some point we should probably tell upstream about this.

    We mostly know what the problem is not, but we still don't
    know precisely what it is. I agree that informing upstream
    shouldn't hurt. Unfortunately, with the elements we have, I
    don't expect the report to be very actionable, but one never
    knows.

    I gathered the missing information that could reasonably be
    determined with further testing and opened the scilab upstream
    issue #17461; see the Forwarded field. Feel free to add to it
    if you believe some further information might be worth
    forwarding.

    A short summary of the new items, compared to what has already
    been discussed in the present thread:

    * I reproduced the issue on a virtual machine running the
    plain Linux kernel 6.12.y+deb13-amd64 instead of the cloud
    variant;
    * I never managed to reproduce the issue on bare metal (not
    news, but it is worth highlighting this only occurred on
    virtual machines so far);
    * I attempted the build with the later openjdk 25 instead of
    the default openjdk 21, per a suggestion from Helmut, but
    the issue seems to occur there as well.

    Have a nice day, :)
    --
    .''`. Étienne Mollier <emollier@debian.org>
    : :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
    `. `' sent from my alarm clock
    `-

  • From Étienne Mollier@21:1/5 to All on Sun Jul 13 11:30:01 2025
    Hi Pierre, Hi Santiago,

    Pierre Gruet, on 2025-07-13:
    However, my feeling is that the bug is not RC, since it shows up only in
    VMs: in practice it does not affect the autobuilders nor the developers
    on their individual machines. What do you think?

    Thank you for your input, I have been trying to gather enough
    elements to either pinpoint the root cause of the issue, or
    assess whether the issue would be critical for the upcoming
    release. In the current circumstances, I am okay with a
    reduction of the severity to "important" for the reasons you
    mention. Santiago, thanks for your invaluable help thus far;
    does a severity reduction sound reasonable to you at this stage?

    Have a nice day both, :)
    --
    .''`. Étienne Mollier <emollier@debian.org>
    : :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
    `. `' sent from my alarm clock
    `-
