• Re: Help needed for debugging OCaml failure on m68k

    From John Paul Adrian Glaubitz@21:1/5 to All on Wed Jun 19 09:10:01 2024
    Hi Stéphane,

    On Wed, 2024-06-19 at 08:21 +0200, Stéphane Glondu wrote:
    OCaml 5.2.0 FTBFS on m68k:


    https://buildd.debian.org/status/fetch.php?pkg=ocaml&arch=m68k&ver=5.2.0-1%7Eexp1&stamp=1718285451&raw=0

    The failure happens very early, at the very first run of the bytecode
    interpreter (ocamlrun). It seems to be related to a thread-local
    variable that moves unexpectedly. I've posted reports of my
    investigations in an issue on the upstream GitHub:

    https://github.com/ocaml/ocaml/issues/13249

    To reproduce the problem quickly:
    - unpack ocaml 5.2.0 source package
    - ./configure --enable-imprecise-c99-float-ops
    - make coldstart

    Is there some subtlety with thread local variables on m68k?

    Can you please try to reproduce the issue on the porterbox mitchy.debian.net first to
    make sure it's not related to the QEMU build environment on the buildds?

    If it turns out to be a QEMU bug, we need to report it there instead.

    Thanks,
    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stéphane Glondu@21:1/5 to All on Wed Jun 19 08:30:01 2024
    Hi all,

    OCaml 5.2.0 FTBFS on m68k:


    https://buildd.debian.org/status/fetch.php?pkg=ocaml&arch=m68k&ver=5.2.0-1%7Eexp1&stamp=1718285451&raw=0

    The failure happens very early, at the very first run of the bytecode
    interpreter (ocamlrun). It seems to be related to a thread-local
    variable that moves unexpectedly. I've posted reports of my
    investigations in an issue on the upstream GitHub:

    https://github.com/ocaml/ocaml/issues/13249

    To reproduce the problem quickly:
    - unpack ocaml 5.2.0 source package
    - ./configure --enable-imprecise-c99-float-ops
    - make coldstart

    Is there some subtlety with thread local variables on m68k?


    Cheers,

    --
    Stéphane

  • From Stéphane Glondu@21:1/5 to All on Wed Jun 19 12:40:01 2024
    Hi,

    On 19/06/2024 at 09:06, John Paul Adrian Glaubitz wrote:
    To reproduce the problem quickly:
    - unpack ocaml 5.2.0 source package
    - ./configure --enable-imprecise-c99-float-ops
    - make coldstart

    Is there some subtlety with thread local variables on m68k?

    Can you please try to reproduce the issue on the porterbox mitchy.debian.net first to
    make sure it's not related to the QEMU build environment on the buildds?

    I can reproduce the issue on mitchy.debian.net.


    Cheers,

    --
    Stéphane

  • From John Paul Adrian Glaubitz@21:1/5 to All on Wed Jun 19 12:50:01 2024
    Hi Stéphane,

    On Wed, 2024-06-19 at 12:37 +0200, Stéphane Glondu wrote:
    On 19/06/2024 at 09:06, John Paul Adrian Glaubitz wrote:
    To reproduce the problem quickly:
    - unpack ocaml 5.2.0 source package
    - ./configure --enable-imprecise-c99-float-ops
    - make coldstart

    Is there some subtlety with thread local variables on m68k?

    Can you please try to reproduce the issue on the porterbox mitchy.debian.net first to
    make sure it's not related to the QEMU build environment on the buildds?

    I can reproduce the issue on mitchy.debian.net.

    OK, then it's actually a bug.

    One important thing to know is that the natural alignment on m68k is actually 16 bits
    and not 32 bits, which causes quite a few issues with various upstream projects.

    We're currently planning on switching the alignment on m68k to 32 bits, and chances are
    that this could fix this issue as well.

    Can you maybe try passing "-malign-int" to CFLAGS/CXXFLAGS when building OCaml on m68k
    to verify this hypothesis? Please note that this also breaks the SysV ABI, so it's not
    possible to easily do this on a per-package basis.
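
    The alignment difference described above can be observed with a small
    probe. This is only a sketch: `struct probe`, `int_offset_after_char` and
    `report_layout` are hypothetical names that appear nowhere in OCaml or
    Debian. The offset of the `int` member is 4 on most ABIs (and on m68k with
    "-malign-int"), and is expected to be 2 under the default m68k ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct, for illustration only: one char followed by one int. */
struct probe {
    char tag;
    int  value;
};

/* Offset of `value` inside struct probe. The compiler places it at the first
   offset after `tag` that satisfies the alignment of int, so the result
   equals that alignment (4 on most ABIs, expected 2 on default m68k). */
static size_t int_offset_after_char(void)
{
    return offsetof(struct probe, value);
}

/* Print the layout so that two builds (with and without -malign-int)
   can be compared side by side. */
static void report_layout(void)
{
    size_t off = int_offset_after_char();

    /* The member offset is always a multiple of the member's alignment. */
    assert(off % _Alignof(int) == 0);
    printf("alignof(int) = %zu, offsetof(value) = %zu, sizeof(struct probe) = %zu\n",
           (size_t)_Alignof(int), off, sizeof(struct probe));
}
```

    Calling report_layout() from main() in both builds shows whether any
    structure shared with other libraries would change layout, which is why
    the flag cannot easily be applied to a single package.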

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From John Paul Adrian Glaubitz@21:1/5 to All on Wed Jun 19 14:10:02 2024
    Hi Stéphane,

    On Wed, 2024-06-19 at 13:28 +0200, Stéphane Glondu wrote:
    Can you maybe try passing "-malign-int" to CFLAGS/CXXFLAGS when building OCaml on m68k
    to verify this hypothesis? Please note that this also breaks the SysV ABI, so it's not
    possible to easily do this on a per-package basis.

    I observe the same behaviour with "-malign-int": the address of
    caml_state (a thread local variable) changes unexpectedly (goes from 0x402e5fac to 0x402e7454) after the following goto:


    https://salsa.debian.org/ocaml-team/ocaml/-/blob/debian/experimental/runtime/interp.c?ref_type=heads#L295

    which leads to:


    https://salsa.debian.org/ocaml-team/ocaml/-/blob/debian/experimental/runtime/interp.c?ref_type=heads#L819

    ...confirmed by adding:

    fprintf(stderr, "&caml_state = %p\n", &caml_state);

    before the goto and after the "Instruct(BRANCH):".

    Hmm, I guess then maybe Andreas Schwab or Geert Uytterhoeven might have an idea what
    the problem with the TLS variable is. I'll CC both.

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From Stéphane Glondu@21:1/5 to All on Wed Jun 19 13:30:01 2024
    On 19/06/2024 at 12:45, John Paul Adrian Glaubitz wrote:
    To reproduce the problem quickly:
    - unpack ocaml 5.2.0 source package
    - ./configure --enable-imprecise-c99-float-ops
    - make coldstart

    Is there some subtlety with thread local variables on m68k?

    Can you please try to reproduce the issue on the porterbox mitchy.debian.net first to
    make sure it's not related to the QEMU build environment on the buildds?

    I can reproduce the issue on mitchy.debian.net.

    OK, then it's actually a bug.

    One important thing to know is that the natural alignment on m68k is actually 16 bits
    and not 32 bits, which causes quite a few issues with various upstream projects.

    We're currently planning on switching the alignment on m68k to 32 bits, and chances are
    that this could fix this issue as well.

    Can you maybe try passing "-malign-int" to CFLAGS/CXXFLAGS when building OCaml on m68k
    to verify this hypothesis? Please note that this also breaks the SysV ABI, so it's not
    possible to easily do this on a per-package basis.

    I observe the same behaviour with "-malign-int": the address of
    caml_state (a thread local variable) changes unexpectedly (goes from
    0x402e5fac to 0x402e7454) after the following goto:


    https://salsa.debian.org/ocaml-team/ocaml/-/blob/debian/experimental/runtime/interp.c?ref_type=heads#L295

    which leads to:


    https://salsa.debian.org/ocaml-team/ocaml/-/blob/debian/experimental/runtime/interp.c?ref_type=heads#L819

    ...confirmed by adding:

    fprintf(stderr, "&caml_state = %p\n", &caml_state);

    before the goto and after the "Instruct(BRANCH):".
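
    The same check can be tried outside the OCaml runtime. The sketch below
    is a stand-alone approximation, not the runtime's code: `fake_state` and
    `tls_address_is_stable` are hypothetical names, and a plain `goto` stands
    in for the interpreter's dispatch. On a working toolchain the two
    addresses should be identical within one thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the runtime's thread-local state. */
static _Thread_local int fake_state;

/* Take the address of the thread-local variable before and after a jump,
   print both, and return 1 if they match, 0 otherwise. */
static int tls_address_is_stable(void)
{
    void *before = (void *)&fake_state;
    void *after;

    fprintf(stderr, "before goto: &fake_state = %p\n", before);
    goto branch;

branch:
    after = (void *)&fake_state;
    fprintf(stderr, "after goto:  &fake_state = %p\n", after);
    return before == after;
}

/* Abort loudly if the address moved, mirroring the symptom in the report. */
static void require_stable_tls_address(void)
{
    assert(tls_address_is_stable());
}
```

    A mismatch between the two printed addresses would point at the
    compiler-generated TLS access sequence rather than at the interpreter's
    own logic.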


    Cheers,

    --
    Stéphane

  • From John Paul Adrian Glaubitz@21:1/5 to All on Wed Jun 19 15:00:01 2024
    On Wed, 2024-06-19 at 14:45 +0200, Stéphane Glondu wrote:
    Hmm, I guess then maybe Andreas Schwab or Geert Uytterhoeven might have an idea what
    the problem with the TLS variable is. I'll CC both.

    I noticed that &caml_state changes when pc changes. Looking further, pc
    is a register variable pinned to a5. I guess this conflicts with the implementation of TLS...?

    I've removed the register pin and launched a build, and it goes past the problematic point. We'll see how it goes...

    Oh, nice catch. Yeah, I think A5 is the register that is used for TLS, but Andreas or Geert will have to correct me if I'm wrong.
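
    The workaround discussed above (dropping the register pin) can be
    sketched as follows. This is an illustration under assumptions, not the
    code from interp.c: the `PC_REG` macro here, the opt-in
    `PIN_PC_ON_M68K`, the opcodes and `run` are all hypothetical, and
    whether a5 is really reserved by the TLS access sequence is exactly what
    the thread leaves open. By default the macro expands to nothing, so the
    compiler picks the register itself.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: an explicit register pin is only applied when requested.
   With PIN_PC_ON_M68K undefined, PC_REG is empty on every architecture. */
#if defined(__GNUC__) && defined(__mc68000__) && defined(PIN_PC_ON_M68K)
#define PC_REG __asm__("a5")   /* the pin the thread found to be problematic */
#else
#define PC_REG                 /* no pin: let the compiler choose a register */
#endif

/* Hypothetical opcodes of a toy bytecode, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_BRANCH, OP_STOP };

/* Sample program: load 40, skip over "add 100", add 2, stop. */
static const int demo_program[] = {
    OP_PUSH, 40,
    OP_BRANCH, 3,      /* offset is relative to the operand's own position */
    OP_ADD, 100,       /* skipped by the branch */
    OP_ADD, 2,
    OP_STOP
};

/* Minimal interpreter loop with an (optionally pinned) program counter. */
static int run(const int *code)
{
    register const int *pc PC_REG = code;
    int accu = 0;

    for (;;) {
        switch (*pc++) {
        case OP_PUSH:   accu = *pc++;  break;
        case OP_ADD:    accu += *pc++; break;
        case OP_BRANCH: pc += *pc;     break;
        case OP_STOP:   return accu;
        default:        assert(!"unknown opcode"); return -1;
        }
    }
}
```

    With the pin disabled, run(demo_program) evaluates to 42 on any target;
    the cost of removing the pin is at most a slower interpreter loop, not a
    change in behaviour.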

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From John Paul Adrian Glaubitz@21:1/5 to All on Sat Jun 22 10:20:01 2024
    Hi Stéphane,

    On Wed, 2024-06-19 at 14:45 +0200, Stéphane Glondu wrote:
    I noticed that &caml_state changes when pc changes. Looking further, pc
    is a register variable pinned to a5. I guess this conflicts with the implementation of TLS...?

    I've removed the register pin and launched a build, and it goes past the problematic point. We'll see how it goes...

    Were you able to complete the build successfully on m68k?

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
