• [gentoo-dev] Systemtap with dist-kernel

    From Martin Kletzander@21:1/5 to All on Fri Jun 10 14:30:01 2022
    Hello,

    I am trying to make systemtap work with gentoo-kernel (or ideally all
    dist kernels), and I got a few steps closer with the kernel-build.eclass
    modification I sent this week [0]. However, there is still one issue:
    the build-id of the running kernel does not match that of the installed
    vmlinux file:

    # stap mba_sc.stp
    WARNING: Build-id mismatch [man warning::buildid]: "/usr/src/linux-5.17.13-gentoo-dist/vmlinux" pid 0 address 0xffffffff8a7b572c, expected c43e775aad5e11755bf5cf1329d2240b519e7518 actual 3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c
    WARNING: /usr/bin/staprun exited with status: 1
    Pass 5: run failed. [man error::pass5]
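
    The two ids in that warning can be pulled out mechanically, which helps
    when comparing them against files on disk. A minimal shell sketch; the
    function name stap_buildid_pair is made up for illustration, and it
    assumes the warning keeps the "expected <id> actual <id>" wording seen
    in the output above:

```shell
# Hypothetical helper: read a stap build-id warning on stdin (it may be
# wrapped over several lines) and print "expected=<id> actual=<id>".
# Prints nothing if the input contains no such warning.
stap_buildid_pair() {
    # Join all input lines into one so a wrapped warning still matches.
    { tr '\n' ' '; echo; } |
        sed -n 's/.*expected \([0-9a-f]\{1,\}\) *actual \([0-9a-f]\{1,\}\).*/expected=\1 actual=\2/p'
}
```

    It could be used as: stap mba_sc.stp 2>&1 | stap_buildid_pair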

    I also noticed that when kernel-build.eclass installs the vmlinux file,
    a vmlinux.debug file is created as well (by portage, I presume) using
    objcopy --only-keep-debug --compress-debug-sections.

    So now I am in a situation where I have these relevant files on the
    system:

    - /usr/src/linux-5.17.13-gentoo-dist/vmlinux
    - /usr/lib/debug/.build-id/c4/3e775aad5e11755bf5cf1329d2240b519e7518.debug
    (symlink to the first file)
    - /usr/lib/debug/usr/src/linux-5.17.13-gentoo-dist/vmlinux.debug and
    - /boot/vmlinuz-5.17.13-gentoo-dist
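
    The name of the .build-id entry follows from the id itself: the first
    two hex digits become the directory and the remaining digits the file
    name. A small sketch of that mapping (buildid_debug_path is a
    hypothetical helper name; the /usr/lib/debug prefix is taken from the
    paths listed above):

```shell
# Hypothetical helper: map a build id to the path of its .build-id entry.
buildid_debug_path() {
    id=$1
    rest=${id#??}        # everything after the first two hex digits
    dir=${id%"$rest"}    # the first two hex digits
    printf '/usr/lib/debug/.build-id/%s/%s.debug\n' "$dir" "$rest"
}
```

    Running readlink -f on the resulting path shows which file the entry
    really resolves to.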


    When I check the build ids (using readelf -n or just "file") of the
    first three files I get:

    /usr/src/linux-5.17.13-gentoo-dist/vmlinux:
    ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=c43e775aad5e11755bf5cf1329d2240b519e7518, not stripped

    /usr/lib/debug/.build-id/c4/3e775aad5e11755bf5cf1329d2240b519e7518.debug:
    ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=c43e775aad5e11755bf5cf1329d2240b519e7518, with debug_info, not stripped

    /usr/lib/debug/usr/src/linux-5.17.13-gentoo-dist/vmlinux.debug:
    ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=c43e775aad5e11755bf5cf1329d2240b519e7518, with debug_info, not stripped

    which looks great except:

    1) the first file does not say it is "with debug_info",

    2) there is no reason to keep the original vmlinux in place since there
    is a smaller file that works as a substitute, but I'm not sure what's
    a clean way to not install it, and most importantly

    3) the fact that the running kernel has a different build id.

    The last point is the main issue here. I was trying to find a way to
    check the build id of the running kernel, but I have not found a kernel
    API for it, so instead I checked /boot/vmlinuz-5.17.13-gentoo-dist
    like this:

    ~/dev/linux/scripts/extract-vmlinux /boot/vmlinuz-5.17.13-gentoo-dist >vmlinux.extracted

    and for good measure also tried what objcopy does to it:

    objcopy --only-keep-debug vmlinux.extracted vmlinux.extracted.debug
    objcopy --only-keep-debug --compress-debug-sections vmlinux.extracted vmlinux.extracted.compressed

    Now when I check, the build id is different from that of the first
    three files, but it is unchanged by objcopy and is the same as the one
    systemtap reports for the running kernel:

    vmlinux.extracted:
    ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c, stripped

    vmlinux.extracted.compressed:
    ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c, stripped

    vmlinux.extracted.debug:
    ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c, stripped
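
    For the running kernel itself, the kernel exports its ELF notes,
    including the GNU build-id note, as a raw blob in /sys/kernel/notes.
    The sketch below pulls a build id out of such a blob and, separately,
    out of a line of "file" output, so the two can be compared. Both
    function names are made up for illustration, and the note lookup
    assumes a little-endian machine and a 20-byte (SHA-1) build id, as in
    the outputs above:

```shell
# Hypothetical helper: print the build id found in a raw ELF notes blob
# such as /sys/kernel/notes. It looks for the little-endian note header
# namesz=4, descsz=20, type=3 (NT_GNU_BUILD_ID) followed by the name
# "GNU\0", and prints the 20 descriptor bytes that follow as hex.
# This is a plain text search over a hex dump, not a full note parser.
buildid_from_notes() {
    { od -An -v -tx1 "$1" | tr -d ' \n'; echo; } |
        sed -n 's/.*040000001400000003000000474e5500\([0-9a-f]\{40\}\).*/\1/p'
}

# Hypothetical helper: print the build id from "file" output on stdin.
buildid_from_file_output() {
    sed -n 's/.*BuildID\[sha1\]=\([0-9a-f]\{1,\}\).*/\1/p'
}
```

    On the system described here, buildid_from_notes /sys/kernel/notes
    would be expected to print the 3a757e0a... id that systemtap reports
    as "actual", while piping the output of
    file /usr/src/linux-5.17.13-gentoo-dist/vmlinux into
    buildid_from_file_output gives the c43e775a... id.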


    At this point I got stuck, not knowing when and how the build-id
    changes or where to extract the debug symbols from. I would also like
    to clean up the change I made. So I came here with my question(s) and
    rather lengthy explanations. Does anyone know what would be the best
    way to deal with this? Or even where to continue looking? I would
    really like to make systemtap "just work" on Gentoo with the
    distribution kernels, but I have already spent a lot of time on it, so
    I figured I would rather ask here, since I am not that proficient with
    the intricacies of the build system.

    Thanks a lot for any pointers and have a great day,
    Martin

    [0] https://github.com/gentoo/gentoo/pull/25789

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Martin Kletzander@21:1/5 to Martin Kletzander on Thu Jun 16 17:30:01 2022
    I finally figured out what is happening, but I am not sure what would be
    the best way to work around it.

    The problem is that with FEATURES=splitdebug the vmlinux binary is
    processed by estrip, which uses debugedit and specifically asks it to
    recompute the build id. However, the bzImage is created from the
    vmlinux *before* that, and thus preserves the old build-id.

    One option would be to create the vmlinux.debug file manually, but I am
    afraid it would duplicate a lot of the code from estrip, unless estrip
    can somehow be used cleanly by the ebuild. The advantage of this would
    be that there is no need for the huge vmlinux file after that and we
    can just keep the vmlinux.debug around instead.

    I'll end with a couple of closing questions if I may:

    - Does anyone have an idea for a clean way to do this?

    - Is it preferable to use GitHub PRs or this ML for such eclass changes?

    - What exactly is the reason for portage using the `-i`/`--build-id`
    option of debugedit?

    Thanks and have a nice day,
    Martin

  • From Martin Kletzander@21:1/5 to Martin Kletzander on Fri Jun 24 12:50:01 2022
    OK, I finally managed to work around it, and even though it is not as
    nice as I had hoped, it works. So I sent it as a follow-up PR:

    https://github.com/gentoo/gentoo/pull/26065

    Martin
