• Re: Building with many cores without OOM (1/2)

    From Guillem Jover@1:229/2 to Helmut Grohne on Mon Dec 9 14:50:01 2024
    XPost: linux.debian.devel
    From: guillem@debian.org

    Hi!

    On Thu, 2024-12-05 at 09:23:24 +0100, Helmut Grohne wrote:
    On Wed, Dec 04, 2024 at 02:03:29PM +0100, Guillem Jover wrote:
    On Thu, 2024-11-28 at 10:54:37 +0100, Helmut Grohne wrote:
    For one thing, I propose extending debhelper to provide
    --min-ram-per-parallel-core, as that seems to be the most common way
    to do it. I've proposed
    https://salsa.debian.org/debian/debhelper/-/merge_requests/128
    to this end.

    To me this looks too high in the stack (and too Linux-specific :).

    I don't think being Linux-specific is necessarily bad here, and note
    that the /proc interface is also supported by Hurd (I actually checked
    on a porter box). The problem we are solving here is a practical one,
    and the solution we pick now will probably no longer be relevant in
    twenty years. That's about the time frame for which I expect Linux to
    remain Debian's preferred kernel (it could be longer, but is unlikely
    to be shorter).

    See below for the portability part.

    I think adding this to dpkg-buildpackage itself would make the most
    sense to me, as it is already deciding what amount of parallelism to
    use when «auto» is specified, for example.

    Given that this would be an outside-in interface, I think this would
    imply declaring these parameters, say, as debian/control fields, or in
    some other file to be parsed from the source tree.

    I find the outside-in vs. inside-out distinction quite useful, but I
    actually prefer an inside-out approach. You detail how picking a
    sensible ram-per-core value is environment-specific. Others gave
    examples of how build systems address this, such as specifying linker
    pools with reduced parallelism, and you go into detail on how the
    compression parallelism is already limited based on system RAM. Given
    all of this, I am no longer convinced that reducing the package-global
    parallelism is the desired solution. Rather, each individual step may
    benefit from its own limit, and that is what is already happening in
    the archive. It is this inside-out approach that we see in
    debian/rules in some packages. What I now find missing is better
    tooling to support it.
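
    As an illustration, today's per-step limiting in a debian/rules file
    can look roughly like the following hypothetical sketch (the choice of
    the test suite as the memory hog and the limit of 4 jobs are purely
    illustrative):

      include /usr/share/dpkg/buildopts.mk

      JOBS := $(or $(DEB_BUILD_OPTION_PARALLEL),1)
      # This package's test suite is the memory hog, so clamp only that
      # step and keep the compile phase at full parallelism.
      TESTJOBS := $(shell echo $$(( $(JOBS) > 4 ? 4 : $(JOBS) )))

      override_dh_auto_build:
              dh_auto_build -- -j$(JOBS)

      override_dh_auto_test:
              dh_auto_test -- -j$(TESTJOBS)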

    Not all outside-in interfaces are made equal. As I hinted at in that
    other mail, some are (let's call them) permeable, where the build
    driver performs some default setup or data gathering that it does not
    necessarily use itself, and which can easily be overridden by the
    inner packaging files.

    I don't have a strong opinion on this case though. My initial reaction
    was that, because dpkg-buildpackage is already trying to provide a
    good default for the number of parallel jobs to use, it seemed like a
    good global place to improve that number and influence all users, if,
    say, the only thing needed is a declarative hint from the packaging
    itself. This being a permeable interface also means the inner
    processes could still ignore that value or tune it further (except
    for the --jobs-force option, which is harder to revert from inside).
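
    To make the permeable case concrete, here is a hypothetical
    illustration: dpkg-buildpackage computes and exports, say, parallel=16
    through DEB_BUILD_OPTIONS, and the inner packaging remains free to
    tighten it further (the limit of 8 is made up):

      include /usr/share/dpkg/buildopts.mk

      # Start from the outer default provided by the build driver.
      JOBS := $(or $(DEB_BUILD_OPTION_PARALLEL),1)
      # This package knows its LTO link step cannot sustain more jobs.
      JOBS := $(shell echo $$(( $(JOBS) > 8 ? 8 : $(JOBS) )))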

    But I guess it depends on whether we can have a better general
    heuristic in the outer parallel job number computation, or whether,
    for the cases that need tuning, any such general heuristic would serve
    no actual purpose and all or most of them would need to be overridden
    anyway.

    My main concerns would be:

    * Portability.

    I am not concerned. The parallelism limit is a mechanism to increase
    the efficiency of builder deployments, not much more. The portable
    solution is to stuff in more RAM or to supply a lower parallel value
    outside-in. A 90% solution is more than good enough here.

    Right, I agree with the above, because this should be considered an
    opportunistic quality-of-life improvement, where the user can always
    manually override it if the tool does not get it right. My concern
    was about the above debhelper MR failing hard on several conditions
    where it should simply disable the improved clamping. See for example
    the parallel auto handling in dpkg-buildpackage (after the
    «run_hook('preinit')»), or lib/dpkg/compress.c:filter_xz_get_memlimit()
    and lib/dpkg/meminfo.c:meminfo_get_available_from_file() for the
    dpkg-deb one, where these should gracefully fall back to less accurate
    methods if they cannot gather the needed information.
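
    Spelled out in debian/rules terms, the graceful degradation I mean
    could look like this (a sketch only; the 2 GiB per job requirement is
    made up):

      include /usr/share/dpkg/buildopts.mk

      # Read MemAvailable (in kB) from /proc/meminfo; if that fails,
      # MEMKB stays empty and the outer parallel value is kept
      # instead of aborting the build.
      MEMKB := $(shell awk '/^MemAvailable:/ { print $$2 }' /proc/meminfo 2>/dev/null)
      JOBS := $(or $(DEB_BUILD_OPTION_PARALLEL),1)
      ifneq ($(MEMKB),)
      # One job per 2 GiB of available memory, but at least one job.
      MEMJOBS := $(shell echo $$(( $(MEMKB) / 2097152 ? $(MEMKB) / 2097152 : 1 )))
      JOBS := $(shell echo $$(( $(MEMJOBS) < $(JOBS) ? $(MEMJOBS) : $(JOBS) )))
      endif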

    (Now that you mention it, I should probably enable Hurd for the
    /proc/meminfo codepath. :)

    * Whether this is a local property of the package (so that the
    maintainer has the needed information to decide on a value), or
    whether this depends on the builder's setup, or perhaps both.

    All of what I wrote in this thread thus far assumed that this is a
    local property. That definitely is an oversimplification of the
    matter, as an upgraded clang, gcc, ghc or rustc has historically
    yielded increased RAM consumption. The packages affected tend to be
    sensitive to changes in these packages in other ways, so they
    generally know quite closely what versions of their dependencies will
    be in use and can tailor their guesses. So while this is a non-local
    property in principle, my expectation is that treating it as if it
    were local is good enough for a 90% solution.

    My thinking here was also about the general case, say a system that
    has many cores relative to its available memory, where each core
    would get what we'd consider not enough memory (assuming, for
    example, a baseline for what dpkg-deb might require, plus the build
    helpers and their interpreters, and what a compiler given, say, an
    empty C or C++ file might need, etc.). For instance, a 64-core box
    with 16 GiB of RAM leaves only 256 MiB per core, which is already
    tight for such a baseline.

    * We might need a way to percolate these parameters to children of
    the build/test system (as Paul has mentioned), where sometimes
    you cannot specify this directly in the parent. Setting some

    [continued in next message]

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)
  • From Helmut Grohne@1:229/2 to Helmut Grohne on Wed Dec 25 12:40:01 2024
    XPost: linux.debian.bugs.dist, linux.debian.devel
    From: helmut@subdivi.de

    Package: coreutils
    Version: 9.5-1
    Severity: wishlist
    Tags: patch upstream
    X-Debbugs-Cc: debian-devel@lists.debian.org, debian-dpkg@lists.debian.org

    Hi Michael,

    we recently considered ways of exercising more CPU cores during package
    builds on d-devel@l.d.o. The discussion starts at
    https://lists.debian.org/debian-devel/2024/11/msg00498.html. There, we
    considered extending debhelper and dpkg. Neither of those options
    looked really attractive, because they would limit the parallelism of
    the complete build. However, different phases of a build tend to
    require different amounts of memory: typically linking requires more
    memory than compiling, and test suites may have yet different
    requirements. The ninja build tool partially accommodates this by
    providing different "pools" for different processes, e.g. to limit
    linker concurrency. Generally, individual packages have the best
    knowledge of their own memory requirements, but turning that knowledge
    into parallelism limits is presently inconvenient. This is where I see
    nproc helping.

    On Thu, Dec 05, 2024 at 09:23:24AM +0100, Helmut Grohne wrote:
    How about instead we try to extend coreutils' nproc? How about adding
    more options to it?

    I propose adding new options to the nproc utility to support these
    use cases. For one thing, I suggest adding --assume to override the
    initial detection; this allows passing the parallel=N value from
    DEB_BUILD_OPTIONS to nproc as the initial value. The added value
    arises from a second option, --require-mem, which reduces the amount
    of parallelism based on the available system RAM and user-provided
    requirements.

    Let me sketch some expected uses:

    * Build daemons currently limit the number of system processors by
    downsizing VMs, to avoid builds failing with OOM. Instead, they
    could supply an adjusted DEB_BUILD_OPTIONS=parallel=$(nproc
    --require-mem=2G).

    * Individual packages already reduce parallelism based on available
    memory; a number of them attempt to parse /proc/meminfo. Instead,
    they could include /usr/share/dpkg/buildopts.mk and compute
    parallelism as NPROC=$(shell nproc
    --assume=$(or $(DEB_BUILD_OPTION_PARALLEL),1) --require-mem=3G).

    * When using meson with ninja, the linker parallelism can be limited
    separately via backend_max_links, whose value can be computed with a
    different --require-mem argument (see the sketch after this list).
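
    Putting the last two items together, a debian/rules file for a meson
    package could look roughly like this (a hypothetical sketch assuming
    the patched nproc below; the 3G and 6G requirements are illustrative):

      include /usr/share/dpkg/buildopts.mk

      # Compile jobs need about 3G each, link jobs about 6G each.
      NPROC := $(shell nproc --assume=$(or $(DEB_BUILD_OPTION_PARALLEL),1) --require-mem=3G)
      NLINK := $(shell nproc --assume=$(or $(DEB_BUILD_OPTION_PARALLEL),1) --require-mem=6G)

      %:
              dh $@ --buildsystem=meson

      override_dh_auto_configure:
              dh_auto_configure -- -Dbackend_max_links=$(NLINK)

      override_dh_auto_build:
              dh_auto_build -- -j$(NPROC)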

    I expect these options to reduce complexity in debian/rules files and
    hope that, by providing better tooling, we can require packages to be
    buildable with higher default parallelism. Finding a good place to
    share this tooling is difficult, but nproc seems like a sensible
    spot. The nproc binary grows by 4 kB as a result, which impacts the
    essential base system.

    I'm attaching a patch; note that a significant part of the diff is a
    gnulib update of the physmem module. The option naming is certainly
    improvable.

    What do you think about this approach?

    Helmut

    --- coreutils-9.5.orig/src/nproc.c
    +++ coreutils-9.5/src/nproc.c
    @@ -23,6 +23,7 @@
     
     #include "system.h"
     #include "nproc.h"
    +#include "physmem.h"
     #include "quote.h"
     #include "xdectoint.h"
     
    @@ -34,13 +35,17 @@
     enum
     {
       ALL_OPTION = CHAR_MAX + 1,
    -  IGNORE_OPTION
    +  ASSUME_OPTION,
    +  IGNORE_OPTION,
    +  REQUIRE_RAM_OPTION
     };
     
     static struct option const longopts[] =
     {
       {"all", no_argument, nullptr, ALL_OPTION},
    +  {"assume", required_argument, nullptr, ASSUME_OPTION},
       {"ignore", required_argument, nullptr, IGNORE_OPTION},
    +  {"require-mem", required_argument, nullptr, REQUIRE_RAM_OPTION},
       {GETOPT_HELP_OPTION_DECL},
       {GETOPT_VERSION_OPTION_DECL},
       {nullptr, 0, nullptr, 0}
    @@ -61,7 +66,9 @@
     "), stdout);
           fputs (_("\
           --all      print the number of installed processors\n\
    +      --assume=N assume the given number of processors before applying limits\n\
           --ignore=N if possible, exclude N processing units\n\
    +      --require-mem=M reduce emit