• Re: Building with many cores without OOM

    From Niels Thykier@1:229/2 to All on Fri Nov 29 15:10:02 2024
    XPost: linux.debian.devel
    From: niels@thykier.net
    Copy: debian-dpkg@lists.debian.org (Dpkg-Maintainers)

    [SoupGate killed MIME-encoded file 00000000.ATT (2916 bytes)]




  • From Guillem Jover@1:229/2 to Helmut Grohne on Wed Dec 4 14:10:01 2024
    XPost: linux.debian.devel
    From: guillem@debian.org

    Hi!

    On Thu, 2024-11-28 at 10:54:37 +0100, Helmut Grohne wrote:
    > I am one of those who builds a lot of different packages with
    > different requirements and found that picking a good parallel=...
    > value in DEB_BUILD_OPTIONS is hard. Go too low and your build takes
    > very long. Go too high and you swap until the OOM killer terminates
    > your build. (Usage of choom recommended in any case.)
    >
    > I think this demonstrates that we probably have something between 10
    > and 50 packages in unstable that would benefit from a generic
    > parallelism limit based on available RAM. Do others agree that this
    > is a problem worth solving in a more general way?

    I think the general idea makes sense, yes.
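    (An aside on the choom hint in the quoted text: choom, from
    util-linux, adjusts a command's OOM-killer score so that the build is
    sacrificed before the rest of the system. A minimal example:)

        # Raise the build's OOM score so the kernel kills it first:
        choom -n 1000 -- dpkg-buildpackage -us -uc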

    > For one thing, I propose extending debhelper to provide
    > --min-ram-per-parallel-core as that seems to be the most common way
    > to do it. I've proposed
    > https://salsa.debian.org/debian/debhelper/-/merge_requests/128
    > to this end.

    To me this looks too high in the stack (and too Linux-specific :).

    > Unfortunately, the affected packages tend to not just be big, but
    > also so special that they cannot use dh_auto_*. As a result, I also
    > looked at another layer to support this and found
    > /usr/share/dpkg/buildopts.mk, which sets DEB_BUILD_OPTION_PARALLEL
    > by parsing DEB_BUILD_OPTIONS. How about extending this file with a
    > mechanism to reduce parallelism? I am attaching a possible extension
    > to it to this mail to see what you think. Guillem, is that something
    > you consider including in dpkg?

    I'm not a huge fan of the make fragment files, as make programming is
    rather brittle, and it easily causes lots of processes to spawn if you
    look at it the wrong way (ideally I'd really like to be able to get
    rid of them once we can rely on something else!). I think we could
    consider adding it there, but as a last resort option, if there's no
    other better place.
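    (Helmut's attached fragment is not reproduced in this digest; what
    follows is only a sketch of what such a buildopts.mk extension might
    look like, assuming Linux's /proc/meminfo and a hypothetical
    DEB_BUILD_MIN_RAM_PER_JOB knob, in MiB, set by the package. Note that
    it spawns several shells just to compute one number, which is exactly
    the brittleness Guillem describes.)

        # Sketch only: clamp DEB_BUILD_OPTION_PARALLEL so that every job
        # gets at least DEB_BUILD_MIN_RAM_PER_JOB MiB of available memory.
        DEB_BUILD_MIN_RAM_PER_JOB ?= 2048
        DEB_BUILD_RAM_MB := $(shell awk '/^MemAvailable:/ { print int($$2 / 1024) }' /proc/meminfo 2>/dev/null)
        ifneq ($(DEB_BUILD_RAM_MB),)
        DEB_BUILD_RAM_JOBS := $(shell expr $(DEB_BUILD_RAM_MB) / $(DEB_BUILD_MIN_RAM_PER_JOB) \| 1)
        ifeq ($(shell [ -n "$(DEB_BUILD_OPTION_PARALLEL)" ] && [ $(DEB_BUILD_OPTION_PARALLEL) -gt $(DEB_BUILD_RAM_JOBS) ] && echo y),y)
        DEB_BUILD_OPTION_PARALLEL := $(DEB_BUILD_RAM_JOBS)
        endif
        endif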

    > Are there other layers that could reasonably be used to implement a
    > more general form of parallelism limiting based on system RAM?
    > Ideally, we'd consolidate these implementations into fewer places.

    I think adding this in dpkg-buildpackage itself would make most sense
    to me, where it is already deciding what amount of parallelism to use
    when specifying «auto» for example.

    Given that this would be an outside-in interface, I think this would
    imply declaring these parameters, say, as debian/control fields, or in
    some other file to be parsed from the source tree.

    My main concerns would be:

    * Portability.
    * Whether this is a local property of the package (so that the
      maintainer has the needed information to decide on a value), or
      whether this depends on the builder's setup, or perhaps both.
    * We might need a way to percolate these parameters to children of
      the build/test system (as Paul has mentioned), where sometimes
      you cannot specify this directly in the parent. Setting some
      standardized environment variables would seem sufficient, I think,
      but while all this seems kind of optional, this goes a bit into
      reliance on dpkg-buildpackage being the only supported build
      entry point. :)

    > As I am operating build daemons (outside Debian), I note that I have
    > to limit their cores below what is actually available to avoid OOM
    > kills, and even that is insufficient in some cases. In adopting such
    > a mechanism, we could generally raise the core count per buildd and
    > consider OOM a problem of the package, to be fixed by applying a
    > sensible parallelism limit.

    See above, on whether this is really package or setup dependent.

    Thanks,
    Guillem

  • From Guillem Jover@1:229/2 to Guillem Jover on Wed Dec 4 14:30:01 2024
    XPost: linux.debian.devel
    From: guillem@debian.org

    Hi!

    On Wed, 2024-12-04 at 14:03:30 +0100, Guillem Jover wrote:
    > On Thu, 2024-11-28 at 10:54:37 +0100, Helmut Grohne wrote:
    > > Are there other layers that could reasonably be used to implement
    > > a more general form of parallelism limiting based on system RAM?
    > > Ideally, we'd consolidate these implementations into fewer places.
    >
    > I think adding this in dpkg-buildpackage itself would make most
    > sense to me, where it is already deciding what amount of parallelism
    > to use when specifying «auto» for example.
    >
    > Given that this would be an outside-in interface, I think this would
    > imply declaring these parameters, say, as debian/control fields, or
    > in some other file to be parsed from the source tree.
    >
    > My main concerns would be:
    >
    > * Portability.
    > * Whether this is a local property of the package (so that the
    >   maintainer has the needed information to decide on a value), or
    >   whether this depends on the builder's setup, or perhaps both.
    > * We might need a way to percolate these parameters to children of
    >   the build/test system (as Paul has mentioned), where sometimes
    >   you cannot specify this directly in the parent. Setting some
    >   standardized environment variables would seem sufficient, I think,
    >   but while all this seems kind of optional, this goes a bit into
    >   reliance on dpkg-buildpackage being the only supported build
    >   entry point. :)

    Ah, and I forgot to mention that, for example, dpkg-deb (via libdpkg)
    already implements this kind of parallelism limiter based on system
    memory when compressing to xz. But in that case we are assisted by
    liblzma telling us the amount of memory expected to be used, which
    makes it easier to clamp the parallelism based on that. Unfortunately
    I'm not sure we have this kind of information available in general,
    and my assumption is that in many cases we might end up deciding on
    clamping factors from observations of current implementation details,
    which might need manual tracking and adjustment going forward.
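    (The same effect is easy to observe with the xz command-line tool:
    with -T0 it starts from one thread per core and then reduces the
    thread count until the encoder's expected memory use fits under the
    given limit, which is roughly what dpkg-deb gets from liblzma.)

        # Threads default to the core count, then get clamped to the limit:
        xz -T0 --memlimit-compress=50% big.tar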

    Thanks,
    Guillem

  • From Stefano Rivera@1:229/2 to All on Wed Dec 4 15:40:01 2024
    XPost: linux.debian.devel
    From: stefanor@debian.org

    [SoupGate killed MIME-encoded file 00000000.ATT (978 bytes)]

  • From Guillem Jover@1:229/2 to Stefano Rivera on Thu Dec 5 03:50:01 2024
    XPost: linux.debian.devel
    From: guillem@debian.org

    Hi!

    On Wed, 2024-12-04 at 14:37:45 +0000, Stefano Rivera wrote:
    > Hi Guillem (2024.12.04_13:03:29_+0000)
    > > > Are there other layers that could reasonably be used to
    > > > implement a more general form of parallelism limiting based on
    > > > system RAM? Ideally, we'd consolidate these implementations into
    > > > fewer places.
    > >
    > > I think adding this in dpkg-buildpackage itself would make most
    > > sense to me, where it is already deciding what amount of
    > > parallelism to use when specifying «auto» for example.
    > >
    > > Given that this would be an outside-in interface, I think this
    > > would imply declaring these parameters, say, as debian/control
    > > fields, or in some other file to be parsed from the source tree.
    >
    > I don't think this can be entirely outside-in, the package needs to
    > say how much RAM it needs per core, to be able to calculate the
    > appropriate degree of parallelism. So, we have to declare a value
    > that then gets calculated against the proposed parallelism.

    I _think_ we are saying the same thing, and there might just be a
    mismatch in nomenclature (most probably stemming from me being a
    non-native speaker and using/reusing terms incorrectly)? So let me
    clarify what I meant; otherwise I might be misunderstanding your
    comment, and I'd appreciate a clarification. :)

    When dealing with dpkg packaging build interfaces, in my mind there are
    two main models:

    * outside-in: where the build driver (dpkg-buildpackage in this case)
    can reach for all needed information and then do stuff based on that,
    or pass that information down into debian/rules process hierarchy,
    or to tools it invokes itself (say dpkg-genchanges); another such
    interface could be R³ where trying to change the default from
    debian/rules is already too late, as that's managed by the
    build driver.

    * inside-out: where debian/rules, files sourced from it, or tools
    invoked from it, fully control the outcome of the operation, and
    then dpkg-buildpackage might not be able to tell beforehand
    exactly what will happen and will need to pick up the results after
    the fact, for example that would include dpkg-deb or dpkg-distaddfile
    being currently fully delegated to debian/rules, and then
    dpkg-buildpackage, et al. picking that up through debian/files;
    debhelper would be a similar paradigm.

    (With some exceptions, I consider that the bulk of our build interfaces
    are unfortunately mostly inside-out.)

    For this particular case, I'd envision the options could look something
    like:

    * outside-in:

    - We add a new field, say (with this not very good name that would
    need more thought) Build-Parallel-Mem-Limit-Per-Core, for the
    debian/control source stanza; then dpkg-buildpackage would be able
    to check the current system memory, and clamp the number of
    computed parallel jobs based on the number of system cores, the
    number of specified parallel jobs, and the limit from the above
    field. This would then be passed down as the usual parallel= in
    DEB_BUILD_OPTIONS. (A sketch of this clamping follows the list
    below.)

    - If we needed the package to provide a dynamic value depending on
    other external factors outside its control, although there's no
    current precedent that I'm aware of, and it seems a bit ugly, I
    guess we could envision some kind of new entry point and a way to
    let the build drivers know it needs to call it, for example a
    debian/rules target that gets called and generates some file, or
    a program under debian/ that prints some value, which
    dpkg-buildpackage could use in a similar way as the above point.

    * inside-out:

    For this, there could be multiple variants, where a build driver
    like dpkg-buildpackage is completely out of the picture, and where
    we might end up with parallel settings that are out of sync
    between DEB_BUILD_OPTIONS parallel= and the inner one, for example:

    - One could be the initially proposed buildopts.mk extension,

    - Add a new dpkg-something helper or a new command to an existing
    tool, that could compute the value that debian/rules would use
    or pass further down,

    - debhelper/debputy/etc. does it, but that leaves out packages not
    using a helper, which was one of Helmut's initial concerns.
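    (To make the outside-in option above concrete, here is the clamping
    as illustrative shell arithmetic, assuming the hypothetical
    Build-Parallel-Mem-Limit-Per-Core field, in MiB, and Linux's
    /proc/meminfo; dpkg-buildpackage itself is Perl, so this is a sketch
    of the computation, not of the implementation.)

        # Sketch only: clamp the requested parallelism to available memory.
        limit_mb=$(awk -F': *' '$1 == "Build-Parallel-Mem-Limit-Per-Core" { print $2 }' debian/control)
        avail_mb=$(awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo)
        jobs=$(nproc)  # stand-in for the parallel=N/auto request
        if [ -n "$limit_mb" ] && [ -n "$avail_mb" ]; then
            ram_jobs=$((avail_mb / limit_mb))
            [ "$ram_jobs" -lt 1 ] && ram_jobs=1
            [ "$jobs" -gt "$ram_jobs" ] && jobs=$ram_jobs
        fi
        export DEB_BUILD_OPTIONS="parallel=$jobs"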

    Hope this clarifies.

    Thanks,
    Guillem

  • From Helmut Grohne@1:229/2 to Guillem Jover on Thu Dec 5 13:10:01 2024
    XPost: linux.debian.devel
    From: helmut@subdivi.de

    Hi Guillem and others,

    Thanks for your extensive reply and the followup clarifying the
    inside-out and outside-in distinction.

    On Wed, Dec 04, 2024 at 02:03:29PM +0100, Guillem Jover wrote:
    > On Thu, 2024-11-28 at 10:54:37 +0100, Helmut Grohne wrote:
    > > I think this demonstrates that we probably have something between
    > > 10 and 50 packages in unstable that would benefit from a generic
    > > parallelism limit based on available RAM. Do others agree that
    > > this is a problem worth solving in a more general way?
    >
    > I think the general idea makes sense, yes.

    Given the other replies in this thread, I conclude that we have rough
    consensus on this being a problem worth solving (worth expending
    effort, code, and later maintenance cost on).

    > > For one thing, I propose extending debhelper to provide
    > > --min-ram-per-parallel-core as that seems to be the most common
    > > way to do it. I've proposed
    > > https://salsa.debian.org/debian/debhelper/-/merge_requests/128
    > > to this end.
    >
    > To me this looks too high in the stack (and too Linux-specific :).

    Let me take the opportunity to characterize this proposal as
    inside-out, given your distinction.

    I don't think being Linux-specific is necessarily bad here, and note
    that the /proc interface is also supported by Hurd (I actually checked
    on a porter box). The problem we are solving here is a practical one,
    and the solution we pick now will probably no longer be relevant in
    twenty years. That's about the time frame for which I expect Linux to
    remain the preferred kernel used by Debian (could be longer, but
    unlikely shorter).

    > I think adding this in dpkg-buildpackage itself would make most
    > sense to me, where it is already deciding what amount of parallelism
    > to use when specifying «auto» for example.
    >
    > Given that this would be an outside-in interface, I think this would
    > imply declaring these parameters, say, as debian/control fields, or
    > in some other file to be parsed from the source tree.

    I find the outside-in vs. inside-out distinction quite useful, but I
    actually prefer an inside-out approach. You detail that picking a
    sensible ram-per-core value is environment-specific. Others gave
    examples of how build systems address this by specifying linker groups
    with reduced parallelism, and you go into detail on how compression
    parallelism is already limited based on system RAM. Given all of
    these, I am no longer convinced that reducing the package-global
    parallelism is the desired solution. Rather, each individual step may
    benefit from its own limiting, and that is what is already happening
    in the archive. It is that inside-out approach that we see in
    debian/rules in some packages. What I now find missing is better
    tooling to support this inside-out approach.

    > My main concerns would be:
    >
    > * Portability.

    I am not concerned. The parallelism limit is a mechanism to increase
    the efficiency of builder deployments and not much more. The portable
    solution is to stuff in more RAM or supply a lower parallel value
    outside-in. A 90% solution is more than good enough here.

    > * Whether this is a local property of the package (so that the
    >   maintainer has the needed information to decide on a value), or
    >   whether this depends on the builder's setup, or perhaps both.

    All of what I wrote in this thread thus far assumed that this was a
    local property. That is definitely an oversimplification of the
    matter, as an upgraded clang, gcc, ghc or rustc has historically
    yielded increased RAM consumption. The packages affected tend to be
    sensitive to changes in these packages in other ways, so their
    maintainers generally know quite closely what versions of dependencies
    will be in use and can tailor their guesses. So while this is a
    non-local property in principle, my expectation is that treating it as
    if it were local is good enough for a 90% solution.

    > * We might need a way to percolate these parameters to children of
    >   the build/test system (as Paul has mentioned), where sometimes
    >   you cannot specify this directly in the parent. Setting some
    >   standardized environment variables would seem sufficient, I think,
    >   but while all this seems kind of optional, this goes a bit into
    >   reliance on dpkg-buildpackage being the only supported build
    >   entry point. :)

    To me, this reads as an argument for using an inside-out approach.

    Given all of the other replies (on-list and off-list), my vision of how
    I'd like to see this approached has changed. I see more and more value
    in leaving this in close control of the package maintainer (i.e.
    inside-out) to the point where different parts of the build may use
    different limits.

    How about we instead try to extend coreutils' nproc with more options?

    --assume-units=N
    --max-units=N
    --min-ram-per-unit=Z

    Then we could continue to use buildopts.mk and other mechanisms to
    extract the passed parallel value from DEB_BUILD_OPTIONS as before,
    and run it through an nproc invocation when passing it down to a build
    system in the specific ways that the build system requires. More
    options could be added to nproc as needed.
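    (A sketch of how debian/rules might consume this, assuming the
    proposed options existed in nproc; they do not exist today, and the
    2G value and its spelling are made up for illustration.)

        # Sketch only: take the parallel= value from DEB_BUILD_OPTIONS as
        # an upper bound (dpkg defaults to serial when it is absent) and
        # clamp it to 2 GiB of RAM per job.
        include /usr/share/dpkg/buildopts.mk
        JOBS := $(shell nproc --max-units=$(or $(DEB_BUILD_OPTION_PARALLEL),1) --min-ram-per-unit=2G)
        # ...then pass "-j$(JOBS)" (or the build system's equivalent) down.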

    Helmut

  • From sre4ever@free.fr@1:229/2 to All on Mon Dec 9 21:50:01 2024
    XPost: linux.debian.devel

    Hi,

    Let me plop right into this discussion with no general solution and
    more things to think about. For context, I'm packaging Java things,
    and Java has historically been notoriously bad at guessing how much
    memory it could actually use on a given system. I'm not sure things
    are much better these days. This is just a reminder that the issue is
    nowhere near as easy as it looks and that many attempts to generalize
    an approach that works well in some cases have failed.

    On 2024-12-09 14:42, Guillem Jover wrote:

    > My thinking here was about the general case too, say a system that
    > has many cores relative to its available memory, where each core
    > would get what we'd consider not enough memory per core

    This is actually a common situation on most systems but a few
    privileged developers' configurations. It is especially true in
    cloud-like, VM/containerized environments, where it is much easier
    (i.e. with less immediate consequences) to overcommit CPU cores than
    RAM. Just look at the price list of any cloud computing provider to
    get an idea of the ratios you could start with. And then the provider
    may well lie about the actual availability of the cores they will
    readily bill you for, and you will only notice that when your
    application grinds to a halt at the worst possible time (e.g. on a
    Black Friday if your business is to sell stuff), but at least it won't
    get OOM-killed.

    There are a few packages that are worrying me about how I'm going to
    make them build and run their test suites on Salsa without either
    timing out on one side or getting immediately OOM-killed at the other
    end of the slider. One of them wants to allocate 17 GiB of RAM per
    test worker, and wants at least 3 of them. Another (Gradle) needs
    approximately 4 GiB of RAM (JVM processes alone; adding OS cache +
    overhead to that probably makes the total around 6-7 GiB) per
    additional worker for its build, and I don't know yet how much is
    needed for its test suites, as my current setup lacks the storage
    space necessary to run them. On my current low-end laptop (4 threads,
    16 GiB RAM) dpkg's guesses [1] are wrong; I can only run a single
    worker if I want to keep an IDE and a web browser running on the side.
    Two if I close the IDE and kill all browser tabs and other memory
    hogs. I would expect FTBFS bug reports if a run-of-the-mill
    dpkg-buildpackage command failed to build the package on such a
    system.

    > (assuming for example a baseline for what dpkg-deb might require,
    > plus build helpers and their interpreters, and what a compiler with
    > say an empty C, C++ or similar file might need, etc).

    +1 for taking a baseline into consideration, as the first worker is
    usually significantly more expensive than additional workers. In my
    experience with Java build processes the first-worker penalty is in
    the vicinity of +35%, and it can be much higher for lighter build
    processes (but then they are lighter and less likely to hit a limit,
    except in very constrained environments).
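    (A rough worked example with the Gradle figures above, all values
    approximate: ~6.5 GiB per additional worker plus a +35% first-worker
    penalty puts the first worker at ~8.8 GiB. If roughly 10 GiB of the
    16 GiB remain available next to an IDE and a browser, then

        workers = 1 + floor((10 - 8.8) / 6.5) = 1 + 0 = 1

    which matches the single-worker observation above.)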

    Another thing I would like to add is that the requirements may change
    depending on the phase of the build, especially between building and
    testing. For larger projects, building usually requires more memory
    but less parallelism than testing. You could always throw more workers
    at building, but at some point additional workers will just sit mostly
    idle, consuming RAM and resources, as there is a limited number of
    tasks that the critical path will allow at any given point. Testing,
    especially with larger test suites, usually allows for (and sometimes
    needs) much more parallelism.

    Also worth noting, on some projects the time spent testing can be orders
    of magnitude greater than the time spent building.

    > This could also imply, alternatively or in addition, providing a
    > tool or adding some querying logic in an existing tool (in the dpkg
    > toolset) to gather that information which the packaging could use,
    > or…

    Additional tooling may help a bit, but I think what would really help
    at that point would be to write and publish guidelines relevant to the
    technology being packaged, based on empirical evidence collected while
    fine-tuning the build or packaging, and kept reasonably up to date
    (i.e. never more than 2-3 years old) with the current state of
    technologies and projects. Salsa (or other CI) pipelines could be
    instrumented to provide some data, and once the guidelines cover a
    majority of packages you will have better insight into what, if
    anything, needs to be done with the tooling.


    [1]: https://salsa.debian.org/jpd/gradle/-/blob/upgrade-to-8.11.1-wip/debian/rules#L49

    --
    Julien Plissonneau Duquène
