• Bug#1108309: mdanalysis: FTBFS randomly: autobuilder hangs

    From Drew Parsons@1:229/2 to All on Thu Jul 3 14:20:01 2025
    From: dparsons@debian.org

    Source: mdanalysis
    Followup-For: Bug #1108309
    Control: tags -1 ftbfs

    Yeah, mdanalysis is generally flakey.
    Not a whole lot we can do about it except run the tests again till they pass, or keep skipping tests until the probability of failure is sufficiently reduced.
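
A minimal sketch of the "run the tests again till they pass" approach, written as a POSIX shell helper. The `retry` function name and the attempt counts are hypothetical and not part of the package. Since the reported failure mode is a hang rather than an ordinary test failure, each attempt would presumably also need a time limit, for example via coreutils `timeout`.

```shell
# Hypothetical helper: run a command up to MAX times, stopping at the
# first success. Usage: retry MAX command [args...]
retry() {
    max=$1
    shift
    n=1
    while [ "$n" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $n of $max failed" >&2
        n=$((n + 1))
    done
    return 1
}

# Example invocation (not executed here; the 30m limit is an assumption):
#   retry 3 timeout 30m python3 -m pytest testsuite
```

Bounding the number of attempts keeps a build from looping forever when the failure is deterministic rather than random.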

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)
  • From Santiago Vila@21:1/5 to All on Fri Jul 4 19:30:01 2025
    found 1108309 2.9.0-9
    tags 1108309 patch
    thanks

    Hi. My autobuilders still hang when trying to build this package.
    All the time, as in "tried 50 times and failed 50 times".
    I've put recent build logs here:

    https://people.debian.org/~sanvila/build-logs/202507/

    Those build logs were produced using machines with 2 CPUs.
    When using machines with one CPU, the failure rate is only 10%.

    Some of the existing PRs here:

    https://github.com/MDAnalysis/mdanalysis/pulls

    read like "enable parallelization on XXX" or "enable parallelization on YYY".

    Also, on page 2 of Issues:

    https://github.com/MDAnalysis/mdanalysis/issues?page=2

    there are a number of them saying "Implement parallelization or mark as unparallelizable".


    I think this is an upstream bug. Maybe the tests are buggy in
    that they (probably unintentionally) need a minimum number of CPUs
    to run, and that number is unfortunately strictly greater than 2.
    If that's the case, we (Debian) should be honest and skip those
    tests when we know for sure that they will fail
    (as in the (untested) attached patch).

    But maybe I'm over-diagnosing this, and many of the functions that have
    been modified "so that they run in parallel" do not really work ok
    in parallel.

    Can you please forward this upstream? (Or give me some hint about
    how I should word the issue myself, as I have a github account).

    Thanks.

    --- a/debian/rules
    +++ b/debian/rules
    @@ -120,13 +120,13 @@ execute_after_dh_auto_clean:
    ifeq (yes,$(findstring yes,$(RUNTEST)))
    override_dh_auto_test:
    set -e; \
    - for py in $(shell py3versions -rv); do \
    + if [ $$(nproc) -gt 2 ]; then for py in $(shell py3versions -rv); do \
    echo "=== testing with python$$py ==="; \
    pydir=`pybuild -p $$py --system=distutils --print {build_dir}`; \
    MPLBACKEND=agg PYTHONPATH=$$pydir python$$py -mpytest -v -k "$(SKIP_TESTS)" --disable-pytest-warnings testsuite; \
    rm -rf $$pydir/MDAnalysis/.hypothesis; \
    rm -rf $$pydir/MDAnalysis/.duecredit.p; \
    - done
    + done; fi
    endif

    execute_after_dh_python3:
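
The CPU-count guard in the patch can be tried outside of `debian/rules` as plain shell. This is only a sketch: the `should_run_tests` name is hypothetical, and the threshold of 2 is taken from the patch, not from any confirmed upstream limit. Note that inside a make recipe a shell command substitution has to be written `$$(nproc)` (or `$(shell nproc)`), because a bare `$(nproc)` is expanded by make as a variable reference.

```shell
# Hypothetical helper: succeed only when the CPU count is strictly
# greater than the threshold (default 2, as in the patch).
should_run_tests() {
    cpus=$1
    threshold=${2:-2}
    [ "$cpus" -gt "$threshold" ]
}

# nproc is from coreutils; fall back to getconf, then to 1 CPU.
ncpu=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null)
ncpu=${ncpu:-1}

if should_run_tests "$ncpu"; then
    echo "would run the test suite on $ncpu CPUs"
else
    echo "would skip the test suite on $ncpu CPUs"
fi
```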

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Drew Parsons@21:1/5 to All on Sun Jul 6 16:20:02 2025
    Source: mdanalysis
    Followup-For: Bug #1108309

    Thanks Santiago, I appreciate your help improving the package.
    It would certainly be good for everyone if we can find out how to stop these random timeouts.

    Drew

  • From Santiago Vila@21:1/5 to All on Tue Jul 8 03:20:01 2025
    forwarded 1108309 https://github.com/MDAnalysis/mdanalysis/issues/5078
    thanks

    Hi. I've forwarded this to the above URL.
    Thanks.

  • From Drew Parsons@21:1/5 to Santiago Vila on Thu Jul 10 10:50:01 2025
    Source: mdanalysis
    Followup-For: Bug #1108309
    X-Debbugs-Cc: 1108997@bugs.debian.org
    Control: tags -1 ftbfs

    Santiago Vila wrote:
    > But the build logs from buildd.debian.org which I quoted in my initial report, where the build fails with a timeout:
    > ...
    > happened on machines with more than 2 CPUs, so by skipping the tests
    > if the number of CPUs is <= 2, we are certainly avoiding the problem
    > in some scenarios where we know the tests are quite useless, but not
    > on the buildds.
    >
    > So, to summarize, I don't think it was a good suggestion, and I'm sorry
    > that I only realized it now.


    Ok, thanks for clarifying, Santiago.

    In that case it's better for trixie to just skip all tests. That's
    gotten done now with a debci ban, but that doesn't let us test any
    movement from upstream (the ban prevents testing new code in
    experimental).

    So I'll upload -12 switching off tests for trixie.

    Drew
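
The thread does not say how the -12 upload switches the tests off. As a generic illustration only, Debian packaging conventionally skips test suites when `nocheck` appears in the space-separated `DEB_BUILD_OPTIONS` variable. The sketch below shows that check in plain shell; the `has_nocheck` name is hypothetical and nothing here is taken from the actual mdanalysis packaging.

```shell
# Hypothetical helper: succeed when "nocheck" appears as a whole word
# in the given space-separated option string.
has_nocheck() {
    case " $1 " in
        *" nocheck "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_nocheck "${DEB_BUILD_OPTIONS:-}"; then
    echo "skipping the test suite (nocheck is set)"
else
    echo "the test suite would run here"
fi
```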
