Forum: >>> Magnum BBS <<<

Bug#1108309: mdanalysis: FTBFS randomly: autobuilder hangs

From Drew Parsons@1:229/2 to All on Thu Jul 3 14:20:01 2025

From: dparsons@debian.org

Source: mdanalysis
Followup-For: Bug #1108309
Control: tags -1 ftbfs

Yeah, mdanalysis is generally flaky.
There's not a whole lot we can do about it except rerun the tests until they pass, or keep skipping tests until the probability of failure is sufficiently reduced.
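
The "run the tests again" approach can be sketched as a bounded retry loop. This is only an illustration and not something taken from the package: the `retry` helper and the attempt count are hypothetical, and wrapping the test command in coreutils `timeout` is an assumption made here so that a hung run counts as a failed attempt instead of blocking forever.

```shell
# Hypothetical helper: run a command up to $1 times, stopping at the
# first success. Returns 0 on success, 1 if every attempt failed.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (assumed values): at most 3 attempts, each limited to one hour
# by coreutils timeout so that a hang is treated as a failed attempt.
# retry 3 timeout 3600 python3 -m pytest testsuite
```

A retry only helps when failures are intermittent; a build that fails on every attempt will simply use up all the attempts.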

--- SoupGate-Win32 v1.05
* Origin: you cannot sedate... all the things you hate (1:229/2)

From Santiago Vila@21:1/5 to All on Fri Jul 4 19:30:01 2025

found 1108309 2.9.0-9
tags 1108309 patch
thanks

Hi. My autobuilders still hang when trying to build this package.
All the time, as in "tried 50 times and failed 50 times".
I've put recent build logs here:

https://people.debian.org/~sanvila/build-logs/202507/

Those build logs were produced using machines with 2 CPUs.
When using machines with one CPU, the failure rate is only 10%.

Some of the existing PRs here:

https://github.com/MDAnalysis/mdanalysis/pulls

read like "enable parallelization on XXX" or "enable parallelization on YYY".

Also, on page 2 of Issues:

https://github.com/MDAnalysis/mdanalysis/issues?page=2

there are a number of them saying "Implement parallelization or mark as unparallelizable".

I think this is an upstream bug. Maybe the tests are buggy in
a way that they need a minimum number of CPUs to run (probably
unintentionally), and that number is unfortunately strictly
greater than 2. If that's the case, we (Debian) should be honest
and skip those tests when we know for sure that they will fail
(as in the attached, untested patch).

But maybe I'm over-diagnosing this, and many of the functions that have
been modified "so that they run in parallel" do not really work correctly
in parallel.

Can you please forward this upstream? (Or give me some hint about
how I should word the issue myself, as I have a GitHub account.)

Thanks.

--- a/debian/rules
+++ b/debian/rules
@@ -120,13 +120,13 @@ execute_after_dh_auto_clean:
 ifeq (yes,$(findstring yes,$(RUNTEST)))
 override_dh_auto_test:
 	set -e; \
-	for py in $(shell py3versions -rv); do \
+	if [ $(nproc) -gt 2 ]; then for py in $(shell py3versions -rv); do \
 	echo "=== testing with python$$py ==="; \
 	pydir=`pybuild -p $$py --system=distutils --print {build_dir}`; \
 	MPLBACKEND=agg PYTHONPATH=$$pydir python$$py -mpytest -v -k "$(SKIP_TESTS)" --disable-pytest-warnings testsuite; \
 	rm -rf $$pydir/MDAnalysis/.hypothesis; \
 	rm -rf $$pydir/MDAnalysis/.duecredit.p; \
-	done
+	done; fi
 endif

 execute_after_dh_python3:
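
One thing to check before testing the patch above: in a makefile, `$(nproc)` is expanded by make as a variable named `nproc`, not by the shell. Unless such a variable happens to be defined, the condition would reach the shell as `[ -gt 2 ]` and the test loop would never run. In a recipe the shell form would normally be written `$$(nproc)` (or `$(shell nproc)`). The gate itself, as plain shell, might look like the sketch below; `enough_cpus` is a hypothetical helper name, and the threshold of 2 is taken from the patch.

```shell
# Hypothetical helper: succeed only if the CPU count ($1) is strictly
# greater than the threshold ($2).
enough_cpus() {
    [ "$1" -gt "$2" ]
}

# nproc is from GNU coreutils; fall back to 1 if it is unavailable.
# Inside a make recipe this substitution would be written $$(nproc).
ncpu=$(nproc 2>/dev/null || echo 1)

if enough_cpus "$ncpu" 2; then
    echo "running tests on $ncpu CPUs"
else
    echo "skipping tests: only $ncpu CPU(s) available"
fi
```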

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Drew Parsons@21:1/5 to All on Sun Jul 6 16:20:02 2025

Source: mdanalysis
Followup-For: Bug #1108309

Thanks Santiago, I appreciate your help improving the package.
Certainly good for everyone if we can find how to stop these random timeouts.

Drew

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Santiago Vila@21:1/5 to All on Tue Jul 8 03:20:01 2025

forwarded 1108309 https://github.com/MDAnalysis/mdanalysis/issues/5078
thanks

Hi. I've forwarded this to the above URL.
Thanks.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Drew Parsons@21:1/5 to Santiago Vila on Thu Jul 10 10:50:01 2025

Source: mdanalysis
Followup-For: Bug #1108309
X-Debbugs-Cc: 1108997@bugs.debian.org
Control: tags -1 ftbfs

Santiago Vila wrote:

> But the build logs from buildd.debian.org which I quoted in my initial report, where the build fails with timeout:
> ...
> happened on machines with more than 2 CPUs, so by skipping the tests
> if the number of CPUs is <= 2, we are certainly avoiding the problem
> in some scenarios where we know the tests are quite useless, but not
> in the buildds.
>
> So, to summarize, I don't think it was a good suggestion, and I'm sorry
> that I realized now.

Ok, thanks for clarifying, Santiago.

In that case it's better for trixie to just skip all tests. That has
effectively been done already with a debci ban, but the ban doesn't let
us test any movement from upstream (it prevents testing new code in
experimental).

So I'll upload -12 switching off tests for trixie.

Drew

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
