• Bug#1100907: netplan.io: flaky autopkgtest (mostly on amd64 and s390x):

    From Lukas Märdian@21:1/5 to elbrus@debian.org on Wed Apr 9 12:20:01 2025
    On Thu, 20 Mar 2025 11:13:52 +0100 Paul Gevers <elbrus@debian.org> wrote:
    I looked at the results of the autopkgtest of your package because it
    showed up in the regressions for glibc. I noticed that it regularly
    fails on ci.d.n, at least on amd64 and s390x.

    Because the unstable-to-testing migration software now blocks on
    regressions in testing, flaky tests, i.e. tests that flip between
    passing and failing without changes to the list of installed packages,
    are causing people unrelated to your package to spend time on these
    tests.

    Don't hesitate to reach out if you need help and some more information
    from our infrastructure.

    Thanks, Paul, for reaching out!

    This race condition is hard for me to reproduce locally, as it
    rarely happens on my machines. But I might have identified the root cause
    and a potential fix: https://github.com/canonical/netplan/pull/550/commits/7fbd0fc4aef7ef1ed


    When a new veth device is created via iproute2 ("ip link add ..."), the kernel assigns it a MAC address. Once udev picks up the interface, it applies systemd's default "MACAddressPolicy=persistent" policy. But as veths (by nature) have no permanent hardware MAC, systemd-udev generates a random (but persistent) MAC, replacing the one the kernel assigned.

    Previously we waited for 0.1 sec to (hopefully) pick up the later systemd-udev MAC, but sometimes that wasn't enough, and we still got the earlier kernel MAC.

    My patch drops the "time.sleep(0.1)" and instead triggers udev events for our test interfaces and waits for those uevents to settle via 'udevadm', to avoid race conditions.
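
    As a rough illustration of the trigger-and-settle idea (a sketch only, not the code from the linked patch): the helper names, the "--action=add" choice, the 10 second timeout and the interface names are all assumptions made for this example. Only "udevadm trigger", "udevadm settle" and the /sys/class/net/<iface>/address file are standard.

```python
import subprocess


def udev_trigger_cmd(ifaces):
    """Build a 'udevadm trigger' command for the given interfaces.

    Hypothetical helper; '--action=add' is an assumption for this sketch.
    """
    return ["udevadm", "trigger", "--action=add"] + [
        f"/sys/class/net/{name}" for name in ifaces
    ]


def udev_settle_cmd(timeout_s=10):
    """Build a 'udevadm settle' command with an upper bound on the wait."""
    return ["udevadm", "settle", f"--timeout={timeout_s}"]


def wait_for_udev(ifaces, timeout_s=10, run=subprocess.check_call):
    """Replay uevents for the interfaces, then block until udev is done.

    This replaces a fixed sleep: we wait for the event queue to drain
    instead of guessing how long udev needs.
    """
    run(udev_trigger_cmd(ifaces))
    run(udev_settle_cmd(timeout_s))


def read_mac(iface):
    """Read the current MAC address of an interface from sysfs."""
    with open(f"/sys/class/net/{iface}/address") as f:
        return f.read().strip()
```

    In a test this would be called right after creating the veth pair, e.g. wait_for_udev(["veth0", "veth1"]) followed by read_mac("veth0"), so that the MAC is read only after udev has finished processing the interface. Running it needs root and a system with udev.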


    ... Currently testing in experimental on real DebCI infrastructure.


    Cheers,
    Lukas

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)