• Bridging Network Connections with libvirt are unreliable

    From Rainer Dorsch@21:1/5 to All on Wed Aug 28 10:10:01 2024
    Hello,

    I have a (for me) weird problem on a bookworm system

    rd@h370:~$ inxi -S
    System:
    Host: h370 Kernel: 6.1.0-23-amd64 arch: x86_64 bits: 64 Desktop: KDE Plasma
    v: 5.27.5 Distro: Debian GNU/Linux 12 (bookworm)
    rd@h370:~$

    Bridged network connections used with libvirt work unreliably.

    I have bridged networks in /etc/network/interfaces, e.g.

    iface eno1.2 inet manual

    # libvirt VM
    auto br2
    iface br2 inet dhcp
    # Use the MAC address identified above.
    hwaddress ether 18:31:bf:52:1b:1c
    bridge_ports eno1.2
    # If you want to turn on Spanning Tree Protocol, ask your hosting
    # provider first as it may conflict with their network.
    bridge_stp off
    # If STP is off, set to 0. If STP is on, set to 2 (or greater).
    bridge_fd 0

    to make the interface available for libvirt.

    In addition there are non-bridging networks, e.g.

    allow-hotplug eno1.4
    iface eno1.4 inet dhcp

    All of them share the same physical network but are defined on separate VLANs.

    The full /etc/network/interfaces file of the machine is here: https://bokomoko.de/~rd/Debian/interfaces

    That works well for many hours or even days, but at some point in time the network is suddenly gone, and all network services die.

    root@h370:~# ifdown br2

    and

    root@h370:~# ifup br2

    fixes the issue immediately. The non-bridged networks are not affected. The problem occurs regardless of whether libvirt is running.
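Because ifdown/ifup resets the interface, it can help to record the bridge state before bouncing it. The following is only a sketch, assuming the iproute2 tools `ip` and `bridge` are installed; the function name `snapshot_iface` and the log path in the usage line are hypothetical and not part of the original post.

```shell
#!/bin/sh
# snapshot_iface IFACE OUTFILE
# Append link, address, route, neighbour and bridge state for IFACE to OUTFILE.
# A failing or missing command is noted in the file and does not abort the run.
snapshot_iface() {
    iface=$1
    out=$2
    {
        echo "== $(date) $iface =="
        for cmd in "ip -s -d link show dev $iface" \
                   "ip addr show dev $iface" \
                   "ip route show dev $iface" \
                   "ip neigh show dev $iface" \
                   "bridge link show" \
                   "bridge fdb show br $iface"; do
            echo "--- $cmd"
            # $cmd is deliberately unquoted so it splits into command and arguments
            $cmd 2>&1 || echo "(command failed)"
        done
    } >> "$out"
}
```

Run as root while the network is down and before the restart, for example `snapshot_iface br2 /var/tmp/br2-state.log`, then compare with a snapshot taken while everything works.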

    In the systemd log, the first entry indicating network problems is that the DNS server switches to another interface. But it could easily be a consequence and not the cause of the issue:

    Aug 28 06:57:54 h370 dhclient[1195]: DHCPREQUEST for 192.168.4.203 on eno1.4 to 192.168.4.1 port 67
    Aug 28 06:57:54 h370 dhclient[1195]: DHCPACK of 192.168.4.203 from 192.168.4.1
    Aug 28 06:57:54 h370 dnsmasq[2386]: reading /etc/resolv.conf
    Aug 28 06:57:54 h370 dnsmasq[2386]: using nameserver 192.168.4.1#53
    Aug 28 06:57:54 h370 dhclient[1195]: bound to 192.168.4.203 -- renewal in 18265 seconds.

    As a workaround I could probably write a small script that pings another network host and restarts the br interfaces, but I would prefer to understand why the problem occurs in the first place.
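The workaround described above could look roughly like this. It is a sketch under assumptions, not a tested fix: the helper names `probe`, `restart_iface` and `check_bridge` are made up for illustration, and 192.0.2.1 in the example comment is a documentation placeholder that would have to be replaced by a host reachable through the bridge.

```shell
#!/bin/sh
# Watchdog sketch: restart a bridge when a probe host stops answering.

# probe IFACE HOST: succeed if HOST answers pings sent out via IFACE.
probe() {
    ping -c 3 -W 2 -I "$1" "$2" >/dev/null 2>&1
}

# restart_iface IFACE: bounce the interface like the manual ifdown/ifup above.
restart_iface() {
    ifdown "$1"
    ifup "$1"
}

# check_bridge IFACE HOST: probe once, restart only on failure, log the outcome.
check_bridge() {
    iface=$1
    host=$2
    if probe "$iface" "$host"; then
        echo "$iface: $host reachable"
    else
        echo "$iface: $host unreachable, restarting"
        restart_iface "$iface"
    fi
}

# Example (as root, e.g. from cron): check_bridge br2 192.0.2.1
```

Keeping the probe and the restart in separate functions makes it possible to dry-run the logic by overriding them, and to add state logging before the restart.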

    Any idea or hint is welcome.

    Many thanks
    Rainer

    --
    Rainer Dorsch
    http://bokomoko.de/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tim Woodall@21:1/5 to Rainer Dorsch on Thu Aug 29 20:40:01 2024
    On Wed, 28 Aug 2024, Rainer Dorsch wrote:

    In the systemd log, the first entry indicating network problems is that the DNS
    server switches to another interface. But it could easily be a consequence and
    not the cause of the issue:

    Aug 28 06:57:54 h370 dhclient[1195]: DHCPREQUEST for 192.168.4.203 on eno1.4 to 192.168.4.1 port 67
    Aug 28 06:57:54 h370 dhclient[1195]: DHCPACK of 192.168.4.203 from 192.168.4.1
    Aug 28 06:57:54 h370 dnsmasq[2386]: reading /etc/resolv.conf
    Aug 28 06:57:54 h370 dnsmasq[2386]: using nameserver 192.168.4.1#53
    Aug 28 06:57:54 h370 dhclient[1195]: bound to 192.168.4.203 -- renewal in 18265 seconds.


    To me that looks like it's the DHCP request (renewal?) that is more
    likely breaking things. The DHCP client is presumably rewriting
    resolv.conf.

    I have the following setting to stop dhcp changing resolv.conf:

    $ cat /etc/dhcp/dhclient-enter-hooks.d/nodnsupdate
    make_resolv_conf() {
    :
    }

    Don't know if that will fix your problem but it should hopefully stop
    those dnsmasq lines appearing in the log.
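A minimal sketch of creating that hook file, assuming a POSIX shell. The helper name `write_hook` is hypothetical; the directory and file name in the usage line are the ones shown in the post. The `:` is the shell no-op and has to stay, because an empty function body is a syntax error.

```shell
#!/bin/sh
# write_hook DIR
# Create DIR/nodnsupdate, which overrides make_resolv_conf with a no-op,
# then check that the file parses as shell.
write_hook() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/nodnsupdate" <<'EOF'
make_resolv_conf() {
    :
}
EOF
    sh -n "$dir/nodnsupdate"
}
```

On the system from the post this would be called as root with `write_hook /etc/dhcp/dhclient-enter-hooks.d`.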

    Does the problem definitely happen when the dhcp update happens or are
    these just the nearest logs?

    Tim.

  • From Rainer Dorsch@21:1/5 to All on Fri Aug 30 18:50:01 2024
    On Thursday, 29 August 2024, 20:31:10 CEST, Tim Woodall wrote:
    To me that looks like it's the DHCP request (renewal?) that is more
    likely breaking things. The DHCP client is presumably rewriting
    resolv.conf.

    I have the following setting to stop dhcp changing resolv.conf:

    $ cat /etc/dhcp/dhclient-enter-hooks.d/nodnsupdate
    make_resolv_conf() {
    :
    }

    Don't know if that will fix your problem but it should hopefully stop
    those dnsmasq lines appearing in the log.

    Does the problem definitely happen when the dhcp update happens or are
    these just the nearest logs?

    Many thanks for your reply. I added the nodnsupdate configuration you suggested. I should now see whether the problem comes back (unfortunately it happens at very irregular intervals). Do I need to restart a service for the change to take effect?

    I cannot tell whether it happens when the DHCP update happens or whether this was just a coincidence (or whether the network issue even triggered a DNS update). I can say, though, that it does not happen on every DHCP update by far; there are many more of them in the log. So at least something else must be involved as well.
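To put a number on "many more of them in the log", the DHCPREQUEST lines can be counted per interface and compared with the number of outages. This is a sketch that assumes syslog-style lines like the ones quoted in this thread; the function name `requests_per_iface` is hypothetical.

```shell
#!/bin/sh
# requests_per_iface
# Read log lines on stdin and print "<interface> <count>" for every
# dhclient DHCPREQUEST line, sorted by interface name.
requests_per_iface() {
    awk '/dhclient\[[0-9]+\]: DHCPREQUEST for .* on / {
             # the interface name is the word following "on"
             for (i = 1; i < NF; i++)
                 if ($i == "on") { n[$(i + 1)]++; break }
         }
         END { for (ifc in n) print ifc, n[ifc] }' | sort
}
```

One possible way to feed it is `journalctl -t dhclient | requests_per_iface`, which relies on journalctl's default output format matching the lines above.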

    I see a number of active dhclients though

    root 772 0.0 0.0 5872 3148 ? Ss Aug26 0:00 dhclient -4 -v -i -pf /run/dhclient.eno1.pid -lf /var/lib/dhcp/dhclient.eno1.leases -I -df /var/lib/dhcp/dhclient6.eno1.leases eno1
    root 1114 0.0 0.0 5872 3524 ? Ss Aug26 0:00 dhclient -4 -v -i -pf /run/dhclient.eno1.3.pid -lf /var/lib/dhcp/dhclient.eno1.3.leases -I -df /var/lib/dhcp/dhclient6.eno1.3.leases eno1.3
    root 1195 0.0 0.0 5872 3572 ? Ss Aug26 0:00 dhclient -4 -v -i -pf /run/dhclient.eno1.4.pid -lf /var/lib/dhcp/dhclient.eno1.4.leases -I -df /var/lib/dhcp/dhclient6.eno1.4.leases eno1.4
    root 1268 0.0 0.0 5868 3428 ? Ss Aug26 0:00 dhclient -4 -v -i -pf /run/dhclient.eno1.6.pid -lf /var/lib/dhcp/dhclient.eno1.6.leases -I -df /var/lib/dhcp/dhclient6.eno1.6.leases eno1.6
    root 377797 0.0 0.0 5848 3380 ? Ss Aug28 0:00 dhclient -4 -v -i -pf /run/dhclient.br7.pid -lf /var/lib/dhcp/dhclient.br7.leases -I -df /var/lib/dhcp/dhclient6.br7.leases br7
    root 378009 0.0 0.0 5848 3560 ? Ss Aug28 0:00 dhclient -4 -v -i -pf /run/dhclient.br2.pid -lf /var/lib/dhcp/dhclient.br2.leases -I -df /var/lib/dhcp/dhclient6.br2.leases br2
    root 378210 0.0 0.0 5848 3516 ? Ss Aug28 0:00 dhclient -4 -v -i -pf /run/dhclient.br5.pid -lf /var/lib/dhcp/dhclient.br5.leases -I -df /var/lib/dhcp/dhclient6.br5.leases br5

    Many thanks again
    Rainer

    --
    Rainer Dorsch
    http://bokomoko.de/

  • From Rainer Dorsch@21:1/5 to All on Fri Aug 30 19:00:01 2024
    On Friday, 30 August 2024, 01:13:40 CEST, Jeffrey Walton wrote:

    Do you know if MAC Address Randomization is happening on your interfaces?

    Hi Jeff,

    many thanks for your reply.

    I am not aware that I configured address randomization.

    Just checking right now the output of inxi

    root@h370:~# inxi -n
    Network:
    Device-1: Intel Ethernet I219-V driver: e1000e
    IF: eno1 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
    IF-ID-1: br2 state: up speed: 1000 Mbps duplex: unknown mac: 18:31:bf:52:1b:1c
    IF-ID-2: br5 state: up speed: 1000 Mbps duplex: unknown mac: 18:31:bf:52:1b:1c
    IF-ID-3: br7 state: up speed: 1000 Mbps duplex: unknown mac: 18:31:bf:52:1b:1c
    IF-ID-4: docker0 state: down mac: 02:42:5a:3f:a7:55
    IF-ID-5: eno1.2 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
    IF-ID-6: eno1.3 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
    IF-ID-7: eno1.4 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
    IF-ID-8: eno1.5 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
    IF-ID-9: eno1.6 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
    IF-ID-10: eno1.7 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
    IF-ID-11: eno1.99 state: up speed: 1000 Mbps duplex: full mac: 18:31:bf:52:1b:1c
    IF-ID-12: virbr0 state: down mac: 52:54:00:79:ce:77
    root@h370:~#

    shows a mac address of 18:31:bf:52:1b:1c (which is the same as ifconfig reports).

    In a note that is years old, I found this MAC address encoded in the UUID of the dmidecode output:

    System Information
    Manufacturer: System manufacturer
    Product Name: System Product Name
    Version: System Version
    Serial Number: System Serial Number
    UUID: 9c815dee-28d8-5276-d202-1831bf521b1c
    Wake-up Type: Power Switch
    SKU Number: ASUS_MB_CNL
    Family: To be filled by O.E.M.
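The comparison made here, that the last group of the UUID equals the MAC address without colons, can be written down as a small check. A sketch only; the function name `mac_in_uuid` is hypothetical, and it tests nothing beyond that textual match.

```shell
#!/bin/sh
# mac_in_uuid MAC UUID
# Succeed if the last dash-separated group of UUID equals MAC with the
# colons removed (comparison is case-insensitive).
mac_in_uuid() {
    mac=$(printf '%s' "$1" | tr -d ':' | tr 'A-F' 'a-f')
    node=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    node=${node##*-}    # keep only the text after the last dash
    [ "$mac" = "$node" ]
}

# Values from the inxi and dmidecode output above:
# mac_in_uuid 18:31:bf:52:1b:1c 9c815dee-28d8-5276-d202-1831bf521b1c
```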

    To me that means no address randomization is in use. At least it would run very infrequently :-).
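Another way to look at this is the kernel's own record of how each interface got its address, exposed in sysfs as `addr_assign_type`. A sketch, assuming the usual /sys/class/net layout; the function name `mac_assign_type` and its optional second argument (an alternative base directory, useful for dry runs) are hypothetical.

```shell
#!/bin/sh
# mac_assign_type IFACE [SYSFS_NET_DIR]
# Print how the kernel says IFACE obtained its MAC address.
mac_assign_type() {
    iface=$1
    base=${2:-/sys/class/net}
    file=$base/$iface/addr_assign_type
    [ -r "$file" ] || { echo "$iface: unknown"; return 1; }
    case $(cat "$file") in
        0) echo "$iface: permanent (hardware) address" ;;
        1) echo "$iface: randomly generated address" ;;
        2) echo "$iface: address stolen from another device" ;;
        3) echo "$iface: address set by userspace" ;;
        *) echo "$iface: unrecognised value" ;;
    esac
}

# Example: for i in eno1 eno1.2 br2; do mac_assign_type "$i"; done
```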

    Thanks again
    Rainer


    --
    Rainer Dorsch
    http://bokomoko.de/
