Description
When a virtual machine is being live migrated from one hypervisor to another, where both hypervisors are running FRR and participating in an EVPN fabric, the virtual machine orchestration on the target hypervisor adds static sticky bridge FDB and IP neighbour entries reflecting the new location of the virtual machine in the fabric. FRR on this hypervisor will eventually advertise MACIP routes for those into the EVPN fabric.
On the source hypervisor, the static sticky bridge FDB and IP neighbour entries are simultaneously being removed, causing FRR to withdraw its MACIP routes.
However, due to factors such as the propagation delay of the MACIP route withdrawals, the target hypervisor usually sees the withdrawal from the source hypervisor some time after the static sticky FDB/neigh entries have been added locally. From what I can tell, FRR appears to act on this MACIP route withdrawal by removing the local FDB/neigh entries added by the virtual machine orchestration.
The virtual machine orchestration's reconciliation loop eventually notices that the FDB/neigh entries it added after the live migration have gone AWOL and adds them back, and things usually settle into a functional state. However, the oscillating FDB/neigh entries add quite a bit of delay to the VM live migration process, which is supposed to be nearly hitless.
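For illustration, the static sticky FDB entry and the permanent neigh entry described above correspond roughly to the iproute2 invocations built below. This is a minimal sketch: the helper names `fdb_replace_cmd` and `neigh_replace_cmd` are hypothetical, and the orchestration agent may well program the kernel in a different way (for example directly over netlink). The example values are taken from the journal further down in this report.

```python
from typing import List


def fdb_replace_cmd(mac: str, dev: str, vlan: int) -> List[str]:
    """Build argv for a static, sticky FDB entry on the bridge (master) side.

    Hypothetical helper; not taken from the orchestration agent's source.
    """
    return ["bridge", "fdb", "replace", mac, "dev", dev,
            "vlan", str(vlan), "master", "static", "sticky"]


def neigh_replace_cmd(ip: str, mac: str, dev: str, proto: int = 255) -> List[str]:
    """Build argv for a PERMANENT neighbour entry tagged with a protocol id.

    Protocol 255 is what the journal shows for the agent's entries.
    """
    return ["ip", "neigh", "replace", ip, "lladdr", mac, "dev", dev,
            "nud", "permanent", "protocol", str(proto)]


if __name__ == "__main__":
    # Only print the commands; running them needs root and the right devices.
    print(" ".join(fdb_replace_cmd("fa:16:3e:7e:e3:5a", "veth-to-ovs", 99)))
    print(" ".join(neigh_replace_cmd("185.47.41.88", "fa:16:3e:7e:e3:5a", "irb-99")))
```

The sketch only builds the argument lists, so it can be inspected without touching the host's bridge or neighbour tables.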
Version
FRRouting 10.7.0-dev (frr1) on Linux(5.14.0-611.20.1.el9_7.x86_64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
'--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-static' '--disable-werror' '--enable-multipath=256' '--enable-vtysh' '--enable-ospfclient' '--enable-ospfapi' '--enable-ldpd' '--enable-pimd' '--enable-pim6d' '--enable-pbrd' '--enable-nhrpd' '--enable-eigrpd' '--enable-babeld' '--enable-vrrpd' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-fpm' '--enable-watchfrr' '--disable-bgp-vnc' '--enable-isisd' '--enable-doc' '--enable-rpki' '--enable-bfdd' '--enable-pathd' '--disable-grpc' '--enable-snmp' '--disable-zeromq' '--enable-pcre2posix' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig' 'CC=gcc' 'CXX=g++' 'LT_SYS_LIBRARY_PATH=/usr/lib64:'
How to reproduce
Have an OpenStack deployment with (at least) two hypervisors connected with BGP to an L3 data centre fabric. Deploy EVPN on the hypervisors with an orchestration agent such as evpn_agent. Deploy a VM on a VLAN («provider network» in OpenStack nomenclature) that is connected to a VLAN-aware Linux Bridge with an L2VNI device connected and FRR set up to advertise all VNIs, ensuring there is EVPN-arranged L2 connectivity between the two hypervisors. Then live migrate a VM from one hypervisor to another.
The FRR configuration in question is as follows:
frr version 10.7.0-dev
frr defaults datacenter
hostname frr1
log syslog
!
route-map LEAF-IN permit 1
set community no-export additive
exit
!
route-map vrf-2-redistribute-connected permit 99
match interface irb-99
exit
!
route-map vrf-2-redistribute-connected deny 65535
exit
!
debug zebra kernel
debug bgp updates in
!
vrf vrf-2
vni 2
exit-vrf
!
interface lo
ip address 10.0.0.1/32
exit
!
router bgp 65001
bgp router-id 10.0.0.1
bgp disable-ebgp-connected-route-check
bgp bestpath as-path multipath-relax
neighbor LEAF peer-group
neighbor LEAF remote-as external
neighbor eth4 interface peer-group LEAF
neighbor eth5 interface peer-group LEAF
use-underlays-nexthop-weight
!
address-family ipv4 unicast
network 10.0.0.1/32
redistribute kernel
redistribute connected
neighbor LEAF route-map LEAF-IN in
exit-address-family
!
address-family ipv6 unicast
redistribute kernel
redistribute connected
neighbor LEAF activate
neighbor LEAF route-map LEAF-IN in
exit-address-family
!
address-family l2vpn evpn
neighbor LEAF activate
neighbor LEAF route-map LEAF-IN in
advertise-all-vni
exit-address-family
exit
!
router bgp 65001 vrf vrf-2
no bgp default ipv4-unicast
bgp disable-ebgp-connected-route-check
bgp bestpath as-path multipath-relax
use-underlays-nexthop-weight
!
address-family ipv4 unicast
redistribute kernel
redistribute connected route-map vrf-2-redistribute-connected
exit-address-family
!
address-family ipv6 unicast
redistribute kernel
redistribute connected route-map vrf-2-redistribute-connected
exit-address-family
!
address-family l2vpn evpn
advertise ipv4 unicast
advertise ipv6 unicast
exit-address-family
exit
!
end
Expected behavior
When the orchestration agent on the target hypervisor adds the VM's FDB/neigh entries, FRR should pick those up and immediately start advertising MACIP routes for them. It should not delete them. When withdraws for MACIP routes advertised from the old hypervisor are received by FRR, it should only delete the exact entries that this MACIP route previously caused to be installed, not all entries for the MAC and/or IP address (as those might have been added by some external process and should therefore be left alone).
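The expected behaviour can be expressed as a small ownership model. The sketch below is a toy illustration and not FRR's data structures: the `MacTable` class, its method names and the `owner` labels are all hypothetical. Refusing to let a remote route overwrite an externally added entry is my reading of the behaviour expected above, not something FRR documents.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Entry:
    target: str  # local port name or remote VTEP address
    owner: str   # "external" (added by orchestration) or "evpn" (from a remote route)


@dataclass
class MacTable:
    entries: Dict[str, Entry] = field(default_factory=dict)

    def add_external(self, mac: str, port: str) -> None:
        """An external process installs a local static entry."""
        self.entries[mac] = Entry(port, "external")

    def install_remote(self, mac: str, vtep: str) -> bool:
        """Install an entry from a remote MACIP route, unless an
        externally owned entry already exists (assumed policy)."""
        cur = self.entries.get(mac)
        if cur is not None and cur.owner == "external":
            return False
        self.entries[mac] = Entry(vtep, "evpn")
        return True

    def withdraw_remote(self, mac: str, vtep: str) -> bool:
        """Delete only the exact entry this remote route installed."""
        cur = self.entries.get(mac)
        if cur is None or cur.owner != "evpn" or cur.target != vtep:
            return False
        del self.entries[mac]
        return True


if __name__ == "__main__":
    table = MacTable()
    mac = "fa:16:3e:7e:e3:5a"
    table.install_remote(mac, "10.0.0.2")    # VM still on the source hypervisor
    table.add_external(mac, "veth-to-ovs")   # orchestration moves it locally
    table.withdraw_remote(mac, "10.0.0.2")   # late withdrawal arrives
    print(table.entries[mac])                # local entry is still present
```

In this model the late withdrawal and the lingering re-advertisement are both no-ops against the externally owned entry, which is the outcome described above.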
Actual behavior
I'll go through the systemd journal (with line numbers added) from when a virtual machine is being migrated to a hypervisor and comment along the way. The log contains the debug output from FRR and from the orchestration agent (evpn_agent.service), output from some custom units that run ip monitor neigh and bridge monitor fdb, and output from one unit (differ.service) that reports any relevant changes in the neigh cache or bridge FDB for the IP/MAC of the VM being migrated.
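For anyone who wants to slice the excerpts by unit, the numbered journal lines can be split into fields with a small parser. This is a sketch that assumes every line has the shape shown in this report (line number, weekday, date, time, time zone, unit[pid], message); `Record`, `parse_line` and `by_unit` are names made up for this example.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# <lineno> <weekday> <date> <time> <tz> <unit>[<pid>]: <message>
LINE_RE = re.compile(
    r"^(?P<lineno>\d+) \w{3} (?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) \S+ "
    r"(?P<unit>[^\[\s]+)\[(?P<pid>\d+)\]: (?P<message>.*)$"
)


class Record(NamedTuple):
    lineno: int
    date: str
    time: str
    unit: str
    pid: int
    message: str


def parse_line(line: str) -> Optional[Record]:
    """Return a Record for a numbered journal line, or None if it does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return Record(int(m["lineno"]), m["date"], m["time"],
                  m["unit"], int(m["pid"]), m["message"])


def by_unit(lines: Iterable[str], unit: str) -> List[Record]:
    """Keep only the records emitted by the given systemd unit."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and r.unit == unit]
```

For example, `by_unit(log_lines, "differ.service")` would leave only the neigh/FDB diffs, which makes the oscillation easier to follow.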
Line 1-49: The VM is started
1 Wed 2026-04-08 11:31:10 CEST systemd-machined.service[1123]: New machine qemu-388-instance-0000d1dc.
2 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1520, seq=0, pid=0
3 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1520, seq=0, pid=0
4 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1520, seq=0, pid=0
5 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1520, seq=0, pid=0
6 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [K8FXY-V65ZJ] Intf dplane ctx 0x7f3fbc029e20, op INTF_INSTALL, ifindex (1870), result QUEUED
7 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [WV1M1-5064M] RTM_NEWLINK update for tapd0ba6558-f4(1870) sl_type 4 master 0 flags 0x11043
8 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [VJ8AQ-ZFT0K] Intf tapd0ba6558-f4(1870) PTM up, notifying clients is_up:1 pd_cleared:0
9 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [K8FXY-V65ZJ] Intf dplane ctx 0x7f3fbc028250, op INTF_INSTALL, ifindex (1870), result QUEUED
10 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [WV1M1-5064M] RTM_NEWLINK update for tapd0ba6558-f4(1870) sl_type 4 master 0 flags 0x11043
11 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [VJ8AQ-ZFT0K] Intf tapd0ba6558-f4(1870) PTM up, notifying clients is_up:1 pd_cleared:0
12 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [K8FXY-V65ZJ] Intf dplane ctx 0x7f3fbc027650, op INTF_INSTALL, ifindex (1870), result QUEUED
13 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [WV1M1-5064M] RTM_NEWLINK update for tapd0ba6558-f4(1870) sl_type 4 master 0 flags 0x11043
14 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [VJ8AQ-ZFT0K] Intf tapd0ba6558-f4(1870) PTM up, notifying clients is_up:1 pd_cleared:0
15 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [K8FXY-V65ZJ] Intf dplane ctx 0x7f3fbc02bdf0, op INTF_INSTALL, ifindex (1870), result QUEUED
16 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [WV1M1-5064M] RTM_NEWLINK update for tapd0ba6558-f4(1870) sl_type 4 master 0 flags 0x11043
17 Wed 2026-04-08 11:31:10 CEST frr.service[3431682]: [VJ8AQ-ZFT0K] Intf tapd0ba6558-f4(1870) PTM up, notifying clients is_up:1 pd_cleared:0
18 Wed 2026-04-08 11:31:11 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWADDR(20), len=72, seq=0, pid=0
19 Wed 2026-04-08 11:31:11 CEST frr.service[3431682]: [RGWF1-EHXT1] netlink_interface_addr_dplane: RTM_NEWADDR nsid 0 ifindex 1870 flags 0x80:
20 Wed 2026-04-08 11:31:11 CEST frr.service[3431682]: [ME3M2-X6YT9] IFA_ADDRESS fe80::fc16:3eff:fe62:3aa2/64
21 Wed 2026-04-08 11:31:11 CEST frr.service[3431682]: [P2VPT-508WP] IFA_CACHEINFO pref -1, valid -1
22 Wed 2026-04-08 11:31:11 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
23 Wed 2026-04-08 11:31:11 CEST frr.service[3431682]: [SKNFJ-G938V] RTM_NEWROUTE ipv6 local proto kernel NS 0
24 Wed 2026-04-08 11:31:11 CEST frr.service[3431682]: [J3J81-V75NW] Route rtm_type: local(2) intentionally ignoring
25 Wed 2026-04-08 11:31:11 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
26 Wed 2026-04-08 11:31:11 CEST frr.service[3431682]: [SKNFJ-G938V] RTM_NEWROUTE ipv6 anycast proto kernel NS 0
27 Wed 2026-04-08 11:31:11 CEST frr.service[3431682]: [J3J81-V75NW] Route rtm_type: anycast(4) intentionally ignoring
28 Wed 2026-04-08 11:31:11 CEST frr.service[3431682]: [K8FXY-V65ZJ] Intf dplane ctx 0x7f3fbc029e20, op INTF_ADDR_ADD, ifindex (1870), result QUEUED
29 Wed 2026-04-08 11:31:11 CEST frr.service[3431682]: [MZPZA-W042K] zebra_if_addr_update_ctx: INTF_ADDR_ADD: ifindex tapd0ba6558-f4(1870), addr fe80::fc16:3eff:fe62:3aa2/64
30 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1512, seq=0, pid=0
31 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1512, seq=0, pid=0
32 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1520, seq=0, pid=0
33 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1520, seq=0, pid=0
34 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1520, seq=0, pid=0
35 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [K8FXY-V65ZJ] Intf dplane ctx 0x7f3fbc029e20, op INTF_INSTALL, ifindex (1870), result QUEUED
36 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [WV1M1-5064M] RTM_NEWLINK update for tapd0ba6558-f4(1870) sl_type 4 master 0 flags 0x11043
37 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [VJ8AQ-ZFT0K] Intf tapd0ba6558-f4(1870) PTM up, notifying clients is_up:1 pd_cleared:0
38 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [K8FXY-V65ZJ] Intf dplane ctx 0x7f3fbc028250, op INTF_INSTALL, ifindex (1870), result QUEUED
39 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [WV1M1-5064M] RTM_NEWLINK update for tapd0ba6558-f4(1870) sl_type 4 master 0 flags 0x11043
40 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [VJ8AQ-ZFT0K] Intf tapd0ba6558-f4(1870) PTM up, notifying clients is_up:1 pd_cleared:0
41 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [K8FXY-V65ZJ] Intf dplane ctx 0x7f3fbc029e20, op INTF_INSTALL, ifindex (1870), result QUEUED
42 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [WV1M1-5064M] RTM_NEWLINK update for tapd0ba6558-f4(1870) sl_type 4 master 0 flags 0x11043
43 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [VJ8AQ-ZFT0K] Intf tapd0ba6558-f4(1870) PTM up, notifying clients is_up:1 pd_cleared:0
44 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [K8FXY-V65ZJ] Intf dplane ctx 0x7f3fbc027650, op INTF_INSTALL, ifindex (1870), result QUEUED
45 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [WV1M1-5064M] RTM_NEWLINK update for tapd0ba6558-f4(1870) sl_type 4 master 0 flags 0x11043
46 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [VJ8AQ-ZFT0K] Intf tapd0ba6558-f4(1870) PTM up, notifying clients is_up:1 pd_cleared:0
47 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [K8FXY-V65ZJ] Intf dplane ctx 0x7f3fbc02bdf0, op INTF_INSTALL, ifindex (1870), result QUEUED
48 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [WV1M1-5064M] RTM_NEWLINK update for tapd0ba6558-f4(1870) sl_type 4 master 0 flags 0x11043
49 Wed 2026-04-08 11:31:12 CEST frr.service[3431682]: [VJ8AQ-ZFT0K] Intf tapd0ba6558-f4(1870) PTM up, notifying clients is_up:1 pd_cleared:0
Line 50-70: The orchestration (evpn_agent) configures the local FDB/neigh entries
50 Wed 2026-04-08 11:31:14 CEST evpn_agent.service[298104]: [bridgemanager.py:80 → ensure_fdb()] Adding static sticky FDB entry for fa:16:3e:7e:e3:5a on VLAN 99
51 Wed 2026-04-08 11:31:14 CEST bridge-monitor-fdb.service[3486074]: fa:16:3e:7e:e3:5a dev veth-to-ovs vlan 99 sticky master br-evpn static
52 Wed 2026-04-08 11:31:14 CEST ip-monitor-neigh.service[316985]: dev veth-to-ovs lladdr fa:16:3e:7e:e3:5a NOARP
53 Wed 2026-04-08 11:31:14 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWNEIGH(28), len=84, seq=0, pid=0
54 Wed 2026-04-08 11:31:14 CEST evpn_agent.service[298104]: [neighmanager.py:72 → ensure_neigh()] Adding static neigh entry 185.47.41.88→fa:16:3e:7e:e3:5a on irb-99
55 Wed 2026-04-08 11:31:14 CEST frr.service[3431682]: [HM5M4-AQPPX] Rx RTM_NEWNEIGH AF_BRIDGE IF 14 VLAN 99 st 0x40 fl 0x40 MAC fa:16:3e:7e:e3:5a nhg 0 vni 0
56 Wed 2026-04-08 11:31:14 CEST frr.service[3431682]: [JJ27M-21GWC][EC 4043309181] MAC fa:16:3e:7e:e3:5a already learnt as remote sticky MAC behind VTEP 10.0.0.2 VNI 10099
57 Wed 2026-04-08 11:31:14 CEST ip-monitor-neigh.service[316985]: 185.47.41.88 dev irb-99 lladdr fa:16:3e:7e:e3:5a PERMANENT proto 255
58 Wed 2026-04-08 11:31:14 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWNEIGH(28), len=84, seq=0, pid=1561122
59 Wed 2026-04-08 11:31:14 CEST frr.service[3431682]: [Y6ZNP-9NYSB] Rx RTM_NEWNEIGH family ipv4 IF 1687 NSID 0 IP 185.47.41.88 MAC fa:16:3e:7e:e3:5a state 0x80 flags 0x0 ext_flags 0x0, proto 255
60 Wed 2026-04-08 11:31:14 CEST differ.service[1561130]: --- /dev/fd/63 2026-04-08 11:31:14.634078970 +0200
61 Wed 2026-04-08 11:31:14 CEST differ.service[1561130]: +++ /dev/fd/62 2026-04-08 11:31:14.635078971 +0200
62 Wed 2026-04-08 11:31:14 CEST differ.service[1561130]: @@ -1 +1 @@
63 Wed 2026-04-08 11:31:14 CEST differ.service[1561130]: -185.47.41.88 dev irb-99 lladdr fa:16:3e:7e:e3:5a extern_learn NOARP proto zebra
64 Wed 2026-04-08 11:31:14 CEST differ.service[1561130]: +185.47.41.88 dev irb-99 lladdr fa:16:3e:7e:e3:5a PERMANENT proto 255
65 Wed 2026-04-08 11:31:14 CEST differ.service[1561136]: --- /dev/fd/63 2026-04-08 11:31:14.653078974 +0200
66 Wed 2026-04-08 11:31:14 CEST differ.service[1561136]: +++ /dev/fd/62 2026-04-08 11:31:14.653078974 +0200
67 Wed 2026-04-08 11:31:14 CEST differ.service[1561136]: @@ -1,2 +1,2 @@
68 Wed 2026-04-08 11:31:14 CEST differ.service[1561136]: -fa:16:3e:7e:e3:5a dev l2vni-10099 vlan 99 sticky master br-evpn static
69 Wed 2026-04-08 11:31:14 CEST differ.service[1561136]: +fa:16:3e:7e:e3:5a dev veth-to-ovs vlan 99 sticky master br-evpn static
70 Wed 2026-04-08 11:31:14 CEST differ.service[1561136]: fa:16:3e:7e:e3:5a dev l2vni-10099 dst 10.0.0.2 self sticky static
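The hexadecimal state and flag values in the FRR lines above (for example `st 0x40 fl 0x40` on line 55 and `state 0x80` on line 59) are the kernel's NUD_* and NTF_* bit masks. A minimal decoder, using the values from the Linux `linux/neighbour.h` UAPI header, might look like this; the `decode` helper itself is just something written for this report.

```python
from typing import Dict, List

# Neighbour states (ndm_state), from linux/neighbour.h
NUD_STATES: Dict[int, str] = {
    0x01: "INCOMPLETE", 0x02: "REACHABLE", 0x04: "STALE", 0x08: "DELAY",
    0x10: "PROBE", 0x20: "FAILED", 0x40: "NOARP", 0x80: "PERMANENT",
}

# Neighbour flags (ndm_flags), from linux/neighbour.h
NTF_FLAGS: Dict[int, str] = {
    0x01: "USE", 0x02: "SELF", 0x04: "MASTER", 0x08: "PROXY",
    0x10: "EXT_LEARNED", 0x20: "OFFLOADED", 0x40: "STICKY", 0x80: "ROUTER",
}


def decode(value: int, names: Dict[int, str]) -> List[str]:
    """Return the names of all bits set in value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & bit]


if __name__ == "__main__":
    print(decode(0x40, NUD_STATES), decode(0x40, NTF_FLAGS))  # line 55
    print(decode(0x80, NUD_STATES))                           # line 59
```

Read this way, line 55 is a NOARP (static) entry with the sticky flag set, and line 59 is the PERMANENT neigh entry added by the agent, matching what ip monitor and bridge monitor report.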
Line 71-96: FRR receives a BGP withdraw for the MACIP from the source hypervisor, removes all neigh entries
This is where the bug happens. FRR receives (as expected) withdraws for the MACIP routes from the source hypervisor where the VM used to run, but acts on that by deleting not only the neigh entries pointing to the source hypervisor, but also the local entries added by the orchestration on lines 50-70. Now there is no neigh entry for the VM's IP (185.47.41.88) at all. However, it would appear that the local bridge entry is left alone at this point.
71 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [PAPP6-VDAWM] eth5(l5-g4-osl2) rcvd UPDATE about RD 10.0.0.2:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a]:[32]:[185.47.41.88] label 0 l2vpn evpn -- withdrawn
72 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [K7P35-81YTJ] default (0): Uninstalling EVPN [2]:[0]:[48]:[fa:16:3e:7e:e3:5a]:[32]:[185.47.41.88] route from VNI 10099 IP/MAC table
73 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [PAPP6-VDAWM] eth5(l5-g4-osl2) rcvd UPDATE about RD 10.0.0.2:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a] label 0 l2vpn evpn -- withdrawn
74 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [K7P35-81YTJ] default (0): Uninstalling EVPN [2]:[0]:[48]:[fa:16:3e:7e:e3:5a] route from VNI 10099 IP/MAC table
75 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [PAPP6-VDAWM] eth4(l4-g4-osl2) rcvd UPDATE about RD 10.0.0.2:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a]:[32]:[185.47.41.88] label 0 l2vpn evpn -- withdrawn
76 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [K7P35-81YTJ] default (0): Uninstalling EVPN [2]:[0]:[48]:[fa:16:3e:7e:e3:5a]:[32]:[185.47.41.88] route from VNI 10099 IP/MAC table
77 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [PAPP6-VDAWM] eth4(l4-g4-osl2) rcvd UPDATE about RD 10.0.0.2:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a] label 0 l2vpn evpn -- withdrawn
78 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [K7P35-81YTJ] default (0): Uninstalling EVPN [2]:[0]:[48]:[fa:16:3e:7e:e3:5a] route from VNI 10099 IP/MAC table
79 Wed 2026-04-08 11:31:16 CEST ip-monitor-neigh.service[316985]: 185.47.41.88 dev irb-99 lladdr fa:16:3e:7e:e3:5a extern_learn NOARP proto zebra
80 Wed 2026-04-08 11:31:16 CEST ip-monitor-neigh.service[316985]: 185.47.41.88 dev irb-99 FAILED proto zebra
81 Wed 2026-04-08 11:31:16 CEST ip-monitor-neigh.service[316985]: Deleted 185.47.41.88 dev irb-99 FAILED proto zebra
82 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [NH6N7-54CD1] Tx RTM_NEWNEIGH family ipv4 IF irb-99(1687) Neigh 185.47.41.88 MAC fa:16:3e:7e:e3:5a flags 0x10 state 0x40 ext_flags 0x0
83 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [NH6N7-54CD1] Tx RTM_DELNEIGH family ipv4 IF irb-99(1687) Neigh 185.47.41.88 MAC null flags 0x10 state 0x0 ext_flags 0x0
84 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [MMX22-H2MY5] Tx RTM_DELNEIGH family bridge IF l2vni-10099(1685) VLAN 99 MAC fa:16:3e:7e:e3:5a dst 10.0.0.2 nhg 0 rem
85 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=192, msg cnt=3
86 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [MQ5AP-2S1F5] netlink-dp (NS 0) error: No such file or directory, type=RTM_DELNEIGH(29), seq=149677680, pid=3631020340
87 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [QTT8V-3ZQ34] nl_batch_read_resp: netlink error message seq=149677680
88 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_DELNEIGH(29), len=72, seq=0, pid=0
89 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [GZB4H-6FWZE] Rx NEIGH_IP_DELETE family IPv4 IF irb-99(1687) vrf vrf-2(1686) IP 185.47.41.88
90 Wed 2026-04-08 11:31:16 CEST differ.service[1561611]: --- /dev/fd/63 2026-04-08 11:31:16.547079327 +0200
91 Wed 2026-04-08 11:31:16 CEST differ.service[1561611]: +++ /dev/fd/62 2026-04-08 11:31:16.548079327 +0200
92 Wed 2026-04-08 11:31:16 CEST differ.service[1561611]: @@ -1 +1 @@
93 Wed 2026-04-08 11:31:16 CEST differ.service[1561611]: -185.47.41.88 dev irb-99 lladdr fa:16:3e:7e:e3:5a PERMANENT proto 255
94 Wed 2026-04-08 11:31:16 CEST differ.service[1561611]: +
95 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_DELROUTE 185.47.41.88/32 vrf 1686(100002)
96 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=52, msg cnt=1
Line 97-124: FRR receives a lingering MACIP route from old hypervisor and nukes local bridge entry
BGP is still converging at this point, so the withdrawn MACIP route from the old hypervisor is temporarily received again (note the longer AS-path). FRR then nukes the bridge FDB entry for the VM, which was correctly pointing to the downstream Open vSwitch bridge where the VM is connected, and replaces it with one pointing to the L2VNI with the old hypervisor as dst. The neigh entry that was removed during lines 71-96 is re-added, but with proto zebra and extern_learn.
97 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [XXWBM-V772F] eth5(l5-g4-osl2) rcvd UPDATE w/ attr: nexthop 10.0.0.2, extcommunity RT:12834:2 RT:12834:10099 Rmac:3a:83:7c:72:d4:89, path 65245 4282005242 65202 39029 65201 4282005241 65244 65002
98 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [YCKEM-GB33T] eth5(l5-g4-osl2) rcvd RD 10.0.0.2:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a]:[32]:[185.47.41.88] label 10099/2 l2vpn evpn
99 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [WTJQB-H1EZQ] default (0): Installing EVPN [2]:[0]:[48]:[fa:16:3e:7e:e3:5a]:[32]:[185.47.41.88] route in VNI 10099 IP/MAC table
100 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [XXWBM-V772F] eth5(l5-g4-osl2) rcvd UPDATE w/ attr: nexthop 10.0.0.2, extcommunity RT:12834:10099 MM:0, sticky MAC, path 65245 4282005242 65202 39029 65201 4282005241 65244 65002
101 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [YCKEM-GB33T] eth5(l5-g4-osl2) rcvd RD 10.0.0.2:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a] label 10099 l2vpn evpn
102 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [WTJQB-H1EZQ] default (0): Installing EVPN [2]:[0]:[48]:[fa:16:3e:7e:e3:5a] route in VNI 10099 IP/MAC table
103 Wed 2026-04-08 11:31:16 CEST ip-monitor-neigh.service[316985]: dev l2vni-10099 lladdr fa:16:3e:7e:e3:5a NOARP
104 Wed 2026-04-08 11:31:16 CEST ip-monitor-neigh.service[316985]: 185.47.41.88 dev irb-99 lladdr fa:16:3e:7e:e3:5a extern_learn NOARP proto zebra
105 Wed 2026-04-08 11:31:16 CEST bridge-monitor-fdb.service[3486074]: fa:16:3e:7e:e3:5a dev l2vni-10099 vlan 99 sticky master br-evpn static
106 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [MMX22-H2MY5] Tx RTM_NEWNEIGH family bridge IF l2vni-10099(1685) VLAN 99 sticky MAC fa:16:3e:7e:e3:5a dst 10.0.0.2 nhg 0 rem
107 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [NH6N7-54CD1] Tx RTM_NEWNEIGH family ipv4 IF irb-99(1687) Neigh 185.47.41.88 MAC fa:16:3e:7e:e3:5a flags 0x10 state 0x40 ext_flags 0x0
108 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=136, msg cnt=2
109 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWNEIGH(28), len=84, seq=0, pid=0
110 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [HM5M4-AQPPX] Rx RTM_NEWNEIGH AF_BRIDGE IF 1685 VLAN 99 st 0x40 fl 0x40 MAC fa:16:3e:7e:e3:5a nhg 0 vni 0
111 Wed 2026-04-08 11:31:16 CEST differ.service[1561659]: --- /dev/fd/63 2026-04-08 11:31:16.741079363 +0200
112 Wed 2026-04-08 11:31:16 CEST differ.service[1561659]: +++ /dev/fd/62 2026-04-08 11:31:16.742079363 +0200
113 Wed 2026-04-08 11:31:16 CEST differ.service[1561659]: @@ -1,2 +1,2 @@
114 Wed 2026-04-08 11:31:16 CEST differ.service[1561659]: -fa:16:3e:7e:e3:5a dev veth-to-ovs vlan 99 sticky master br-evpn static
115 Wed 2026-04-08 11:31:16 CEST differ.service[1561659]: +fa:16:3e:7e:e3:5a dev l2vni-10099 vlan 99 sticky master br-evpn static
116 Wed 2026-04-08 11:31:16 CEST differ.service[1561659]: fa:16:3e:7e:e3:5a dev l2vni-10099 dst 10.0.0.2 self sticky static
117 Wed 2026-04-08 11:31:16 CEST differ.service[1561665]: --- /dev/fd/63 2026-04-08 11:31:16.756079366 +0200
118 Wed 2026-04-08 11:31:16 CEST differ.service[1561665]: +++ /dev/fd/62 2026-04-08 11:31:16.756079366 +0200
119 Wed 2026-04-08 11:31:16 CEST differ.service[1561665]: @@ -1 +1 @@
120 Wed 2026-04-08 11:31:16 CEST differ.service[1561665]: -
121 Wed 2026-04-08 11:31:16 CEST differ.service[1561665]: +185.47.41.88 dev irb-99 lladdr fa:16:3e:7e:e3:5a extern_learn NOARP proto zebra
122 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_NEWROUTE 185.47.41.88/32 vrf 1686(100002)
123 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [J87BH-XW5PP] netlink_route_multipath_msg_encode: 185.47.41.88/32 nhg_id is 8768438
124 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=60, msg cnt=1
Line 125-156: BGP converges, the lingering MACIP route is removed for good
FRR deals with this by deleting the bridge FDB and IP neigh entries for the VM. At the end of this, there are no FDB/neigh entries for the VM whatsoever, neither remote nor local.
125 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [PAPP6-VDAWM] eth5(l5-g4-osl2) rcvd UPDATE about RD 10.0.0.2:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a]:[32]:[185.47.41.88] label 0 l2vpn evpn -- withdrawn
126 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [K7P35-81YTJ] default (0): Uninstalling EVPN [2]:[0]:[48]:[fa:16:3e:7e:e3:5a]:[32]:[185.47.41.88] route from VNI 10099 IP/MAC table
127 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [PAPP6-VDAWM] eth5(l5-g4-osl2) rcvd UPDATE about RD 10.0.0.2:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a] label 0 l2vpn evpn -- withdrawn
128 Wed 2026-04-08 11:31:16 CEST frr.service[3431687]: [K7P35-81YTJ] default (0): Uninstalling EVPN [2]:[0]:[48]:[fa:16:3e:7e:e3:5a] route from VNI 10099 IP/MAC table
129 Wed 2026-04-08 11:31:16 CEST ip-monitor-neigh.service[316985]: 185.47.41.88 dev irb-99 FAILED proto zebra
130 Wed 2026-04-08 11:31:16 CEST ip-monitor-neigh.service[316985]: Deleted 185.47.41.88 dev irb-99 FAILED proto zebra
131 Wed 2026-04-08 11:31:16 CEST ip-monitor-neigh.service[316985]: Deleted dev l2vni-10099 lladdr fa:16:3e:7e:e3:5a NOARP
132 Wed 2026-04-08 11:31:16 CEST ip-monitor-neigh.service[316985]: Deleted 10.0.0.2 dev l2vni-10099 lladdr fa:16:3e:7e:e3:5a REACHABLE NOARP
133 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [NH6N7-54CD1] Tx RTM_DELNEIGH family ipv4 IF irb-99(1687) Neigh 185.47.41.88 MAC null flags 0x10 state 0x0 ext_flags 0x0
134 Wed 2026-04-08 11:31:16 CEST bridge-monitor-fdb.service[3486074]: Deleted fa:16:3e:7e:e3:5a dev l2vni-10099 vlan 99 sticky master br-evpn static
135 Wed 2026-04-08 11:31:16 CEST bridge-monitor-fdb.service[3486074]: Deleted fa:16:3e:7e:e3:5a dev l2vni-10099 dst 10.0.0.2 self sticky static
136 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [MMX22-H2MY5] Tx RTM_DELNEIGH family bridge IF l2vni-10099(1685) VLAN 99 MAC fa:16:3e:7e:e3:5a dst 10.0.0.2 nhg 0 rem
137 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=136, msg cnt=2
138 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [GZB4H-6FWZE] Rx NEIGH_IP_DELETE family IPv4 IF irb-99(1687) vrf vrf-2(1686) IP 185.47.41.88
139 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_DELNEIGH(29), len=72, seq=0, pid=0
140 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_DELNEIGH(29), len=84, seq=0, pid=0
141 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [HM5M4-AQPPX] Rx RTM_DELNEIGH AF_BRIDGE IF 1685 VLAN 99 st 0x40 fl 0x40 MAC fa:16:3e:7e:e3:5a nhg 0 vni 0
142 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_DELNEIGH(29), len=68, seq=0, pid=0
143 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [HM5M4-AQPPX] Rx RTM_DELNEIGH AF_BRIDGE IF 1685 st 0x42 fl 0x42 MAC fa:16:3e:7e:e3:5a dst 10.0.0.2 nhg 0 vni 0
144 Wed 2026-04-08 11:31:16 CEST differ.service[1561698]: --- /dev/fd/63 2026-04-08 11:31:16.887079390 +0200
145 Wed 2026-04-08 11:31:16 CEST differ.service[1561698]: +++ /dev/fd/62 2026-04-08 11:31:16.888079390 +0200
146 Wed 2026-04-08 11:31:16 CEST differ.service[1561698]: @@ -1 +1 @@
147 Wed 2026-04-08 11:31:16 CEST differ.service[1561698]: -185.47.41.88 dev irb-99 lladdr fa:16:3e:7e:e3:5a extern_learn NOARP proto zebra
148 Wed 2026-04-08 11:31:16 CEST differ.service[1561698]: +
149 Wed 2026-04-08 11:31:16 CEST differ.service[1561704]: --- /dev/fd/63 2026-04-08 11:31:16.904079393 +0200
150 Wed 2026-04-08 11:31:16 CEST differ.service[1561704]: +++ /dev/fd/62 2026-04-08 11:31:16.904079393 +0200
151 Wed 2026-04-08 11:31:16 CEST differ.service[1561704]: @@ -1,2 +1 @@
152 Wed 2026-04-08 11:31:16 CEST differ.service[1561704]: -fa:16:3e:7e:e3:5a dev l2vni-10099 vlan 99 sticky master br-evpn static
153 Wed 2026-04-08 11:31:16 CEST differ.service[1561704]: -fa:16:3e:7e:e3:5a dev l2vni-10099 dst 10.0.0.2 self sticky static
154 Wed 2026-04-08 11:31:16 CEST differ.service[1561704]: +
155 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_DELROUTE 185.47.41.88/32 vrf 1686(100002)
156 Wed 2026-04-08 11:31:16 CEST frr.service[3431682]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=52, msg cnt=1
Line 163-189: FDB/neigh entries are learned dynamically by the kernel, MACIP advertised
(Lines 157-162 are unrelated BGP updates for an L2VNI which is not present on this hypervisor, so I cut those out.)
The VM in question is chatty, so it was learned by the kernel using regular L2 flooding-and-learning mechanisms. FRR picks that up and advertises the MACIP routes (as you can see, the leaf switches loop those advertisements back, and they are ignored due to the AS-path loop prevention mechanism in BGP). After this, connectivity is restored.
163 Wed 2026-04-08 11:31:17 CEST ip-monitor-neigh.service[316985]: dev veth-to-ovs lladdr fa:16:3e:7e:e3:5a REACHABLE
164 Wed 2026-04-08 11:31:17 CEST ip-monitor-neigh.service[316985]: 185.47.41.88 dev irb-99 lladdr fa:16:3e:7e:e3:5a STALE
165 Wed 2026-04-08 11:31:17 CEST bridge-monitor-fdb.service[3486074]: fa:16:3e:7e:e3:5a dev veth-to-ovs vlan 99 master br-evpn
166 Wed 2026-04-08 11:31:17 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWNEIGH(28), len=84, seq=0, pid=0
167 Wed 2026-04-08 11:31:17 CEST frr.service[3431682]: [HM5M4-AQPPX] Rx RTM_NEWNEIGH AF_BRIDGE IF 14 VLAN 99 st 0x2 fl 0x0 MAC fa:16:3e:7e:e3:5a nhg 0 vni 0
168 Wed 2026-04-08 11:31:17 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWNEIGH(28), len=76, seq=0, pid=0
169 Wed 2026-04-08 11:31:17 CEST frr.service[3431682]: [Y6ZNP-9NYSB] Rx RTM_NEWNEIGH family ipv4 IF 1687 NSID 0 IP 185.47.41.88 MAC fa:16:3e:7e:e3:5a state 0x4 flags 0x0 ext_flags 0x0, proto 0
170 Wed 2026-04-08 11:31:17 CEST differ.service[1561835]: --- /dev/fd/63 2026-04-08 11:31:17.396079485 +0200
171 Wed 2026-04-08 11:31:17 CEST differ.service[1561835]: +++ /dev/fd/62 2026-04-08 11:31:17.396079485 +0200
172 Wed 2026-04-08 11:31:17 CEST differ.service[1561835]: @@ -1 +1 @@
173 Wed 2026-04-08 11:31:17 CEST differ.service[1561835]: -
174 Wed 2026-04-08 11:31:17 CEST differ.service[1561835]: +fa:16:3e:7e:e3:5a dev veth-to-ovs vlan 99 master br-evpn
175 Wed 2026-04-08 11:31:17 CEST differ.service[1561841]: --- /dev/fd/63 2026-04-08 11:31:17.411079488 +0200
176 Wed 2026-04-08 11:31:17 CEST differ.service[1561841]: +++ /dev/fd/62 2026-04-08 11:31:17.411079488 +0200
177 Wed 2026-04-08 11:31:17 CEST differ.service[1561841]: @@ -1 +1 @@
178 Wed 2026-04-08 11:31:17 CEST differ.service[1561841]: -
179 Wed 2026-04-08 11:31:17 CEST differ.service[1561841]: +185.47.41.88 dev irb-99 lladdr fa:16:3e:7e:e3:5a STALE
180 Wed 2026-04-08 11:31:17 CEST frr.service[3431687]: [XXWBM-V772F] eth5(l5-g4-osl2) rcvd UPDATE w/ attr: nexthop 10.0.0.1, extcommunity RT:12833:10099, path 65245 65001
181 Wed 2026-04-08 11:31:17 CEST frr.service[3431687]: [RZMGQ-A03CG] eth5(l5-g4-osl2) rcvd UPDATE about RD 10.0.0.1:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a] label 10099 l2vpn evpn -- DENIED due to: as-path contains our own AS;
182 Wed 2026-04-08 11:31:17 CEST frr.service[3431687]: [NY4CB-3ZH8H] bgp_attr_ext_communities: router mac ea:a9:4d:11:1f:e2 is self mac
183 Wed 2026-04-08 11:31:17 CEST frr.service[3431687]: [XXWBM-V772F] eth5(l5-g4-osl2) rcvd UPDATE w/ attr: nexthop 10.0.0.1, extcommunity RT:12833:2 RT:12833:10099 Rmac:ea:a9:4d:11:1f:e2, path 65245 65001
184 Wed 2026-04-08 11:31:17 CEST frr.service[3431687]: [RZMGQ-A03CG] eth5(l5-g4-osl2) rcvd UPDATE about RD 10.0.0.1:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a]:[32]:[185.47.41.88] label 10099/2 l2vpn evpn -- DENIED due to: as-path contains our own AS;
185 Wed 2026-04-08 11:31:17 CEST frr.service[3431687]: [XXWBM-V772F] eth4(l4-g4-osl2) rcvd UPDATE w/ attr: nexthop 10.0.0.1, extcommunity RT:12833:10099 ET:8, path 65244 65001
186 Wed 2026-04-08 11:31:17 CEST frr.service[3431687]: [RZMGQ-A03CG] eth4(l4-g4-osl2) rcvd UPDATE about RD 10.0.0.1:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a] label 10099 l2vpn evpn -- DENIED due to: as-path contains our own AS;
187 Wed 2026-04-08 11:31:17 CEST frr.service[3431687]: [NY4CB-3ZH8H] bgp_attr_ext_communities: router mac ea:a9:4d:11:1f:e2 is self mac
188 Wed 2026-04-08 11:31:17 CEST frr.service[3431687]: [XXWBM-V772F] eth4(l4-g4-osl2) rcvd UPDATE w/ attr: nexthop 10.0.0.1, extcommunity RT:12833:2 RT:12833:10099 ET:8 Rmac:ea:a9:4d:11:1f:e2, path 65244 65001
189 Wed 2026-04-08 11:31:17 CEST frr.service[3431687]: [RZMGQ-A03CG] eth4(l4-g4-osl2) rcvd UPDATE about RD 10.0.0.1:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a]:[32]:[185.47.41.88] label 10099/2 l2vpn evpn -- DENIED due to: as-path contains our own AS;
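For reading the zebra debug lines above, a minimal decoder sketch may help. It assumes that the `st`/`state` and `fl`/`flags` values FRR prints are the kernel's raw `ndm_state` (NUD_*) and `ndm_flags` (NTF_*) bitmasks; the bit values are taken from the Linux UAPI header `linux/neighbour.h`, and the helper names `decode_bits` and `describe` are made up for illustration.

```python
# Bit values from the Linux UAPI header <linux/neighbour.h>.
NUD_STATES = {
    0x01: "INCOMPLETE",
    0x02: "REACHABLE",
    0x04: "STALE",
    0x08: "DELAY",
    0x10: "PROBE",
    0x20: "FAILED",
    0x40: "NOARP",
    0x80: "PERMANENT",
}

NTF_FLAGS = {
    0x01: "USE",
    0x02: "SELF",
    0x04: "MASTER",
    0x08: "PROXY",
    0x10: "EXT_LEARNED",
    0x20: "OFFLOADED",
    0x40: "STICKY",
    0x80: "ROUTER",
}


def decode_bits(value, names):
    """Return the names of all bits set in value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & bit]


def describe(state, flags):
    """Render a state/flags pair in a compact human-readable form."""
    state_str = "|".join(decode_bits(state, NUD_STATES)) or "NONE"
    flags_str = "|".join(decode_bits(flags, NTF_FLAGS)) or "-"
    return f"state={state_str} flags={flags_str}"


if __name__ == "__main__":
    print(describe(0x02, 0x00))  # log line 167: dynamically learned FDB entry
    print(describe(0x04, 0x00))  # log line 169: STALE neigh entry
    print(describe(0x40, 0x40))  # log line 195: static sticky FDB entry
    print(describe(0x80, 0x00))  # log line 198: PERMANENT neigh entry
```

Under that assumption, `st 0x2 fl 0x0` on line 167 is an ordinary learned entry, while `st 0x40 fl 0x40` on line 195 is the static sticky entry that the agent re-adds. This agrees with what `bridge monitor fdb` and `ip monitor neigh` report on lines 191, 192 and 196.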
Line 190-212: The orchestration agent realises the static/sticky entries are AWOL and fixes that
Since the FDB/neigh entries for the VM are now dynamic rather than static/sticky as intended, the agent re-adds them in their correct form. This causes FRR to advertise the MACIP routes again, now with the «sticky MAC» community. At this point things have settled completely.
190 Wed 2026-04-08 11:31:18 CEST evpn_agent.service[298104]: [bridgemanager.py:80 → ensure_fdb()] Adding static sticky FDB entry for fa:16:3e:7e:e3:5a on VLAN 99
191 Wed 2026-04-08 11:31:18 CEST ip-monitor-neigh.service[316985]: dev veth-to-ovs lladdr fa:16:3e:7e:e3:5a NOARP
192 Wed 2026-04-08 11:31:18 CEST bridge-monitor-fdb.service[3486074]: fa:16:3e:7e:e3:5a dev veth-to-ovs vlan 99 sticky master br-evpn static
193 Wed 2026-04-08 11:31:18 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWNEIGH(28), len=84, seq=0, pid=0
194 Wed 2026-04-08 11:31:18 CEST evpn_agent.service[298104]: [neighmanager.py:72 → ensure_neigh()] Adding static neigh entry 185.47.41.88→fa:16:3e:7e:e3:5a on irb-99
195 Wed 2026-04-08 11:31:18 CEST frr.service[3431682]: [HM5M4-AQPPX] Rx RTM_NEWNEIGH AF_BRIDGE IF 14 VLAN 99 st 0x40 fl 0x40 MAC fa:16:3e:7e:e3:5a nhg 0 vni 0
196 Wed 2026-04-08 11:31:18 CEST ip-monitor-neigh.service[316985]: 185.47.41.88 dev irb-99 lladdr fa:16:3e:7e:e3:5a PERMANENT proto 255
197 Wed 2026-04-08 11:31:18 CEST frr.service[3431682]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWNEIGH(28), len=84, seq=0, pid=1562086
198 Wed 2026-04-08 11:31:18 CEST frr.service[3431682]: [Y6ZNP-9NYSB] Rx RTM_NEWNEIGH family ipv4 IF 1687 NSID 0 IP 185.47.41.88 MAC fa:16:3e:7e:e3:5a state 0x80 flags 0x0 ext_flags 0x0, proto 255
199 Wed 2026-04-08 11:31:18 CEST differ.service[1562094]: --- /dev/fd/63 2026-04-08 11:31:18.446079680 +0200
200 Wed 2026-04-08 11:31:18 CEST differ.service[1562094]: +++ /dev/fd/62 2026-04-08 11:31:18.446079680 +0200
201 Wed 2026-04-08 11:31:18 CEST differ.service[1562094]: @@ -1 +1 @@
202 Wed 2026-04-08 11:31:18 CEST differ.service[1562094]: -fa:16:3e:7e:e3:5a dev veth-to-ovs vlan 99 master br-evpn
203 Wed 2026-04-08 11:31:18 CEST differ.service[1562094]: +fa:16:3e:7e:e3:5a dev veth-to-ovs vlan 99 sticky master br-evpn static
204 Wed 2026-04-08 11:31:18 CEST differ.service[1562100]: --- /dev/fd/63 2026-04-08 11:31:18.461079683 +0200
205 Wed 2026-04-08 11:31:18 CEST differ.service[1562100]: +++ /dev/fd/62 2026-04-08 11:31:18.461079683 +0200
206 Wed 2026-04-08 11:31:18 CEST differ.service[1562100]: @@ -1 +1 @@
207 Wed 2026-04-08 11:31:18 CEST differ.service[1562100]: -185.47.41.88 dev irb-99 lladdr fa:16:3e:7e:e3:5a STALE
208 Wed 2026-04-08 11:31:18 CEST differ.service[1562100]: +185.47.41.88 dev irb-99 lladdr fa:16:3e:7e:e3:5a PERMANENT proto 255
209 Wed 2026-04-08 11:31:18 CEST frr.service[3431687]: [XXWBM-V772F] eth5(l5-g4-osl2) rcvd UPDATE w/ attr: nexthop 10.0.0.1, extcommunity RT:12833:10099 MM:0, sticky MAC, path 65245 65001
210 Wed 2026-04-08 11:31:18 CEST frr.service[3431687]: [RZMGQ-A03CG] eth5(l5-g4-osl2) rcvd UPDATE about RD 10.0.0.1:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a] label 10099 l2vpn evpn -- DENIED due to: as-path contains our own AS;
211 Wed 2026-04-08 11:31:18 CEST frr.service[3431687]: [XXWBM-V772F] eth4(l4-g4-osl2) rcvd UPDATE w/ attr: nexthop 10.0.0.1, extcommunity RT:12833:10099 ET:8 MM:0, sticky MAC, path 65244 65001
212 Wed 2026-04-08 11:31:18 CEST frr.service[3431687]: [RZMGQ-A03CG] eth4(l4-g4-osl2) rcvd UPDATE about RD 10.0.0.1:2 [2]:[0]:[48]:[fa:16:3e:7e:e3:5a] label 10099 l2vpn evpn -- DENIED due to: as-path contains our own AS;
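To make the end state on lines 192 and 196 concrete, here is a rough sketch of iproute2 commands that should produce equivalent entries. This is not the agent's actual code: the function names are hypothetical, and the real `ensure_fdb()`/`ensure_neigh()` may well talk netlink directly instead of shelling out. The sketch only builds the argument lists, so it can be run without root.

```python
import shlex


def fdb_replace_cmd(mac, dev, vlan):
    """Build argv for a static, sticky FDB entry on a bridge port (hypothetical helper)."""
    return [
        "bridge", "fdb", "replace", mac,
        "dev", dev,
        "vlan", str(vlan),
        "master", "static", "sticky",
    ]


def neigh_replace_cmd(ip, mac, dev, proto=255):
    """Build argv for a permanent neighbour entry tagged with a protocol id (hypothetical helper)."""
    return [
        "ip", "neigh", "replace", ip,
        "lladdr", mac,
        "dev", dev,
        "nud", "permanent",
        "protocol", str(proto),
    ]


if __name__ == "__main__":
    # Values taken from the log above; pass the lists to subprocess.run() to apply them.
    print(shlex.join(fdb_replace_cmd("fa:16:3e:7e:e3:5a", "veth-to-ovs", 99)))
    print(shlex.join(neigh_replace_cmd("185.47.41.88", "fa:16:3e:7e:e3:5a", "irb-99")))
```

Using `replace` rather than `add` means the same command works whether the entry is missing or, as here, currently present in dynamically learned form.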
Additional context
Looking at the above log, the correct FDB/neigh entries for the VM were added (and seen by FRR) at 11:31:14. However, it was not until three seconds later, at 11:31:17, that the MACIP routes for the VM were advertised.
This means that the outage caused by the live migration (which is meant to be nearly hitless) was significantly longer than it needed to be. It would have been shorter had FRR advertised the MACIP routes already at 11:31:14.
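The gap can also be computed directly from the journal timestamps. The following is a small sketch, assuming the `Wed 2026-04-08 11:31:17 CEST ...` prefix format used in the log above; `journal_time` and `gap_seconds` are illustrative names only.

```python
import re
from datetime import datetime

# Matches the date and time part of a journal prefix, e.g. "2026-04-08 11:31:17".
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def journal_time(line):
    """Extract the (timezone-naive) timestamp from a journal line."""
    match = TS_RE.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")


def gap_seconds(first_line, second_line):
    """Seconds elapsed between two journal lines from the same host."""
    return (journal_time(second_line) - journal_time(first_line)).total_seconds()


if __name__ == "__main__":
    added = "Wed 2026-04-08 11:31:14 CEST"       # entries first added by the agent
    advertised = "Wed 2026-04-08 11:31:17 CEST"  # MACIP routes finally advertised
    print(gap_seconds(added, advertised))
```

Because the journal output here has one-second resolution, the result is only accurate to within about a second either way.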
Checklist