[lldp] Replace per-port portidsubtype loop with global ifname directive #26873

Open
ZhaohuiS wants to merge 2 commits into sonic-net:master from ZhaohuiS:fix/lldp-portidsubtype-ifname

@ZhaohuiS ZhaohuiS commented Apr 17, 2026

Description of PR

Summary:
Replace the per-port portidsubtype configuration loop from PR #26144 with a single global `configure lldp portidsubtype ifname` directive.

Fixes #26568

Background: Why portidsubtype matters

Without any portidsubtype configuration, lldpd uses the MAC address as the default Port ID. When lldpd auto-resumes after config processing, its first LLDP frame therefore carries a MAC-based Port ID. Peers then see a transient MSAP change (the MSAP is the chassis ID plus Port ID pair that identifies an LLDP neighbor) when lldpmgrd later reconfigures portidsubtype to local (alias), causing unnecessary neighbor flaps.

PR #26144 solved this by adding a Jinja2 loop that generates one `configure ports <name> lldp portidsubtype local <alias>` line per port in lldpd.conf. This worked fine on low-port-count platforms.
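To make the scaling difference concrete, here is a minimal Python sketch that builds both variants of the config as lists of lines. It is not the actual Jinja2 template from PR #26144 (which is not shown here); the port names, the `alias<N>` placeholders, and the helper names are hypothetical. Only the two directive forms are taken from this description.

```python
# Hypothetical illustration of per-port vs. global portidsubtype config.
# The real lldpd.conf.j2 template is not reproduced here.

GLOBAL_DIRECTIVE = "configure lldp portidsubtype ifname"


def per_port_lines(ports):
    """Return one directive per (name, alias) pair: O(N) config lines."""
    return [
        f"configure ports {name} lldp portidsubtype local {alias}"
        for name, alias in ports
    ]


def global_lines():
    """Return the single global directive: O(1) config lines."""
    return [GLOBAL_DIRECTIVE]


# Made-up port table: 512 ports with placeholder aliases.
ports = [(f"Ethernet{i}", f"alias{i}") for i in range(512)]

loop_config = per_port_lines(ports)
global_config = global_lines()

print(len(loop_config), "lines, first:", loop_config[0])
print(len(global_config), "line:", global_config[0])
```

The per-port variant grows linearly with the port count, while the global variant stays at one line regardless of platform size.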

The problem on high-port-count platforms

On platforms with many ports (e.g., Mellanox SN5640 with 512 ports), PR #26144 generates 512+ config lines. Processing these lines blocks lldpd for 11+ seconds at startup. During this blackout:

  • Neighbors with 30-second TTL begin timing out
  • When lldpd resumes, it sends frames with new MSAPs, causing neighbor table churn
  • The churn cascades through lldp_syncd, creating a feedback loop that never stabilizes
  • Result: LLDP neighbor count oscillates indefinitely (observed 161→166→163→171 after 4+ minutes)
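One way to express "never stabilizes" as a check is to require several consecutive samples at the expected neighbor count. The sketch below is an assumption about how such a check could look, not the tooling used for this PR; the `window` parameter and the function name are hypothetical. The sample sequence and the expected count of 259 are the values reported in this description.

```python
def is_stable(samples, expected, window=3):
    """Return True if the last `window` samples all equal `expected`.

    `window` is an illustrative threshold, not a value from the PR.
    """
    if len(samples) < window:
        return False
    return all(count == expected for count in samples[-window:])


# Neighbor counts observed with the per-port loop (from the PR description).
observed = [161, 166, 163, 171]
print(is_stable(observed, expected=259))

# Made-up sequence that reaches the expected count and stays there.
converged = [250, 259, 259, 259]
print(is_stable(converged, expected=259))
```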

The fix: global portidsubtype ifname

The single global directive `configure lldp portidsubtype ifname` tells lldpd to use the Linux interface name (e.g., Ethernet0) as the Port ID for all ports, instead of the MAC address. This achieves the same goal as the per-port loop (no MAC-as-Port-ID) with O(1) config processing time.

lldpmgrd still runs later to set the final portidsubtype to local + alias (e.g., etp1a) per port — this is the existing designed behavior and works correctly.
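For readers less familiar with the wire format, the sketch below encodes a Port ID TLV for the three cases discussed here. Subtype 5 (interface name) is the value quoted in this PR's commit message. The TLV type (2) and subtypes 3 (MAC address) and 7 (locally assigned) are taken from IEEE 802.1AB, and mapping lldpd's `local` keyword to subtype 7 is an assumption of this sketch. The MAC address is a placeholder.

```python
import struct

PORT_ID_TLV_TYPE = 2          # IEEE 802.1AB Port ID TLV
SUBTYPE_MAC_ADDRESS = 3       # default when nothing is configured
SUBTYPE_INTERFACE_NAME = 5    # "portidsubtype ifname"
SUBTYPE_LOCALLY_ASSIGNED = 7  # assumed to correspond to "portidsubtype local"


def encode_port_id_tlv(subtype, value):
    """Encode a Port ID TLV: 7-bit type, 9-bit length, subtype byte, value."""
    payload = bytes([subtype]) + value
    if len(payload) > 0x1FF:
        raise ValueError("payload does not fit the 9-bit TLV length field")
    header = (PORT_ID_TLV_TYPE << 9) | len(payload)
    return struct.pack("!H", header) + payload


mac = bytes.fromhex("001122334455")  # placeholder MAC address

default_tlv = encode_port_id_tlv(SUBTYPE_MAC_ADDRESS, mac)
ifname_tlv = encode_port_id_tlv(SUBTYPE_INTERFACE_NAME, b"Ethernet0")
alias_tlv = encode_port_id_tlv(SUBTYPE_LOCALLY_ASSIGNED, b"etp1a")

for label, tlv in [("mac", default_tlv), ("ifname", ifname_tlv), ("local", alias_tlv)]:
    print(label, tlv.hex())
```

The three encodings differ in both subtype and value, which is why a peer treats each one as a different Port ID.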

Key trade-off: The first LLDP frame after restart uses the interface name (Ethernet0) instead of the alias (etp1a). When lldpmgrd starts and reconfigures each port, there is a one-time MSAP transition per port. This is acceptable because:

  1. LLDP is an informational protocol — no data plane impact
  2. The transition is atomic and completes within seconds
  3. The alternative (batch per-port config) causes the same 512-line processing blackout
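The trade-off above can be illustrated by modeling a neighbor's identity as seen by a peer. This is a simplified sketch: it treats the MSAP as a (chassis ID, Port ID) pair and ignores subtypes, and the chassis ID, class name, and helper name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msap:
    """Simplified neighbor identity: chassis ID plus Port ID."""
    chassis_id: str
    port_id: str


def count_msap_changes(frames):
    """Count how often consecutive frames on one link change identity."""
    changes = 0
    previous = None
    for msap in frames:
        if previous is not None and msap != previous:
            changes += 1
        previous = msap
    return changes


chassis = "00:11:22:33:44:55"  # placeholder chassis ID

# Frames a peer would receive on one port with the global ifname directive:
# interface name first, then the alias once lldpmgrd has reconfigured the port.
frames = [
    Msap(chassis, "Ethernet0"),
    Msap(chassis, "Ethernet0"),
    Msap(chassis, "etp1a"),
    Msap(chassis, "etp1a"),
]

print(count_msap_changes(frames))
```

Under these assumptions each port changes identity exactly once, when the alias replaces the interface name.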

Type of change

  • Bug fix

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

On high-port-count platforms (SN5640, 512 ports), the per-port portidsubtype loop from PR #26144 blocks lldpd startup for 11+ seconds, causing LLDP neighbor churn and cascading instability that never converges.

How did you do it?

Replaced the per-port Jinja2 loop (N lines) with a single global directive (1 line):

configure lldp portidsubtype ifname

How did you verify/test it?

A/B tested on Mellanox SN5640 (str5-sn5640-6, topology t0-isolated-d256u256s2, 259 VM neighbors):

| Configuration | Config lines | Convergence | Stable at 259? | Deletes |
|---|---|---|---|---|
| PR #26144 (per-port loop) | 519 | never stable | No | 74+ ongoing |
| No PR #26144 (baseline) | 5 | 14s | Yes | 0 |
| This PR (global ifname) | 6 | ~30s | Yes | 0 |
| Batch per-port in waitfor_lldp_ready.sh | 6 + 512 batch | never stable | No | oscillating |

The global ifname approach was tested across multiple restart cycles with zero neighbor drops.

Any platform specific information?

Primarily affects high-port-count platforms (SN5640 with 512 ports). Lower-port-count platforms were not affected by the original issue but benefit from the simpler config.

Supported testbed topology if it's a new test case?

N/A (not a new test case)

Documentation

N/A

PR sonic-net#26144 added per-port portidsubtype configuration in lldpd.conf.j2 to
prevent transient MAC address as Port ID during lldpd startup. However,
on high-port-count platforms (e.g. Mellanox SN5640 with 512 ports), this
generates 512 config lines that take ~11 seconds to process, causing:
1. LLDP blackout during startup (no frames sent for 11+ seconds)
2. Neighbor table instability with continuous add/delete churn
3. Cascading force-repopulate storms in lldp_syncd

Replace the O(N) per-port loop with a single O(1) global directive:
  configure lldp portidsubtype ifname

This sets the default Port ID subtype to Interface Name (subtype 5),
so the first LLDP frame uses the Linux interface name (e.g. Ethernet0)
instead of MAC address. lldpmgrd still runs later to configure the
final portidsubtype to local+alias per port.

Verified on Mellanox SN5640 (512 ports, t0-isolated-d256u256s2 topology):
- Config lines: 519 -> 6 (net reduction of 513 lines)
- Convergence: never stable -> 259/259 in ~30 seconds, rock solid
- Neighbor deletes: 74+ -> 0
- First frame Port ID: Subtype Interface Name (5): Ethernet0 (not MAC)

Fixes sonic-net#26568

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
Copilot AI review requested due to automatic review settings April 17, 2026 10:46
@ZhaohuiS ZhaohuiS requested a review from lguohan as a code owner April 17, 2026 10:46
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI left a comment

Pull request overview

This PR optimizes LLDP daemon startup configuration in SONiC’s docker-lldp container by replacing a per-port portidsubtype configuration loop with a single global portidsubtype ifname directive to avoid startup-time scaling issues on high-port-count platforms.

Changes:

  • Removed the Jinja2 loop that emitted `configure ports <port> lldp portidsubtype local ...` for every front-panel port.
  • Added a single global `configure lldp portidsubtype ifname` directive to prevent MAC-based Port IDs in the first LLDP frames while keeping config load O(1).

Comment thread dockers/docker-lldp/lldpd.conf.j2 Outdated
Comment on lines +35 to +39
neighbor churn and cascading force-repopulate storms in lldp_syncd.
The single global directive achieves the same goal (no MAC-as-Port-ID)
with O(1) config processing time. lldpmgrd still runs later to set the
final portidsubtype to local+alias per port. #}
configure lldp portidsubtype ifname

Copilot AI Apr 17, 2026

This template change will break sonic-config-engine pytest golden-file comparisons: test_j2files.py renders dockers/docker-lldp/lldpd.conf.j2 and compares it to the sample_output lldpd-*.conf files, which currently expect only per-port configure ports ... portidsubtype local ... lines and do not include the new global configure lldp portidsubtype ifname directive. Please update the corresponding golden outputs (py2 + py3) via a sonic-config-engine submodule bump (preferred) or otherwise keep the unit tests in sync with the new rendered output.
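To show why a golden-file comparison would fail after this change, here is a minimal sketch using Python's standard difflib. It is not the actual test_j2files.py logic; the golden content, the second port and its alias, and the helper name are made up for illustration.

```python
import difflib


def golden_diff(rendered, golden):
    """Return unified-diff lines between golden and rendered text."""
    return list(
        difflib.unified_diff(
            golden.splitlines(),
            rendered.splitlines(),
            fromfile="golden",
            tofile="rendered",
            lineterm="",
        )
    )


# Made-up golden output in the old per-port form.
golden = "\n".join(
    [
        "configure ports Ethernet0 lldp portidsubtype local etp1a",
        "configure ports Ethernet4 lldp portidsubtype local etp2a",
    ]
)

# Output of the new template under the same assumptions.
rendered = "configure lldp portidsubtype ifname"

diff = golden_diff(rendered, golden)
print("\n".join(diff))
```

Any non-empty diff means the expected output files need to be regenerated to match the new template.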

@ZhaohuiS ZhaohuiS force-pushed the fix/lldp-portidsubtype-ifname branch from 99f60e3 to 295c408 Compare April 17, 2026 12:03
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI review requested due to automatic review settings April 17, 2026 12:34
@ZhaohuiS ZhaohuiS force-pushed the fix/lldp-portidsubtype-ifname branch from 295c408 to 005fdaa Compare April 17, 2026 12:34
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

Move the detailed explanation of the portidsubtype change from the
Jinja2 comment to the PR description. Keep only a one-line comment
in the code.

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
@ZhaohuiS ZhaohuiS force-pushed the fix/lldp-portidsubtype-ifname branch from 51c024a to aa3506d Compare April 17, 2026 23:58
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Development

Successfully merging this pull request may close these issues.

Bug: LLDP neighbor table flaps on high-scale systems

3 participants