Skip to content

[Mellanox][SmartSwitch] Convert sonic-bfb-installer.sh to python#26871

Open
judsonwilson-nvidia wants to merge 1 commit intosonic-net:masterfrom
judsonwilson-nvidia:judsonw-python-bfb-installer--community
Open

[Mellanox][SmartSwitch] Convert sonic-bfb-installer.sh to python#26871
judsonwilson-nvidia wants to merge 1 commit intosonic-net:masterfrom
judsonwilson-nvidia:judsonw-python-bfb-installer--community

Conversation

@judsonwilson-nvidia
Copy link
Copy Markdown
Contributor

Depends on the following PRs which must be merged and submodules synced:
(Mellanox builds will fail in python unit tests without these changes.)


Convert the Mellanox platform's sonic-bfb-installer.sh script (used in SmartSwitch) to a python implementation, which is now run with the sonic-bfb-installer command (no ".sh" suffix).

Why I did it

This change was made for better integration with the platform python libraries, and other advantages that python brings.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Converted all of the functionality to python, with the help of AI (cursor). Functionality should be extremely similar, with some slight but small improvements (e.g. better error messages).

How to verify it

Manually test the functionality.

Which release branch to backport (provide reason below if selected)

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

master

Description for the changelog

[Mellanox] On SmartSwitch, the sonic-bfb-installer.sh command is deprecated and replaced with the new sonic-bfb-installer command (no .sh suffix).

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Convert the Mellanox platform's sonic-bfb-installer.sh script (used in
SmartSwitch) to a python implementation, which is now run with the
sonic-bfb-installer command (no ".sh" suffix).

#### Why I did it

This change was made for better integration with the platform python
libraries, and other advantages that python brings.

##### Work item tracking

- Microsoft ADO **(number only)**:

#### How I did it

Converted all of the functionality to python, with the help of AI
(cursor). Functionality should be extremely similar, with some slight
but small improvements (e.g. better error messages). 

#### How to verify it

Manually test the functionality.

#### Which release branch to backport (provide reason below if selected)

- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [ ] 202511

#### Tested branch (Please provide the tested image version)

master

#### Description for the changelog

[Mellanox] On SmartSwitch, the `sonic-bfb-installer.sh` command is
deprecated and replaced with the new `sonic-bfb-installer` command (no
`.sh` suffix).

#### Link to config_db schema for YANG module changes



Signed-off-by: Judson Wilson <judsonw@nvidia.com>
Copilot AI review requested due to automatic review settings April 17, 2026 09:00
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates the Mellanox SmartSwitch sonic-bfb-installer implementation from a large shell script to a Python-based CLI shipped as part of mellanox-platform-utils, while keeping the legacy sonic-bfb-installer.sh as a thin wrapper for backwards compatibility.

Changes:

  • Replace platform/mellanox/sonic-bfb-installer.sh logic with an exec wrapper that calls the new Python sonic-bfb-installer console script.
  • Add a new mellanox_bfb_installer Python package implementing installer functionality (device selection, PCI detection, rshim control, BFB file handling, install execution, DPU reset).
  • Add unit tests and coverage configuration for the new package; update build dependencies to include sonic-utilities and add a dpuctlplat logger hook.

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
platform/mellanox/sonic-bfb-installer.sh Deprecation wrapper that forwards to the new Python CLI.
platform/mellanox/platform-utils/platform-utils.mk Adds SONIC_UTILITIES_PY3 dependency for new installer imports.
platform/mellanox/platform-utils/setup.py Conditionally packages/installs mellanox_bfb_installer and registers sonic-bfb-installer entrypoint (non-bluefield).
platform/mellanox/platform-utils/pytest.ini Adds pythonpath/import-mode and coverage for mellanox_bfb_installer.
platform/mellanox/platform-utils/mellanox_bfb_installer/init.py New package marker.
platform/mellanox/platform-utils/mellanox_bfb_installer/main.py New Click CLI entrypoint, logging/locking, config augmentation, orchestration.
platform/mellanox/platform-utils/mellanox_bfb_installer/device_selection.py Parses user selection (--dpu / deprecated --rshim) and builds per-target install plan.
platform/mellanox/platform-utils/mellanox_bfb_installer/platform_dpu.py Platform validation, DPU↔rshim mapping, PCI detection, CX7 removal.
platform/mellanox/platform-utils/mellanox_bfb_installer/rshim_daemon.py Starts/stops rshim daemon and waits for /dev/<rshim>/boot.
platform/mellanox/platform-utils/mellanox_bfb_installer/bfb_file.py URL download, tar extraction, and SHA256 validation for BFB images.
platform/mellanox/platform-utils/mellanox_bfb_installer/bfb_install_core.py Core install flow for a single target: rshim daemon + image delivery + reset.
platform/mellanox/platform-utils/mellanox_bfb_installer/reset_dpu.py DPU reset logic integrating DpuCtlPlat and optional ModuleHelper.
platform/mellanox/platform-utils/mellanox_bfb_installer/install_executor.py Parallel execution helper with signal-based child PID cleanup.
platform/mellanox/platform-utils/mellanox_bfb_installer/pyproject.toml Black config for the new package.
platform/mellanox/platform-utils/tests/pyproject.toml Black config for tests.
platform/mellanox/platform-utils/tests/mellanox_bfb_installer/conftest.py Skips bfb-installer tests on nvidia-bluefield configured platform.
platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_rshim_daemon.py Unit tests for rshim daemon helpers.
platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_reset_dpu.py Unit tests for DPU reset/transition-wait logic.
platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_platform_dpu.py Unit tests for platform validation + PCI detection + CX7 removal.
platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_main.py Unit tests for logging/lock/usage/config augmentation behaviors.
platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_install_executor.py Unit tests for parallel executor + signal handler behavior.
platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_device_selection.py Unit tests for selection parsing and config validation.
platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_bfb_install_core.py Unit tests for single-device install core flow and command building.
platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_bfb_file.py Unit tests for download/extract/sha validation behavior.
platform/mellanox/mlnx-platform-api/sonic_platform/dpuctlplat.py Adds use_logger() hook to replace SysLogger with standard logging.Logger.

Comment on lines +149 to +150
"""For each newly encountered config file, create a temp copy with original contents plus 'line 1' and 'line 2'; update targets to use the new paths."""
processed_configs: Dict[str, str] = {}
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring mentions adding literal "line 1"/"line 2", but the function actually appends the dynamically generated config lines (e.g., NPU_TIME). Also, processed_configs is typed as Dict[str, str] but config_path can be None, so the key type should reflect Optional[str] (or the code should normalize None to a string key).

Suggested change
"""For each newly encountered config file, create a temp copy with original contents plus 'line 1' and 'line 2'; update targets to use the new paths."""
processed_configs: Dict[str, str] = {}
"""For each newly encountered config file, create a temp copy with original contents plus the generated additional config lines; update targets to use the new paths."""
processed_configs: Dict[Optional[str], str] = {}

Copilot uses AI. Check for mistakes.


def _user_dpu_selection_to_dpus_from_platform_json(
dpus: str, rshims: str, script_name: str, print_usage_callback: Callable[[], None]
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type hints for _user_dpu_selection_to_dpus_from_platform_json declare dpus/rshims as str, but the function explicitly handles None and is called with Optional[str]. Updating the signature to accept Optional[str] avoids misleading API contracts and improves readability.

Suggested change
dpus: str, rshims: str, script_name: str, print_usage_callback: Callable[[], None]
dpus: Optional[str], rshims: Optional[str], script_name: str, print_usage_callback: Callable[[], None]

Copilot uses AI. Check for mistakes.

def _parse_config_paths(
configs: Optional[str], num_dpus: int, user_selected_all_dpus: bool
) -> List[str]:
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_parse_config_paths is annotated to return List[str], but it returns [None] * num_dpus when configs is None. This should be List[Optional[str]] (and the callers/TargetInfo construction should match) to keep types accurate.

Suggested change
) -> List[str]:
) -> List[Optional[str]]:

Copilot uses AI. Check for mistakes.
Comment on lines +136 to +138
lspci_out = _run_lspci_d_n()
detected_bus_ids: Dict[str, Dict[DpuInterfaceEnum, str]] = {}
for line in lspci_out.splitlines():
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

detected_bus_ids is declared as Dict[str, Dict[DpuInterfaceEnum, str]], but the keys written are strings (DpuInterfaceEnum.*.value). This mismatch can confuse future readers and type checkers; align the annotation with the actual key type.

Copilot uses AI. Check for mistakes.
import unittest
from unittest import mock

sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test prepends .../tests to sys.path (via ..), which is not the package root (platform-utils). That can lead to confusing import resolution (e.g., implicit namespace package mellanox_bfb_installer coming from the tests directory). Consider removing this sys.path tweak and relying on pytest.ini's pythonpath = ., or adjust it to point to the actual package root (e.g., ../..).

Suggested change
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../..")))

Copilot uses AI. Check for mistakes.

import pytest

sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test prepends .../tests to sys.path (via ..), which is not the package root (platform-utils). That can lead to confusing import resolution (e.g., implicit namespace package mellanox_bfb_installer coming from the tests directory). Consider removing this sys.path tweak and relying on pytest.ini's pythonpath = ., or adjust it to point to the actual package root (e.g., ../..).

Suggested change
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")))

Copilot uses AI. Check for mistakes.
import unittest
from unittest import mock

sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test prepends .../tests to sys.path (via ..), which is not the package root (platform-utils). That can lead to confusing import resolution (e.g., implicit namespace package mellanox_bfb_installer coming from the tests directory). Consider removing this sys.path tweak and relying on pytest.ini's pythonpath = ., or adjust it to point to the actual package root (e.g., ../..).

Suggested change
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
PACKAGE_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "../.."))
if PACKAGE_ROOT not in sys.path:
sys.path.insert(0, PACKAGE_ROOT)

Copilot uses AI. Check for mistakes.
import unittest
from unittest import mock

sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test prepends .../tests to sys.path (via ..), which is not the package root (platform-utils). That can lead to confusing import resolution (e.g., implicit namespace package mellanox_bfb_installer coming from the tests directory). Consider removing this sys.path tweak and relying on pytest.ini's pythonpath = ., or adjust it to point to the actual package root (e.g., ../..).

Suggested change
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "../.."))

Copilot uses AI. Check for mistakes.
Comment on lines +55 to +56
except FileNotFoundError:
logger.error("the curl command failed with: 1")
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On FileNotFoundError (curl not installed), the code logs a hard-coded "failed with: 1" which looks like an exit code rather than the actual error. Logging that curl is missing (and/or including the exception details) would produce a more accurate and actionable error message.

Suggested change
except FileNotFoundError:
logger.error("the curl command failed with: 1")
except FileNotFoundError as err:
logger.error("curl command not found: %s", err)

Copilot uses AI. Check for mistakes.
Comment on lines +105 to +118
def wait_for_rshim_boot(rshim: str) -> bool:
"""Poll /dev/{rshim}/boot for up to 10 seconds.

Returns True if boot file appeared, False otherwise.
"""
boot_path = f"/dev/{rshim}/boot"
timeout = BOOT_WAIT_TIMEOUT_SEC
while timeout > 0:
if os.path.exists(boot_path):
return True
time.sleep(1)
timeout -= 1
logger.error("%s: Error: Boot file did not appear after 10 seconds", rshim)
return False
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring says "up to 10 seconds" and the error message hard-codes 10 seconds, but the timeout is controlled by BOOT_WAIT_TIMEOUT_SEC. Using the constant in the docstring/log message keeps them correct if the timeout is ever changed.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants