[Mellanox][SmartSwitch] Convert sonic-bfb-installer.sh to python#26871
[Mellanox][SmartSwitch] Convert sonic-bfb-installer.sh to python#26871judsonwilson-nvidia wants to merge 1 commit intosonic-net:masterfrom
Conversation
Convert the Mellanox platform's sonic-bfb-installer.sh script (used in SmartSwitch) to a python implementation, which is now run with the sonic-bfb-installer command (no ".sh" suffix). #### Why I did it This change was made for better integration with the platform python libraries, and other advantages that python brings. ##### Work item tracking - Microsoft ADO **(number only)**: #### How I did it Converted all of the functionality to python, with the help of AI (cursor). Functionality should be extremely similar, with some slight but small improvements (e.g. better error messages). #### How to verify it Manually test the functionality. #### Which release branch to backport (provide reason below if selected) - [ ] 202305 - [ ] 202311 - [ ] 202405 - [ ] 202411 - [ ] 202505 - [ ] 202511 #### Tested branch (Please provide the tested image version) master #### Description for the changelog [Mellanox] On SmartSwitch, the `sonic-bfb-installer.sh` command is deprecated and replaced with the new `sonic-bfb-installer` command (no `.sh` suffix). #### Link to config_db schema for YANG module changes Signed-off-by: Judson Wilson <judsonw@nvidia.com>
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
There was a problem hiding this comment.
Pull request overview
This PR migrates the Mellanox SmartSwitch sonic-bfb-installer implementation from a large shell script to a Python-based CLI shipped as part of mellanox-platform-utils, while keeping the legacy sonic-bfb-installer.sh as a thin wrapper for backwards compatibility.
Changes:
- Replace
platform/mellanox/sonic-bfb-installer.shlogic with an exec wrapper that calls the new Pythonsonic-bfb-installerconsole script. - Add a new
mellanox_bfb_installerPython package implementing installer functionality (device selection, PCI detection, rshim control, BFB file handling, install execution, DPU reset). - Add unit tests and coverage configuration for the new package; update build dependencies to include
sonic-utilitiesand add adpuctlplatlogger hook.
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
| platform/mellanox/sonic-bfb-installer.sh | Deprecation wrapper that forwards to the new Python CLI. |
| platform/mellanox/platform-utils/platform-utils.mk | Adds SONIC_UTILITIES_PY3 dependency for new installer imports. |
| platform/mellanox/platform-utils/setup.py | Conditionally packages/installs mellanox_bfb_installer and registers sonic-bfb-installer entrypoint (non-bluefield). |
| platform/mellanox/platform-utils/pytest.ini | Adds pythonpath/import-mode and coverage for mellanox_bfb_installer. |
| platform/mellanox/platform-utils/mellanox_bfb_installer/init.py | New package marker. |
| platform/mellanox/platform-utils/mellanox_bfb_installer/main.py | New Click CLI entrypoint, logging/locking, config augmentation, orchestration. |
| platform/mellanox/platform-utils/mellanox_bfb_installer/device_selection.py | Parses user selection (--dpu / deprecated --rshim) and builds per-target install plan. |
| platform/mellanox/platform-utils/mellanox_bfb_installer/platform_dpu.py | Platform validation, DPU↔rshim mapping, PCI detection, CX7 removal. |
| platform/mellanox/platform-utils/mellanox_bfb_installer/rshim_daemon.py | Starts/stops rshim daemon and waits for /dev/<rshim>/boot. |
| platform/mellanox/platform-utils/mellanox_bfb_installer/bfb_file.py | URL download, tar extraction, and SHA256 validation for BFB images. |
| platform/mellanox/platform-utils/mellanox_bfb_installer/bfb_install_core.py | Core install flow for a single target: rshim daemon + image delivery + reset. |
| platform/mellanox/platform-utils/mellanox_bfb_installer/reset_dpu.py | DPU reset logic integrating DpuCtlPlat and optional ModuleHelper. |
| platform/mellanox/platform-utils/mellanox_bfb_installer/install_executor.py | Parallel execution helper with signal-based child PID cleanup. |
| platform/mellanox/platform-utils/mellanox_bfb_installer/pyproject.toml | Black config for the new package. |
| platform/mellanox/platform-utils/tests/pyproject.toml | Black config for tests. |
| platform/mellanox/platform-utils/tests/mellanox_bfb_installer/conftest.py | Skips bfb-installer tests on nvidia-bluefield configured platform. |
| platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_rshim_daemon.py | Unit tests for rshim daemon helpers. |
| platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_reset_dpu.py | Unit tests for DPU reset/transition-wait logic. |
| platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_platform_dpu.py | Unit tests for platform validation + PCI detection + CX7 removal. |
| platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_main.py | Unit tests for logging/lock/usage/config augmentation behaviors. |
| platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_install_executor.py | Unit tests for parallel executor + signal handler behavior. |
| platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_device_selection.py | Unit tests for selection parsing and config validation. |
| platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_bfb_install_core.py | Unit tests for single-device install core flow and command building. |
| platform/mellanox/platform-utils/tests/mellanox_bfb_installer/test_bfb_file.py | Unit tests for download/extract/sha validation behavior. |
| platform/mellanox/mlnx-platform-api/sonic_platform/dpuctlplat.py | Adds use_logger() hook to replace SysLogger with standard logging.Logger. |
| """For each newly encountered config file, create a temp copy with original contents plus 'line 1' and 'line 2'; update targets to use the new paths.""" | ||
| processed_configs: Dict[str, str] = {} |
There was a problem hiding this comment.
The docstring mentions adding literal "line 1"/"line 2", but the function actually appends the dynamically generated config lines (e.g., NPU_TIME). Also, processed_configs is typed as Dict[str, str] but config_path can be None, so the key type should reflect Optional[str] (or the code should normalize None to a string key).
| """For each newly encountered config file, create a temp copy with original contents plus 'line 1' and 'line 2'; update targets to use the new paths.""" | |
| processed_configs: Dict[str, str] = {} | |
| """For each newly encountered config file, create a temp copy with original contents plus the generated additional config lines; update targets to use the new paths.""" | |
| processed_configs: Dict[Optional[str], str] = {} |
|
|
||
|
|
||
| def _user_dpu_selection_to_dpus_from_platform_json( | ||
| dpus: str, rshims: str, script_name: str, print_usage_callback: Callable[[], None] |
There was a problem hiding this comment.
Type hints for _user_dpu_selection_to_dpus_from_platform_json declare dpus/rshims as str, but the function explicitly handles None and is called with Optional[str]. Updating the signature to accept Optional[str] avoids misleading API contracts and improves readability.
| dpus: str, rshims: str, script_name: str, print_usage_callback: Callable[[], None] | |
| dpus: Optional[str], rshims: Optional[str], script_name: str, print_usage_callback: Callable[[], None] |
|
|
||
| def _parse_config_paths( | ||
| configs: Optional[str], num_dpus: int, user_selected_all_dpus: bool | ||
| ) -> List[str]: |
There was a problem hiding this comment.
_parse_config_paths is annotated to return List[str], but it returns [None] * num_dpus when configs is None. This should be List[Optional[str]] (and the callers/TargetInfo construction should match) to keep types accurate.
| ) -> List[str]: | |
| ) -> List[Optional[str]]: |
| lspci_out = _run_lspci_d_n() | ||
| detected_bus_ids: Dict[str, Dict[DpuInterfaceEnum, str]] = {} | ||
| for line in lspci_out.splitlines(): |
There was a problem hiding this comment.
detected_bus_ids is declared as Dict[str, Dict[DpuInterfaceEnum, str]], but the keys written are strings (DpuInterfaceEnum.*.value). This mismatch can confuse future readers and type checkers; align the annotation with the actual key type.
| import unittest | ||
| from unittest import mock | ||
|
|
||
| sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..")) |
There was a problem hiding this comment.
This test prepends .../tests to sys.path (via ..), which is not the package root (platform-utils). That can lead to confusing import resolution (e.g., implicit namespace package mellanox_bfb_installer coming from the tests directory). Consider removing this sys.path tweak and relying on pytest.ini's pythonpath = ., or adjust it to point to the actual package root (e.g., ../..).
| sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..")) | |
| sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../.."))) |
|
|
||
| import pytest | ||
|
|
||
| sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..")) |
There was a problem hiding this comment.
This test prepends .../tests to sys.path (via ..), which is not the package root (platform-utils). That can lead to confusing import resolution (e.g., implicit namespace package mellanox_bfb_installer coming from the tests directory). Consider removing this sys.path tweak and relying on pytest.ini's pythonpath = ., or adjust it to point to the actual package root (e.g., ../..).
| sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..")) | |
| sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))) |
| import unittest | ||
| from unittest import mock | ||
|
|
||
| sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..")) |
There was a problem hiding this comment.
This test prepends .../tests to sys.path (via ..), which is not the package root (platform-utils). That can lead to confusing import resolution (e.g., implicit namespace package mellanox_bfb_installer coming from the tests directory). Consider removing this sys.path tweak and relying on pytest.ini's pythonpath = ., or adjust it to point to the actual package root (e.g., ../..).
| sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..")) | |
| PACKAGE_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "../..")) | |
| if PACKAGE_ROOT not in sys.path: | |
| sys.path.insert(0, PACKAGE_ROOT) |
| import unittest | ||
| from unittest import mock | ||
|
|
||
| sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..")) |
There was a problem hiding this comment.
This test prepends .../tests to sys.path (via ..), which is not the package root (platform-utils). That can lead to confusing import resolution (e.g., implicit namespace package mellanox_bfb_installer coming from the tests directory). Consider removing this sys.path tweak and relying on pytest.ini's pythonpath = ., or adjust it to point to the actual package root (e.g., ../..).
| sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..")) | |
| sys.path.insert(0, os.path.join(os.path.dirname(__file__), "../..")) |
| except FileNotFoundError: | ||
| logger.error("the curl command failed with: 1") |
There was a problem hiding this comment.
On FileNotFoundError (curl not installed), the code logs a hard-coded "failed with: 1" which looks like an exit code rather than the actual error. Logging that curl is missing (and/or including the exception details) would produce a more accurate and actionable error message.
| except FileNotFoundError: | |
| logger.error("the curl command failed with: 1") | |
| except FileNotFoundError as err: | |
| logger.error("curl command not found: %s", err) |
| def wait_for_rshim_boot(rshim: str) -> bool: | ||
| """Poll /dev/{rshim}/boot for up to 10 seconds. | ||
|
|
||
| Returns True if boot file appeared, False otherwise. | ||
| """ | ||
| boot_path = f"/dev/{rshim}/boot" | ||
| timeout = BOOT_WAIT_TIMEOUT_SEC | ||
| while timeout > 0: | ||
| if os.path.exists(boot_path): | ||
| return True | ||
| time.sleep(1) | ||
| timeout -= 1 | ||
| logger.error("%s: Error: Boot file did not appear after 10 seconds", rshim) | ||
| return False |
There was a problem hiding this comment.
The docstring says "up to 10 seconds" and the error message hard-codes 10 seconds, but the timeout is controlled by BOOT_WAIT_TIMEOUT_SEC. Using the constant in the docstring/log message keeps them correct if the timeout is ever changed.
Depends on the following PRs which must be merged and submodules synced:
(Mellanox builds will fail in python unit tests without these changes.)
Convert the Mellanox platform's sonic-bfb-installer.sh script (used in SmartSwitch) to a python implementation, which is now run with the sonic-bfb-installer command (no ".sh" suffix).
Why I did it
This change was made for better integration with the platform python libraries, and other advantages that python brings.
Work item tracking
How I did it
Converted all of the functionality to python, with the help of AI (cursor). Functionality should be extremely similar, with some slight but small improvements (e.g. better error messages).
How to verify it
Manually test the functionality.
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
master
Description for the changelog
[Mellanox] On SmartSwitch, the
sonic-bfb-installer.shcommand is deprecated and replaced with the newsonic-bfb-installercommand (no.shsuffix).Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)