feat: add get_config support to device-discovery #267

Draft
leoparente wants to merge 2 commits into develop from feat/OBS-1960-device-config

@leoparente
Contributor

This pull request introduces support for translating and ingesting device configuration data (such as startup, running, and candidate configs) from NAPALM into the Diode SDK, while maintaining compatibility with environments where the DeviceConfig protobuf message is not yet available. The changes are structured to gracefully handle the absence of this feature in the SDK and to ensure robust error handling during data collection.

Device configuration ingestion and translation:

  • Added logic in runner.py to collect device configuration data with error handling, ensuring that failures to retrieve config do not interrupt the data collection process.
  • Introduced a new DeviceConfig wrapper and related translation functions in translate.py to convert NAPALM config data into the Diode SDK DeviceConfig protobuf, including safe handling when the SDK does not yet support this message.
  • Updated the translate_device and translate_data functions to pass configuration data through the translation pipeline and attach it to the resulting device entity when supported.
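
The "safe handling when the SDK does not yet support this message" described above could be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `build_device_config` and `_to_bytes` are hypothetical names, and `pb_module` stands in for whatever protobuf module the SDK exposes.

```python
from typing import Any, Optional


def _to_bytes(value: Any) -> Optional[bytes]:
    """Encode str as UTF-8, pass bytes through, treat anything else as missing."""
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, bytes):
        return value
    return None


def build_device_config(config: Optional[dict], pb_module: Any) -> Any:
    """Return a DeviceConfig-like message, or None if unsupported or empty.

    Hypothetical helper: the real translate function may take different
    arguments and handle more fields.
    """
    # Feature detection: older SDK builds simply lack the attribute.
    message_cls = getattr(pb_module, "DeviceConfig", None)
    if message_cls is None or not config:
        return None

    startup = _to_bytes(config.get("startup"))
    running = _to_bytes(config.get("running"))

    # Skip if no actual config data is present.
    if not (startup or running):
        return None
    return message_cls(startup=startup, running=running)
```

Probing with `getattr` keeps the module importable on SDK versions that predate the message, at the cost of silently returning None there.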

Type hinting and imports:

  • Added missing imports and type hints to support new functionality and improve code clarity.

@leoparente leoparente self-assigned this Jan 30, 2026
@github-actions
github-actions bot commented Jan 30, 2026

Coverage

Coverage Report
File                      Stmts  Miss  Cover  Missing
device_discovery
   client.py                 55     9    84%  140–158
   discovery.py              63     2    97%  142–145
   interface.py             157     3    98%  140–144
   main.py                   49     2    96%  181, 187
   metrics.py                53     1    98%  114
   server.py                 88    10    89%  44–46, 72–87, 184, 187
   translate.py             128    30    77%  27–31, 41–59, 138, 140, 165, 235–263
   version.py                 7     1    86%  14
device_discovery/policy
   manager.py                61     3    95%  37–38, 161
   portscan.py               84    11    87%  34–35, 59–60, 64, 72–76, 117
   run.py                    83     2    98%  150, 182
   runner.py                189    36    81%  211–212, 217–218, 245, 426–511
TOTAL                      1122   110    90%

Tests Skipped Failures Errors Time
178 0 💤 0 ❌ 0 🔥 7.838s ⏱️

Member

@mfiedorowicz mfiedorowicz left a comment

Argus Code Review Summary

  • 🟡 Medium: 6
  • 🔵 Low: 6

🔍 Automated review by Argus on behalf of @mfiedorowicz

chunk_num = 1
size_bytes = estimate_message_size(entities_list)

if size_bytes > (3.0 * 1024 * 1024): # 3MB threshold

🟡 Medium

The 3MB threshold is hardcoded as a magic number. More importantly, estimate_message_size and create_message_chunks may use different internal thresholds for chunking, leading to inconsistency. Consider using a named constant and verifying alignment with the SDK's chunking logic.

Suggested change
- if size_bytes > (3.0 * 1024 * 1024): # 3MB threshold
+ MAX_MESSAGE_SIZE_BYTES = 3 * 1024 * 1024 # 3MB threshold
+ if size_bytes > MAX_MESSAGE_SIZE_BYTES:
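
One way to keep the size check and the chunker aligned is to pass a single named limit to both. The sketch below assumes this is possible; `estimate_size` and `make_chunks` are injected stand-ins, not the SDK's actual signatures, and `split_if_needed` is a hypothetical helper.

```python
from typing import Callable, Iterable, List, Sequence

# Single source of truth for the threshold (3 * 1024 * 1024 bytes).
MAX_MESSAGE_SIZE_BYTES = 3 * 1024 * 1024


def split_if_needed(
    entities: Sequence,
    estimate_size: Callable[[Sequence], int],
    make_chunks: Callable[[Sequence, int], Iterable[Sequence]],
    limit_bytes: int = MAX_MESSAGE_SIZE_BYTES,
) -> List[Sequence]:
    """Return [entities] when under the limit, otherwise the chunker's output.

    The same limit_bytes drives both the decision and the chunking, so the
    two cannot drift apart.
    """
    if estimate_size(entities) <= limit_bytes:
        return [entities]
    return list(make_chunks(entities, limit_bytes))
```

If the SDK's chunker only accepts its own size parameter, the same constant could be converted and passed through instead of relying on the library default.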

f"ERROR ingestion failed for {hostname} chunk {i}/{chunk_num}: "
f"{response.errors}"
)
return # Stop on first error

🟡 Medium

On chunk ingestion error, the method silently returns None. The caller has no way to know ingestion failed: some chunks may already have been ingested, but no error is raised or returned. The non-chunked path has the same weakness, since it also only logs the error. Consider raising an exception or returning a status so callers can handle failures.
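
A minimal sketch of the "raise an exception" option, assuming a `send` callable that returns a list of error strings (empty on success). `ChunkIngestionError` and `ingest_chunks` are hypothetical names, not part of this PR or the SDK.

```python
from typing import Callable, List, Sequence


class ChunkIngestionError(RuntimeError):
    """Raised when a chunk fails; records how many chunks already succeeded."""

    def __init__(self, hostname: str, chunk_index: int, total_chunks: int, errors: List[str]):
        super().__init__(
            f"ingestion failed for {hostname} chunk {chunk_index}/{total_chunks}: {errors}"
        )
        self.chunk_index = chunk_index
        self.total_chunks = total_chunks
        self.chunks_ingested = chunk_index - 1  # chunks sent before the failure
        self.errors = errors


def ingest_chunks(
    hostname: str,
    chunks: Sequence[Sequence],
    send: Callable[[Sequence], List[str]],
) -> int:
    """Send chunks in order; raise on the first failure, return the count on success."""
    total = len(chunks)
    for index, chunk in enumerate(chunks, start=1):
        errors = send(chunk)
        if errors:
            raise ChunkIngestionError(hostname, index, total, errors)
    return total
```

Carrying `chunks_ingested` on the exception lets the caller log or retry with knowledge of the partial state.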

logger.info(f"Hostname {hostname}: Successful ingestion")

# Convert to list for size estimation and chunking
entities_list = list(translated_entities)

🔵 Low

Calling list(translated_entities) materializes the entire generator into memory. If translate_data returns a very large dataset (which is likely given the need for chunking at 3MB+), this doubles memory usage — once for the list, then again for the chunks. Consider whether the SDK's create_message_chunks could accept an iterator, or whether size estimation could be done differently.
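
If the chunker could consume an iterator, chunks might be produced lazily so that only one chunk is held in memory at a time. The generator below is a sketch of that idea; `iter_chunks` and the per-item `size_of` callable are assumptions, and summing per-item sizes only approximates the size of a serialized message.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(
    items: Iterable[T],
    size_of: Callable[[T], int],
    limit_bytes: int,
) -> Iterator[List[T]]:
    """Yield lists of items whose summed size stays within limit_bytes.

    A single item larger than the limit is emitted as its own chunk.
    """
    chunk: List[T] = []
    chunk_bytes = 0
    for item in items:
        item_bytes = size_of(item)
        # Close the current chunk before it would overflow.
        if chunk and chunk_bytes + item_bytes > limit_bytes:
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(item)
        chunk_bytes += item_bytes
    if chunk:
        yield chunk
```

Because the input is consumed once, this works directly on a generator such as the one `translate_data` appears to return.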

@@ -119,15 +125,50 @@ def ingest(self, metadata: dict[str, Any] | None, data: dict):
with self._lock:

🔵 Low

The entire ingest operation (including network calls to self.diode_client.ingest for potentially multiple chunks) is performed while holding self._lock. This means all other operations (including init_client and concurrent ingest calls) are blocked for the duration of potentially slow network I/O. Consider whether finer-grained locking is appropriate — e.g., only lock around shared state access, not the network calls.
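
Finer-grained locking could take a snapshot of the shared client under the lock and perform the network calls outside it. This sketch assumes the client object is itself safe to call from several threads, which is not confirmed here; `IngestClient` is a hypothetical stand-in for the real class.

```python
import threading
from typing import Any, List, Sequence


class IngestClient:
    """Hypothetical wrapper showing lock scope limited to shared state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._client: Any = None

    def init_client(self, client: Any) -> None:
        with self._lock:
            self._client = client

    def ingest(self, chunks: Sequence[Sequence]) -> List[Any]:
        # Hold the lock only long enough to read the shared reference.
        with self._lock:
            client = self._client
        if client is None:
            raise RuntimeError("client not initialized")
        # Slow network I/O happens without the lock held.
        return [client.ingest(chunk) for chunk in chunks]
```

The trade-off is that a concurrent `init_client` call no longer waits for in-flight ingests, so an ingest may finish on the previous client.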

running = running.encode("utf-8")

# Skip if no actual config data present
if not any([startup, running]):

🟡 Medium

Using any() with a list literal is unnecessary and slightly less efficient. any([startup, running]) creates a list before evaluating; use any((startup, running)) or simply startup or running instead.

Suggested change
- if not any([startup, running]):
+ if not (startup or running):

result = translate_device_config(config_info, options)

# Should return None since pb.DeviceConfig doesn't exist yet
if not _has_device_config:

🟡 Medium

Test test_translate_device_config_returns_none_when_sdk_unavailable only asserts when _has_device_config is False. If the SDK becomes available, this test silently passes without asserting anything, making it a no-op. It should assert the positive case too (result is not None when SDK is available).

Suggested change
- if not _has_device_config:
+ if not _has_device_config:
+     assert result is None
+ else:
+     assert result is not None

if _has_device_config and result is not None:
# Would check that result contains bytes, not strings
# This will be testable once pb.DeviceConfig exists
assert True # Placeholder for future validation

🟡 Medium

Placeholder assert True makes this test a no-op when the SDK is available. The test claims to verify string-to-bytes conversion but never actually checks it. Either implement the actual assertion or skip the test with pytest.skip() to signal it's not yet functional.

Suggested change
- assert True # Placeholder for future validation
+ if _has_device_config and result is not None:
+     # Verify configs were converted to bytes
+     assert isinstance(result.running, bytes)
+     assert isinstance(result.startup, bytes)
+ else:
+     pytest.skip("pb.DeviceConfig not yet available in SDK")
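
An explicit skip makes the missing feature visible in the test report instead of counting as a pass. The sketch below uses the standard library's `unittest.skipUnless` so that it has no third-party imports; pytest also collects unittest-style tests and offers `pytest.mark.skipif` for the same purpose. The `pb` namespace is a stand-in for an SDK build without DeviceConfig, and `run_suite` is a hypothetical helper.

```python
import io
import unittest
from types import SimpleNamespace

# Stand-in for an SDK module that does not define DeviceConfig yet.
pb = SimpleNamespace()
_has_device_config = hasattr(pb, "DeviceConfig")


class DeviceConfigBytesTest(unittest.TestCase):
    @unittest.skipUnless(_has_device_config, "pb.DeviceConfig not yet available in SDK")
    def test_configs_are_bytes(self):
        # Only runs once the message exists; then it makes real assertions.
        result = pb.DeviceConfig(running=b"r", startup=b"s")
        self.assertIsInstance(result.running, bytes)
        self.assertIsInstance(result.startup, bytes)


def run_suite() -> unittest.TestResult:
    """Run the test case quietly and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeviceConfigBytesTest)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

With the stand-in namespace, the run reports one skipped test rather than a silent success.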


if _has_device_config and result is not None:
# Bytes should pass through without conversion
assert True # Placeholder for future validation

🟡 Medium

Same assert True placeholder issue as the string-to-bytes test. This test provides no actual validation when the SDK is available.

Suggested change
- assert True # Placeholder for future validation
+ if _has_device_config and result is not None:
+     assert isinstance(result.running, bytes)
+     assert isinstance(result.startup, bytes)
+ else:
+     pytest.skip("pb.DeviceConfig not yet available in SDK")

assert result is None


def test_translate_device_config_with_none_config():

🔵 Low

The test name says 'with_none_config' and the docstring says 'with None config', yet the test actually passes an empty dict {}, not None. That makes it identical to test_translate_device_config_with_empty_config. Consider actually passing None to exercise that code path.

Suggested change
def test_translate_device_config_with_none_config():
# Should handle None gracefully
result = translate_device_config(None, options)
assert result is None


entities = list(translate_data(data))

# Should have at least device entity

🔵 Low

The assertion len(entities) > 0 is weak. Since you know the data has a device and no interfaces/IPs, you should assert the exact expected count (1) for a more precise test.

Suggested change
- # Should have at least device entity
+ assert len(entities) == 1
