Update CF_TIP_BOOT_FAILED_COUNT Metric by AYUSHMAAN-B · Pull Request #17 · ittiam-systems/clusterfuzz

AYUSHMAAN-B · 2025-11-12T03:20:57Z

Add 'is_candidate' label to 'CF_TIP_BOOT_FAILED_COUNT' metric to distinguish prod vs candidate instances. Update 'get_instance_name()' to return the GCE VM ID for accurate instance tracking.

aditya-wazir · 2025-11-12T06:27:04Z

src/clusterfuzz/_internal/platforms/android/flash.py

  if adb.get_device_state() != 'device':
    if environment.is_android_cuttlefish():
      logs.info('Trying to boot cuttlefish instance using stable build.')
+      # Increment the boot failure count with the candidate field.


No need to add this comment for now.

aditya-wazir · 2025-11-12T06:31:24Z

src/clusterfuzz/_internal/base/utils.py

  if environment.is_running_on_app_engine():
    return environment.get_value('GAE_INSTANCE', '')
+
+  # For Cuttlefish instances, get the GCE-assigned VM instance ID


No need to add this here; create a separate function for it. I don't want to modify an existing function. Changing the logic here might have adverse effects on other ClusterFuzz components that were not supposed to be affected by this change.

It seems you are retrieving the instance ID here, but it is not being transmitted through the metric. Why are we doing this?

> No need to add this here; create a separate function for it.

Done. I have reverted the changes in get_instance_name().

> It seems you are retrieving the instance ID here, but it is not being transmitted through the metric.

I have now created get_instance_ID() in monitor.py to get the instance ID directly, so it can be used in the metric.
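For reference, a minimal sketch of what such a standalone helper could look like. The function names here (get_gce_instance_id, _fetch_from_metadata_server) are hypothetical and are not taken from the PR. The sketch assumes the VM ID is read from the standard GCE metadata endpoint, which requires the Metadata-Flavor: Google header.

```python
import urllib.request

# Standard GCE metadata endpoint that returns the numeric VM instance ID.
_METADATA_URL = ('http://metadata.google.internal/computeMetadata/v1/'
                 'instance/id')


def _fetch_from_metadata_server(timeout=2):
  """Returns the raw instance ID string from the GCE metadata server."""
  request = urllib.request.Request(
      _METADATA_URL, headers={'Metadata-Flavor': 'Google'})
  with urllib.request.urlopen(request, timeout=timeout) as response:
    return response.read().decode('utf-8').strip()


def get_gce_instance_id(fetch=_fetch_from_metadata_server):
  """Returns the GCE VM instance ID, or None if it cannot be retrieved.

  `fetch` is injectable (an assumption of this sketch) so that callers and
  tests can avoid touching the network.
  """
  try:
    return fetch() or None
  except OSError:
    # URLError and socket timeouts are OSError subclasses. Off GCE the
    # lookup fails, so the caller decides what to use instead.
    return None
```

Keeping the lookup in its own function leaves get_instance_name() untouched, so callers that rely on its current behavior are not affected.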

aditya-wazir

Running: git diff --name-only FETCH_HEAD
| src/clusterfuzz/_internal/base/utils.py
| src/clusterfuzz/_internal/metrics/monitoring_metrics.py
| src/clusterfuzz/_internal/platforms/android/flash.py
| src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py
Running: pylint --score=no --jobs=0 --ignore=protos,tests,grammars clusterfuzz
Running: pylint --score=no --jobs=0 --ignore=protos,grammars --max-line-length=240 --disable no-member clusterfuzz._internal.tests
| ************* Module clusterfuzz._internal.tests.core.metrics.monitor_test
| src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py:123:11: E0602: Undefined variable 'is_candidate' (undefined-variable)
| src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py:124:12: E0602: Undefined variable 'is_candidate' (undefined-variable)
| src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py:129:63: E0602: Undefined variable 'is_candidate' (undefined-variable)
| Return code is non-zero (2).
Running: yapf -p -d src/clusterfuzz/_internal/base/utils.py src/clusterfuzz/_internal/metrics/monitoring_metrics.py src/clusterfuzz/_internal/platforms/android/flash.py src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py
Running: isort --dont-order-by-type --force-single-line-imports --force-sort-within-sections --line-length=80 -p handlers -p libs -p clusterfuzz -c src/clusterfuzz/_internal/base/utils.py src/clusterfuzz/_internal/metrics/monitoring_metrics.py src/clusterfuzz/_internal/platforms/android/flash.py src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py
Linting failed, see errors above.
Error: Process completed with exit code 1.

The run/Basic Test check is failing with a lint error. Going forward, please run the unit tests and pylint before requesting further review. Thanks.

Updated Cuttlefish boot metric unit test and added two new unit tests for Candidate fleet. Reverted changes in utils.py. Added get_instance_ID() in monitor.py to get the GCE VM ID directly.

aditya-wazir · 2025-11-13T05:52:46Z

src/clusterfuzz/_internal/metrics/monitor.py


+# For Cuttlefish instances, get the GCE-assigned VM instance ID
+# for accurate instance tracking.
+def get_instance_ID():


The function is missing a docstring.

aditya-wazir · 2025-11-13T05:53:45Z

src/clusterfuzz/_internal/metrics/monitor.py

+    if instance_id:
+      return instance_id
+
+  return utils.get_instance_name()


return utils.get_instance_name(): why is this needed?

I added it as a fail-safe. I have now merged the get_instance_id() implementation directly into flash.py.

aditya-wazir · 2025-11-13T05:55:40Z

src/clusterfuzz/_internal/metrics/monitor.py

  _monitored_resource.labels['project_id'] = utils.get_application_id()

-  _monitored_resource.labels['instance_id'] = utils.get_instance_name()
+  _monitored_resource.labels['instance_id'] = get_instance_ID()


Why are we doing this? It will affect the rest of the metrics too. AFAIK, this should not be done here.

Done. Reverted the changes.
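To illustrate the concern, here is a small sketch of how labels on the shared monitored resource differ from labels on a single metric. The dictionaries, metric names and the build_time_series helper are simplified stand-ins invented for this example, not the actual monitor.py structures.

```python
# Labels on the monitored resource are attached to every time series that
# the process emits. The values here are placeholders.
resource_labels = {'project_id': 'example-project', 'instance_id': 'bot-1'}


def build_time_series(metric_name, metric_labels):
  """Combines the shared resource labels with one metric's own labels."""
  return {
      'metric': {'type': metric_name, 'labels': dict(metric_labels)},
      'resource': {'labels': dict(resource_labels)},
  }


# A per-metric label changes only the metric that declares it.
boot = build_time_series('example/boot_failed_count', {
    'instance_id': '1234567890',
    'is_candidate': True,
})

# An unrelated metric still sees the shared resource label unchanged.
other = build_time_series('example/other_metric', {})
```

Changing resource_labels['instance_id'] would alter both series, whereas adding instance_id as a metric label touches only the boot metric.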

aditya-wazir · 2025-11-13T05:56:47Z

src/clusterfuzz/_internal/metrics/monitoring_metrics.py

+        # Add 'is_candidate' field to distinguish between prod and
+        # candidate instances.
+        monitor.BooleanField('is_candidate'),
        monitor.BooleanField('is_succeeded'),


monitor.StringField('instance_id'): let's define this. Check whether a string field is the most accurate data type.

Done. Added the 'instance_id' field to the metric as suggested. It is a StringField because instance_id is used as a string everywhere in the repo.
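A rough sketch of the resulting field layout, using stand-in classes rather than the real monitor module. The Field, BooleanField and StringField classes below are simplified imitations written for this example, and treating build_id as a string is an assumption. A string type is a reasonable fit for instance_id because GCE instance IDs are large numeric identifiers that are usually handled as opaque strings.

```python
class Field:
  """Stand-in for a typed metric label (not the real monitor.Field)."""
  value_type = object

  def __init__(self, name):
    self.name = name

  def validate(self, value):
    if not isinstance(value, self.value_type):
      raise TypeError(
          f'{self.name} expects {self.value_type.__name__}, got {value!r}')


class BooleanField(Field):
  value_type = bool


class StringField(Field):
  value_type = str


# Assumed label layout for the boot metric after this change.
FIELD_SPEC = [
    StringField('build_id'),
    StringField('instance_id'),
    BooleanField('is_candidate'),
    BooleanField('is_succeeded'),
]


def validate_labels(labels):
  """Raises if a label is missing, unexpected, or has the wrong type."""
  expected = {field.name for field in FIELD_SPEC}
  if set(labels) != expected:
    raise ValueError(f'expected labels {sorted(expected)}')
  for field in FIELD_SPEC:
    field.validate(labels[field.name])
```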

aditya-wazir · 2025-11-13T06:05:52Z

src/clusterfuzz/_internal/platforms/android/flash.py

    if environment.is_android_cuttlefish():
      logs.info('Trying to boot cuttlefish instance using stable build.')
      monitoring_metrics.CF_TIP_BOOT_FAILED_COUNT.increment({
          'build_id': build_info['bid'],


Add instance_id to the CF_TIP_BOOT_FAILED_COUNT metric. That way we have a consistent mapping among the is_candidate, instance_id and is_succeeded keys, which can then be used in a PromQL query to get exact stats.
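As an illustration of why a consistent label set helps, the sketch below keeps an in-memory counter keyed by the full label set and then sums over any subset of labels, roughly what a sum-by style query does on the monitoring side. The increment and total helpers and the sample values are invented for this example.

```python
from collections import Counter

# One counter cell per unique combination of label values.
counts = Counter()


def increment(labels):
  """Increments the cell identified by the complete label set."""
  counts[tuple(sorted(labels.items()))] += 1


def total(**selector):
  """Sums all cells whose labels match every key/value in `selector`."""
  return sum(
      n for key, n in counts.items()
      if all(dict(key).get(name) == value
             for name, value in selector.items()))


# Two failed boots on a candidate VM, one successful boot on a prod VM.
for _ in range(2):
  increment({'build_id': '100', 'instance_id': 'vm-a',
             'is_candidate': True, 'is_succeeded': False})
increment({'build_id': '100', 'instance_id': 'vm-b',
           'is_candidate': False, 'is_succeeded': True})

candidate_failures = total(is_candidate=True, is_succeeded=False)
```

Because every increment carries the same keys, failures can be sliced by fleet, by instance, or by both without ambiguity.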

aditya-wazir · 2025-11-13T06:06:33Z

src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py

+    self.mock.get_device_state.return_value = 'device'
+    flash.flash_to_latest_build_if_needed()
+    args = call_queue.get(timeout=20)
+    time_series = args['time_series']


What about the get_instance_id() function that we added? Is there a test for that API?

Creating a unit test for get_instance_id() would have required a new flash_test.py file. That file did not exist before, and no functions in flash.py have unit tests. Since the function is used only once, I merged its logic directly into the single point of use to avoid creating a new file.
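For comparison, a test for such a helper can be quite small. The sketch below is hypothetical: it defines its own get_instance_id() that reads the standard GCE metadata endpoint (the PR's actual implementation is not shown in this thread) and stubs the network call with unittest.mock.

```python
import unittest
from unittest import mock
import urllib.request


def get_instance_id():
  """Hypothetical helper: reads the VM ID from the GCE metadata server."""
  request = urllib.request.Request(
      'http://metadata.google.internal/computeMetadata/v1/instance/id',
      headers={'Metadata-Flavor': 'Google'})
  with urllib.request.urlopen(request, timeout=2) as response:
    return response.read().decode('utf-8').strip()


class GetInstanceIdTest(unittest.TestCase):
  """Checks the helper without contacting a real metadata server."""

  @mock.patch('urllib.request.urlopen')
  def test_returns_id_from_metadata_server(self, mock_urlopen):
    # urlopen() is used as a context manager, so stub __enter__().read().
    mock_urlopen.return_value.__enter__.return_value.read.return_value = (
        b'1234567890\n')

    self.assertEqual(get_instance_id(), '1234567890')

    # The request must carry the header the metadata server requires.
    request = mock_urlopen.call_args[0][0]
    self.assertEqual(request.get_header('Metadata-flavor'), 'Google')
```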

Added 'instance_id' in the CF_TIP_BOOT_FAILED_COUNT metric. Reverted changes in monitor.py. Merged 'get_instance_id()' implementation directly in flash.py to avoid creating a new unit test file.

aditya-wazir · 2025-11-17T06:36:44Z

src/clusterfuzz/_internal/platforms/android/flash.py


+  # For Cuttlefish instances, get the GCE-assigned VM instance ID
+  # for accurate instance tracking.
+  instance_id = None


Can we move this to line 176? Here it will be invoked even if the device is not Cuttlefish, and I don't want any regression or failure due to our code. Please change it if this sounds good to you.
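The suggested shape, sketched with injected callables so that it runs on its own. The record_boot_failure function and its parameters are hypothetical and stand in for the relevant part of flash.py.

```python
def record_boot_failure(is_cuttlefish, lookup_instance_id, increment):
  """Looks up the instance ID only on the Cuttlefish path.

  Args:
    is_cuttlefish: Whether the current device is a Cuttlefish instance.
    lookup_instance_id: Callable that returns the GCE VM ID.
    increment: Callable that records the metric with the given labels.
  """
  if not is_cuttlefish:
    # Other device types never trigger the lookup, so a failure in it
    # cannot cause a regression for them.
    return
  instance_id = lookup_instance_id()
  increment({'instance_id': instance_id})
```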

Refactoring code related to getting GCE VM Instance ID

aditya-wazir · 2025-11-18T05:17:12Z

src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py

+
  @patch(
      'clusterfuzz._internal.metrics.monitor.monitoring_v3.MetricServiceClient')
  def test_cuttlefish_boot_success_metric(self, mock_client):


_for_production_fleet should be added to the name of this test and the one below. It is better to keep naming consistent across the tests we added.

src/clusterfuzz/_internal/platforms/android/flash.py

Update CF_TIP_BOOT_FAILED_COUNT Metric

963715f

AYUSHMAAN-B marked this pull request as draft November 12, 2025 03:23

AYUSHMAAN-B marked this pull request as ready for review November 12, 2025 03:25

AashutoshMurthy requested a review from aditya-wazir November 12, 2025 04:58

aditya-wazir reviewed Nov 12, 2025

Update Cuttlefish Boot Metric Unit Test

e3eeb08

AYUSHMAAN-B force-pushed the Update_Cuttlefish_Metric branch from f945014 to e3eeb08 Compare November 12, 2025 12:38

aditya-wazir reviewed Nov 13, 2025

Update CF_TIP_BOOT_FAILED_COUNT Metric

9635b6a

aditya-wazir reviewed Nov 17, 2025

AYUSHMAAN-B added 2 commits November 17, 2025 14:47

Refactoring code related to instance_id

c22537e

Refactored flash.py

27fe194

aditya-wazir reviewed Nov 18, 2025

Update function name in monitor_test.py

b68e9df

aditya-wazir approved these changes Nov 18, 2025

svasudevprasad reviewed Nov 18, 2025

src/clusterfuzz/_internal/platforms/android/flash.py

svasudevprasad approved these changes Nov 19, 2025

Update Cuttlefish Metric Unit Test

997b2ac

AYUSHMAAN-B force-pushed the Update_Cuttlefish_Metric branch from c4218ee to 997b2ac Compare December 1, 2025 09:56

Remove comments

c78ec1a

3 participants