Update CF_TIP_BOOT_FAILED_COUNT Metric#17
Update CF_TIP_BOOT_FAILED_COUNT Metric#17AYUSHMAAN-B wants to merge 8 commits intoittiam-systems:masterfrom
Conversation
| if adb.get_device_state() != 'device': | ||
| if environment.is_android_cuttlefish(): | ||
| logs.info('Trying to boot cuttlefish instance using stable build.') | ||
| # Increment the boot failure count with the candidate field. |
There was a problem hiding this comment.
no need to add this comment for now
| if environment.is_running_on_app_engine(): | ||
| return environment.get_value('GAE_INSTANCE', '') | ||
|
|
||
| # For Cuttlefish instances, get the GCE-assigned VM instance ID |
There was a problem hiding this comment.
no need to add this here , create a separate function for this. Don't want to modify already existing function. Changing logic here might have adverse effect over other components of cluster-fuzz , which were not supposed to affected by this change.
It seems like you are retrieving instance_id here, but the same is not being transmitted through the metric. Why are we doing this.
There was a problem hiding this comment.
no need to add this here , create a separate function for this. Don't want to modify already existing function. Changing logic here might have adverse effect over other components of cluster-fuzz , which were not supposed to affected by this change.
Done. I have reverted changes in get_instance_name().
It seems like you are retrieving instance_id here, but the same is not being transmitted through the metric. Why are we doing this.
I have now created get_instance_ID() in monitor.py to get the instance ID directly which can be used in the metric.
aditya-wazir
left a comment
There was a problem hiding this comment.
Running: git diff --name-only FETCH_HEAD
| src/clusterfuzz/_internal/base/utils.py
| src/clusterfuzz/_internal/metrics/monitoring_metrics.py
| src/clusterfuzz/_internal/platforms/android/flash.py
| src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py
Running: pylint --score=no --jobs=0 --ignore=protos,tests,grammars clusterfuzz
Running: pylint --score=no --jobs=0 --ignore=protos,grammars --max-line-length=240 --disable no-member clusterfuzz._internal.tests
| ************* Module clusterfuzz._internal.tests.core.metrics.monitor_test
| src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py:123:11: E0602: Undefined variable 'is_candidate' (undefined-variable)
| src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py:124:12: E0602: Undefined variable 'is_candidate' (undefined-variable)
| src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py:129:63: E0602: Undefined variable 'is_candidate' (undefined-variable)
| Return code is non-zero (2).
Running: yapf -p -d src/clusterfuzz/_internal/base/utils.py src/clusterfuzz/_internal/metrics/monitoring_metrics.py src/clusterfuzz/_internal/platforms/android/flash.py src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py
Running: isort --dont-order-by-type --force-single-line-imports --force-sort-within-sections --line-length=80 -p handlers -p libs -p clusterfuzz -c src/clusterfuzz/_internal/base/utils.py src/clusterfuzz/_internal/metrics/monitoring_metrics.py src/clusterfuzz/_internal/platforms/android/flash.py src/clusterfuzz/_internal/tests/core/metrics/monitor_test.py
Linting failed, see errors above.
Error: Process completed with exit code 1.
run/Basic Test is failing with lint error. Please ensure moving fwd that u do run unit test and pylint before further requesting for review. Thanks
Updated Cuttlefish boot metric unit test and added two new unit tests for Candidate fleet. Reverted changes in utils.py. Added get_instance_ID() in monitor.py to get the GCE VM ID directly.
f945014 to
e3eeb08
Compare
|
|
||
| # For Cuttlefish instances, get the GCE-assigned VM instance ID | ||
| # for accurate instance tracking. | ||
| def get_instance_ID(): |
There was a problem hiding this comment.
Function missing docstring
| if instance_id: | ||
| return instance_id | ||
|
|
||
| return utils.get_instance_name() |
There was a problem hiding this comment.
return utils.get_instance_name() --> why this??
There was a problem hiding this comment.
I have added at as a fail safe. I have now merged the get_instance_id() implementation in flash.py directly.
| _monitored_resource.labels['project_id'] = utils.get_application_id() | ||
|
|
||
| _monitored_resource.labels['instance_id'] = utils.get_instance_name() | ||
| _monitored_resource.labels['instance_id'] = get_instance_ID() |
There was a problem hiding this comment.
why are we doing this. this will have effect on rest of the metrics too. This should not be done here AFAIK
There was a problem hiding this comment.
Done. Reverted the changes.
| # Add 'is_candidate' field to distinguish between prod and | ||
| # candidate instances. | ||
| monitor.BooleanField('is_candidate'), | ||
| monitor.BooleanField('is_succeeded'), |
There was a problem hiding this comment.
monitor.StringField('instance_id') ---> lets define this. Check if String field is the most accurate data type
There was a problem hiding this comment.
Done. Added 'instance_id field in the metric as suggested. Adding it as a StringField as it was being used as string everywhere in the repo.
| if environment.is_android_cuttlefish(): | ||
| logs.info('Trying to boot cuttlefish instance using stable build.') | ||
| monitoring_metrics.CF_TIP_BOOT_FAILED_COUNT.increment({ | ||
| 'build_id': build_info['bid'], |
There was a problem hiding this comment.
add instance_id in CF_TIP_BOOT_FAILED_COUNT metric, that way we have consistent mapping among is_candidate, instance_id and is_succeeded key, which can the be used in PromQL query to get exact stats.
| self.mock.get_device_state.return_value = 'device' | ||
| flash.flash_to_latest_build_if_needed() | ||
| args = call_queue.get(timeout=20) | ||
| time_series = args['time_series'] |
There was a problem hiding this comment.
what about get_instance_id(), that we have added, test for that api??
There was a problem hiding this comment.
Creating a unit test for get_instance_id() required a new flash_test.py file. It was noticed that the file did not exist before and no functions from flash.py have unit tests. Since, the function is being used only once and flash_test.py file did not exist before, I merged the function's logic directly into the single point of use to avoid creating a new file.
Added 'instance_id' in the CF_TIP_BOOT_FAILED_COUNT metric. Reverted changes in monitor.py. Merged 'get_instance_id()' implementation directly in flash.py to avoid creating a new unit test file.
|
|
||
| # For Cuttlefish instances, get the GCE-assigned VM instance ID | ||
| # for accurate instance tracking. | ||
| instance_id = None |
There was a problem hiding this comment.
Can we move this to line 176. Here it will get invoked, even if device is not cuttlefish , and i don't want any regression or failure due to our code . change if this sounds good to you
Refactoring code related to getting GCE VM Instance ID
|
|
||
| @patch( | ||
| 'clusterfuzz._internal.metrics.monitor.monitoring_v3.MetricServiceClient') | ||
| def test_cuttlefish_boot_success_metric(self, mock_client): |
There was a problem hiding this comment.
_for_production_fleet should be added to test name of this and below test. Better to ensure naming is consistent across the tests added by us
c4218ee to
997b2ac
Compare
Add 'is_candidate' label to 'CF_TIP_BOOT_FAILED_COUNT' metric to distinguish prod vs candidate instances. Update 'get_instance_name()' to return the GCE VM ID for accurate instance tracking.