Skip to content

Create an inventory startup time metric#123

Draft
michaelasp wants to merge 1 commit intokubernetes-sigs:mainfrom
michaelasp:startupMetric
Draft

Create an inventory startup time metric#123
michaelasp wants to merge 1 commit intokubernetes-sigs:mainfrom
michaelasp:startupMetric

Conversation

@michaelasp
Copy link
Copy Markdown
Contributor

Add a startup time metric for the device driver.

Traces to overall latency of startup, everything from creating the plugin path to discovering interfaces the first time around.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 31, 2026
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 31, 2026
@michaelasp
Copy link
Copy Markdown
Contributor Author

Creating from discussion here, it would be good to have a gauge metric to track the latency of startup

#118 (comment)

@aojea
Copy link
Copy Markdown
Contributor

aojea commented Mar 31, 2026

/assign @gauravkghildiyal

@michaelasp michaelasp force-pushed the startupMetric branch 3 times, most recently from 81b4aa8 to ca584c7 Compare March 31, 2026 20:28
Copy link
Copy Markdown
Member

@fmuyassarov fmuyassarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fmuyassarov, michaelasp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fmuyassarov
Copy link
Copy Markdown
Member

LGTM from my side but I’d like to give @gauravkghildiyal some time to review it as well.
/hold
/lgtm

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 31, 2026
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 31, 2026
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 31, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

New changes are detected. LGTM label has been removed.

@michaelasp michaelasp changed the title Create a startup time metric Create an inventory startup time metric Mar 31, 2026
@michaelasp
Copy link
Copy Markdown
Contributor Author

When I tested this locally, realized that inventory is run in a goroutine, so the overall latency metric is only ~1s but actually discovering devices may take longer. Moved the metric and updated the metric naming.

@michaelasp michaelasp force-pushed the startupMetric branch 3 times, most recently from 3debee8 to 8d3d790 Compare March 31, 2026 21:08
@michaelasp michaelasp marked this pull request as draft March 31, 2026 21:14
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 31, 2026
@michaelasp
Copy link
Copy Markdown
Contributor Author

Going to need to take a bit more time fleshing this out, the real portion we want to track is the inventory which is in pkg/inventory which is separate from pkg/driver and currently has no metrics.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 31, 2026
@michaelasp michaelasp marked this pull request as ready for review March 31, 2026 21:35
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 31, 2026
inventoryStartupDurationSeconds = prometheus.NewGauge(prometheus.GaugeOpts{
Namespace: "dranet",
Subsystem: "inventory",
Name: "inventory_startup_duration_seconds",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final metric name for this would end up being dranet_inventory_inventory_startup_duration_seconds. Should we strip the name to startup_duration_seconds?

Also, inventory startup name doesn't seem to be doing justice to what its measuring, given it excludes the initial scan completely. If our intent is to measure the cloud-provider detection latency, should we change the name to focus on the cloud provider? (And if we do that, then maybe we can wrap the metric within getInstanceProperties?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, it's more about the cloud provider latency here. Let me update this after the refactor being done with cloud provider hints #118

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@michaelasp michaelasp marked this pull request as draft April 10, 2026 16:13
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants