-
Notifications
You must be signed in to change notification settings - Fork 3k
Description
NetBox Edition
NetBox Community
NetBox Version
v4.5.4 (Docker image: netboxcommunity/netbox:v4.5-4.0.1)
Python Version
3.12
Steps to Reproduce
- Have a moderate number of clusters (~84) and a significant number of virtual machines (~15,500).
- Query the clusters API endpoint:
GET /api/virtualization/clusters/
- Even
?brief=Trueexhibits the same behavior:
GET /api/virtualization/clusters/?brief=True
Expected Behavior
The API request should complete in under 1–2 seconds for 84 clusters.
Observed Behavior
The request takes 85–120 seconds to complete. The bottleneck is the DRF pagination COUNT(*) query, which inherits the Sum() annotations and LEFT OUTER JOIN defined statically on the ClusterViewSet.queryset.
Root Cause
ClusterViewSet in virtualization/api/views.py defines:
class ClusterViewSet(NetBoxModelViewSet):
queryset = Cluster.objects.prefetch_related('virtual_machines').annotate(
allocated_vcpus=Sum('virtual_machines__vcpus'),
allocated_memory=Sum('virtual_machines__memory'),
allocated_disk=Sum('virtual_machines__disk'),
)These annotations and prefetch_related are applied statically to the queryset. This means they are present on every request, including:
?brief=Truerequests (where these fields are never serialized)- The pagination
COUNT(*)query (where annotations are completely irrelevant)
However, BaseViewSet.get_queryset() already has a dynamic mechanism for resolving annotations and prefetches based on the serializer fields:
class BaseViewSet(GenericViewSet):
def get_queryset(self):
qs = super().get_queryset()
serializer_class = self.get_serializer_class()
if prefetch := get_prefetches_for_serializer(serializer_class, **self.field_kwargs):
qs = qs.prefetch_related(*prefetch)
if annotations := get_annotations_for_serializer(serializer_class, **self.field_kwargs):
qs = qs.annotate(**annotations)
return qsThe RelatedObjectCountField fields (virtualmachine_count, device_count) are already handled dynamically by this mechanism. The static Sum() annotations on ClusterViewSet bypass this entirely, causing them to be applied unconditionally — including to the pagination COUNT(*) query.
Pathological SQL generated by pagination
When DRF's paginator calls queryset.count(), Django cannot strip the annotations, producing:
SELECT COUNT(*) FROM (
SELECT COALESCE(
(SELECT COUNT(*) AS "c"
FROM "virtualization_virtualmachine" U0
WHERE U0."cluster_id" = ("virtualization_cluster"."id")
GROUP BY U0."cluster_id"), 0
) AS "virtualmachine_count"
FROM "virtualization_cluster"
LEFT OUTER JOIN "virtualization_virtualmachine"
ON ("virtualization_cluster"."id" = "virtualization_virtualmachine"."cluster_id")
GROUP BY "virtualization_cluster"."id", 1
) subqueryEXPLAIN ANALYZE output:
Aggregate (cost=1983993.25..1983993.26 rows=1 width=8)
(actual time=61184.289..61184.299 rows=1 loops=1)
-> Group (cost=23488.67..1983992.20 rows=84 width=16)
(actual time=261.027..61184.098 rows=84 loops=1)
Group Key: virtualization_cluster.id, (COALESCE((SubPlan 1), '0'::bigint))
-> Incremental Sort (cost=23488.67..1973207.37 rows=15464 width=16)
(actual time=260.998..61182.784 rows=15468 loops=1)
SubPlan 1
-> GroupAggregate (cost=0.29..127.47 rows=1 width=16)
(actual time=3.921..3.921 rows=1 loops=15468)
-> Index Only Scan [...]
(actual time=0.022..3.266 rows=11934 loops=15468)
Heap Fetches: 88071177
Execution Time: 61199.586 ms
The correlated subquery executes 15,468 times (once per joined VM row), each scanning ~11,934 rows. Total: ~88 million heap fetches for what should be a trivial count of 84 rows.
Diagnostic benchmarks
| Test | Time |
|---|---|
Cluster.objects.count() — direct ORM |
0.036s |
Cluster.objects.values_list('id', flat=True) — direct ORM |
0.021s |
ClusterSerializer(data, many=True).data — serializer only |
11.3s |
GET /api/virtualization/clusters/?brief=True — full HTTP |
85.7s |
Possible Fix
Remove the static annotations and prefetch_related from ClusterViewSet.queryset. The BaseViewSet.get_queryset() dynamic mechanism already handles RelatedObjectCountField annotations (virtualmachine_count, device_count) correctly via get_annotations_for_serializer().
class ClusterViewSet(NetBoxModelViewSet):
queryset = Cluster.objects.all()
serializer_class = serializers.ClusterSerializer
filterset_class = filtersets.ClusterFilterSetFor the allocated_vcpus, allocated_memory, and allocated_disk fields, a proper solution would be to implement a dynamic field type (similar to RelatedObjectCountField) that integrates with get_annotations_for_serializer(), so these annotations are only applied when the fields are included in the response and never leak into the pagination COUNT(*) query.
Environment
- 84 clusters, ~15,500 virtual machines
- PostgreSQL 17 (Patroni cluster)
- Upgraded from NetBox 3.7 → 4.5 (issue did not exist in 3.7, as
allocated_*annotations were not present onClusterViewSet)
Related Issues
- Annotate related objects counts on a REST API queryset only when the field is active on the serializer #15131 — Annotate related objects counts on a REST API queryset only when needed
- Object count fields are documented in Swagger for REST API GET requests but not included in the response #16085 — Object count fields documented in Swagger but not returned in response
- Fixes: #18729 - Reapply model-level ordering on list views (UI and API) to account for annotation #18805 — Fix for annotation affecting queryset ordering