ClusterViewSet static queryset annotations cause extreme API slowness

### NetBox Edition

NetBox Community

### NetBox Version

v4.5.4 (Docker image: `netboxcommunity/netbox:v4.5-4.0.1`)

### Python Version

3.12

### Steps to Reproduce

1. Have a moderate number of clusters (~84) and a significant number of virtual machines (~15,500).
2. Query the clusters API endpoint:

```
GET /api/virtualization/clusters/
```

3. Even `?brief=True` exhibits the same behavior:

```
GET /api/virtualization/clusters/?brief=True
```
### Expected Behavior

The API request should complete in under 1–2 seconds for 84 clusters.

### Observed Behavior

The request takes **85–120 seconds** to complete. The bottleneck is the DRF pagination `COUNT(*)` query, which inherits the `Sum()` annotations and `LEFT OUTER JOIN` defined statically on the `ClusterViewSet.queryset`.

## Root Cause

`ClusterViewSet` in `virtualization/api/views.py` defines:

```python
class ClusterViewSet(NetBoxModelViewSet):
    queryset = Cluster.objects.prefetch_related('virtual_machines').annotate(
        allocated_vcpus=Sum('virtual_machines__vcpus'),
        allocated_memory=Sum('virtual_machines__memory'),
        allocated_disk=Sum('virtual_machines__disk'),
    )
```

These annotations and `prefetch_related` are applied **statically** to the queryset. This means they are present on **every** request, including:

- `?brief=True` requests (where these fields are never serialized)
- The pagination `COUNT(*)` query (where annotations are completely irrelevant)

However, `BaseViewSet.get_queryset()` already has a **dynamic mechanism** for resolving annotations and prefetches based on the serializer fields:

```python
class BaseViewSet(GenericViewSet):
    def get_queryset(self):
        qs = super().get_queryset()
        serializer_class = self.get_serializer_class()

        if prefetch := get_prefetches_for_serializer(serializer_class, **self.field_kwargs):
            qs = qs.prefetch_related(*prefetch)

        if annotations := get_annotations_for_serializer(serializer_class, **self.field_kwargs):
            qs = qs.annotate(**annotations)

        return qs
```

The `RelatedObjectCountField` fields (`virtualmachine_count`, `device_count`) are already handled dynamically by this mechanism. The static `Sum()` annotations on `ClusterViewSet` bypass this entirely, causing them to be applied unconditionally — including to the pagination `COUNT(*)` query.

## Pathological SQL generated by pagination

When DRF's paginator calls `queryset.count()`, Django cannot strip the annotations, producing:

```sql
SELECT COUNT(*) FROM (
  SELECT COALESCE(
    (SELECT COUNT(*) AS "c"
     FROM "virtualization_virtualmachine" U0
     WHERE U0."cluster_id" = ("virtualization_cluster"."id")
     GROUP BY U0."cluster_id"), 0
  ) AS "virtualmachine_count"
  FROM "virtualization_cluster"
  LEFT OUTER JOIN "virtualization_virtualmachine"
    ON ("virtualization_cluster"."id" = "virtualization_virtualmachine"."cluster_id")
  GROUP BY "virtualization_cluster"."id", 1
) subquery
```

`EXPLAIN ANALYZE` output:

```
Aggregate  (cost=1983993.25..1983993.26 rows=1 width=8)
           (actual time=61184.289..61184.299 rows=1 loops=1)
  ->  Group  (cost=23488.67..1983992.20 rows=84 width=16)
             (actual time=261.027..61184.098 rows=84 loops=1)
        Group Key: virtualization_cluster.id, (COALESCE((SubPlan 1), '0'::bigint))
        ->  Incremental Sort  (cost=23488.67..1973207.37 rows=15464 width=16)
                               (actual time=260.998..61182.784 rows=15468 loops=1)
              SubPlan 1
                ->  GroupAggregate  (cost=0.29..127.47 rows=1 width=16)
                                    (actual time=3.921..3.921 rows=1 loops=15468)
                      ->  Index Only Scan [...]
                            (actual time=0.022..3.266 rows=11934 loops=15468)
                            Heap Fetches: 88071177
Execution Time: 61199.586 ms
```

The correlated subquery executes **15,468 times** (once per joined VM row), each scanning ~11,934 rows. Total: **~88 million heap fetches** for what should be a trivial count of 84 rows.

## Diagnostic benchmarks

| Test | Time |
|------|------|
| `Cluster.objects.count()` — direct ORM | **0.036s** |
| `Cluster.objects.values_list('id', flat=True)` — direct ORM | **0.021s** |
| `ClusterSerializer(data, many=True).data` — serializer only | **11.3s** |
| `GET /api/virtualization/clusters/?brief=True` — full HTTP | **85.7s** |

## Possible Fix 

Remove the static annotations and `prefetch_related` from `ClusterViewSet.queryset`. The `BaseViewSet.get_queryset()` dynamic mechanism already handles `RelatedObjectCountField` annotations (`virtualmachine_count`, `device_count`) correctly via `get_annotations_for_serializer()`.

```python
class ClusterViewSet(NetBoxModelViewSet):
    queryset = Cluster.objects.all()
    serializer_class = serializers.ClusterSerializer
    filterset_class = filtersets.ClusterFilterSet
```

For the `allocated_vcpus`, `allocated_memory`, and `allocated_disk` fields, a proper solution would be to implement a dynamic field type (similar to `RelatedObjectCountField`) that integrates with `get_annotations_for_serializer()`, so these annotations are only applied when the fields are included in the response and never leak into the pagination `COUNT(*)` query.

## Environment

- 84 clusters, ~15,500 virtual machines
- PostgreSQL 17 (Patroni cluster)
- Upgraded from NetBox 3.7 → 4.5 (issue did not exist in 3.7, as `allocated_*` annotations were not present on `ClusterViewSet`)

## Related Issues

- #15131 — Annotate related objects counts on a REST API queryset only when needed
- #16085 — Object count fields documented in Swagger but not returned in response
- #18805 — Fix for annotation affecting queryset ordering



Test	Time
`Cluster.objects.count()` — direct ORM	0.036s
`Cluster.objects.values_list('id', flat=True)` — direct ORM	0.021s
`ClusterSerializer(data, many=True).data` — serializer only	11.3s
`GET /api/virtualization/clusters/?brief=True` — full HTTP	85.7s

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ClusterViewSet static queryset annotations cause extreme API slowness #21732

NetBox Edition

NetBox Version

Python Version

Steps to Reproduce

Expected Behavior

Observed Behavior

Root Cause

Pathological SQL generated by pagination

Diagnostic benchmarks

Possible Fix

Environment

Related Issues

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

ClusterViewSet static queryset annotations cause extreme API slowness #21732

Description

NetBox Edition

NetBox Version

Python Version

Steps to Reproduce

Expected Behavior

Observed Behavior

Root Cause

Pathological SQL generated by pagination

Diagnostic benchmarks

Possible Fix

Environment

Related Issues

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions