Joe/cluster module breakup #30151
Draft
joe-redpanda wants to merge 8 commits into redpanda-data:dev from
…gets
Break three sets of files out of the monolithic cluster library into their own Bazel targets, as a first step toward reducing the module's compilation blast radius.
- cluster:errc (errc.h + errors.cc): zero intra-cluster deps
- cluster:logger (logger.h + logger.cc): depends only on base + seastar
- cluster:leader_balancer_strategy (types, strategy interface, constraints, probe, random): the pure-algorithm layer of the leader balancer, with zero dependencies on cluster internals
Headers are re-exported from the main cluster target so downstream consumers are unaffected.
Fix bare includes (e.g. "errc.h" → "cluster/errc.h") that broke once the headers moved to separate targets.
Extract three self-contained type groups from the 3,931-line types.h into their own headers, reducing the monolithic header by ~500 lines:
- security_types.h: ACL cmd data, ACL request/reply, user_and_credential, role cmd data types
- plugin_rpc_types.h: upsert/remove plugin request/response types
- cluster_link_rpc_types.h: all cluster link and mirror topic request/response types
types.h re-exports these headers so existing consumers are unaffected.
This is preparation for giving each domain its own Bazel target, which will let downstream code depend on only the types it actually uses.
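The re-export described above is the usual forwarding-header pattern. The sketch below collapses two headers into one translation unit so it compiles on its own. The `cluster_sketch` namespace, the struct fields, and the `describe` helper are illustrative assumptions, not the contents of the real Redpanda headers.

```cpp
#include <string>
#include <vector>

// ---- stands in for the extracted header cluster/security_types.h ----
#ifndef SKETCH_CLUSTER_SECURITY_TYPES_H
#define SKETCH_CLUSTER_SECURITY_TYPES_H
namespace cluster_sketch {
// Hypothetical fields; only the type name comes from the commit message.
struct user_and_credential {
    std::string username;
    std::string credential;
};
} // namespace cluster_sketch
#endif

// ---- stands in for the slimmed-down cluster/types.h ----
#ifndef SKETCH_CLUSTER_TYPES_H
#define SKETCH_CLUSTER_TYPES_H
// In a real tree this spot would hold: #include "cluster/security_types.h"
// That single include is what keeps old consumers compiling unchanged.
namespace cluster_sketch {
// Hypothetical type that stays behind in types.h.
struct topic_result {
    std::string topic;
    int error_code;
};
} // namespace cluster_sketch
#endif

// ---- an existing consumer that only ever included types.h ----
inline std::string describe(
  const cluster_sketch::user_and_credential& user,
  const cluster_sketch::topic_result& result) {
    return user.username + ":" + result.topic + ":"
           + std::to_string(result.error_code);
}
```

A consumer that needs only the security types could later switch its include to the narrower header, which is the payoff the commit message anticipates.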
Break 23 sets of files out of the monolithic cluster library into their own Bazel targets, culminating in the extraction of the 3,400-line types.h, the single biggest dependency bottleneck in the module.
New targets with zero cluster-internal deps (13 header-only, 6 with .cc): fwd, tx_errc, cluster_link_errc, simple_batch_builder, namespaced_cache, tx_hash_ranges, ntp_callbacks, shard_table, node_status_rpc_types, node_status_table, offsets_snapshot, security_types, plugin_rpc_types, partition_balancer_types, cluster_link_rpc_types, client_quota_serde, self_test_rpc_types, tm_stm_types, cloud_metadata_types, cloud_metadata_key_utils, cloud_metadata_cluster_manifest, data_migration_types
Keystone extraction (types target): types.h/cc depends only on the targets above plus already-extracted targets (errc, features, version, nt_revision, topic_configuration).
This removes 10 .cc files and 24 .h files from the monolith (148 → 138 .cc files).
Fix bare includes (e.g. "types.h" → "cluster/types.h") that broke once the headers moved to separate targets. All headers are re-exported from the main cluster target so downstream consumers are unaffected.
Extract the next wave of targets from the monolithic cluster library, following the dependency chain unlocked by the types.h extraction.
Header-only extractions (10 targets): commands, bootstrap_types, cluster_recovery_state, data_migration_group_proxy, cluster_uuid, partition_probe, drain_manager (header only; .cc stays in monolith), distributed_kv_stm (+types), rm_group_proxy, topic_table_probe
Full .h+.cc extractions (12 targets): node_types, rm_stm_types, tx_protocol_types, plugin_table, data_migrated_resources, notification_latch, cluster_link_types, controller_snapshot, controller_log_limiter, health_monitor_types, members_table, feature_backend
The commands.h extraction was the key unlock: it enabled the topic_table_probe.h and members_table.h extractions and, together with cluster_recovery_state.h, enabled controller_snapshot.h, which in turn unblocked members_table.cc and feature_backend.cc.
This removes 12 .cc files from the monolith (138 → 126) and 23 .h files.
Fix bare includes and re-export all headers from the main cluster target for backward compatibility.
Extract the next wave of targets and clean up header duplication left over from the incremental breakouts.
New targets with .cc files (9): cluster_recovery_table, client_quota_store, producer_state, partition_properties_stm, log_eviction_stm, ephemeral_credential_frontend, producer_state_manager, client_quota_backend, ephemeral_credential_service
New header-only targets (2): prefix_truncate_record, ephemeral_credential_serde
Deduplication fixes:
- Remove headers from monolith hdrs that already exist in extracted targets (leader_balancer_strategy, topic_configuration, tx_manager_migrator_rpc).
- Add ephemeral_credential_serde as an explicit dep of the ephemeral_credential_rpc generated target, since the generated header includes it.
- Fix bare includes exposed by the deduplication (topic_configuration.h, topic_properties.h, snapshot.h, client_quota_store.h, producer_state.h, producer_state_manager.h).
This removes 9 .cc files from the monolith (126 → 117).
Break the monolithic cluster_utils.h/cc into three purpose-specific files to decouple the "toxic" controller_stm.h dependency from the clean utility functions:
- rpc_utils.h/cc: RPC client helpers (with_client, make_self_broker, etc.). Zero cluster deps, so it is extracted as its own Bazel target.
- controller_utils.h/cc: Functions that depend on controller_stm, partition, or topic_table (replicate_and_wait, get_partition_state, log_revision_on_node, persistent state copy/remove). Stays in the monolith.
- cluster_utils.h/cc: Generic replica set utilities (subtract, contains_node, moving_from/to_node, etc.), make_error_topic_results, map_update_interruption_error_code, check_result_configuration. All deps are already extracted, so it is extracted as its own Bazel target.
This unblocks extraction of the scheduling/allocation subsystem: partition_allocator.cc only used subtract() from cluster_utils, which no longer pulls in controller_stm.h.
Update 29 consumers to include the appropriate header(s). Remove 7 unused cluster_utils.h includes. Add explicit includes for types that were previously available transitively through the old cluster_utils.h → controller_stm.h → everything chain.
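To illustrate why the replica set utilities can live in a target with no controller dependencies, here is a minimal sketch of what dependency-free `subtract` and `contains_node` helpers might look like. The `replica` struct and both signatures are assumptions for illustration. The real functions operate on Redpanda's own model types and may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a replica descriptor (node plus shard).
struct replica {
    int32_t node_id;
    uint32_t shard;
    bool operator==(const replica& other) const {
        return node_id == other.node_id && shard == other.shard;
    }
};

using replica_set = std::vector<replica>;

// True if any replica in the set is placed on the given node.
inline bool contains_node(const replica_set& replicas, int32_t node_id) {
    return std::any_of(
      replicas.begin(), replicas.end(), [node_id](const replica& r) {
          return r.node_id == node_id;
      });
}

// Replicas present in lhs but absent from rhs, preserving lhs order.
inline replica_set subtract(const replica_set& lhs, const replica_set& rhs) {
    replica_set out;
    std::copy_if(
      lhs.begin(), lhs.end(), std::back_inserter(out),
      [&rhs](const replica& r) {
          return std::find(rhs.begin(), rhs.end(), r) == rhs.end();
      });
    return out;
}
```

Nothing here needs controller state, only the standard library and a plain value type, which is the property that lets an allocator-style consumer depend on the utilities without pulling in controller_stm.h.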
Extract 18 .cc files from the monolith across 9 new targets, enabled by the cluster_utils split in the previous commit.
New targets:
- scheduling_allocation (6 .cc): the full allocation subsystem (types, allocation_node, allocation_state, allocation_strategy, constraints, partition_allocator). Unblocked because partition_allocator only needed subtract() from the now-clean cluster_utils.
- node_local_monitor (1 .cc): standalone, no monolith deps
- node_status_backend (1 .cc): standalone
- node_status_rpc_handler (1 .cc): depends on node_status_backend
- cluster_link_table (1 .cc): standalone
- self_test (6 .cc): the complete self-test subsystem (cloudcheck, diskcheck, netcheck, backend, frontend, rpc_handler). Unblocked by extracting node_local_monitor first.
- topic_validators (header-only): trivial prerequisite for topic_table
- topic_table (1 .cc): core topic metadata, unblocked by the topic_validators extraction
- partition_leaders_table (1 .cc): unblocked by the topic_table extraction
This reduces the monolith from 117 to 99 .cc files (below 100).
Extract another wave of .cc files newly unblocked by the topic_table and partition_leaders_table extractions in the previous commit.
New targets (11): topic_table_partition_generator, topic_metrics_watcher, topic_recovery_validator, id_allocator_stm, cloud_metadata_manifest_downloads, inventory_service, tx_helpers, tm_stm, data_migration_table, plugin_backend, remote_topic_configuration_source
These files all had their cluster deps fully satisfied by already-extracted targets.
Fix bare includes in topic_metrics_watcher, topic_recovery_validator, and plugin_backend.
This reduces the monolith from 98 to 87 .cc files.
Backports Required
Release Notes