Synchronize state between multiple replicas of Gateway Controller #503
Replies: 2 comments
Sequence Diagrams:
%%{init: {'theme':'base', 'themeVariables': { 'actorTextColor':'#ff0000', 'noteBkgColor':'rgb(220,255,220)', 'noteTextColor':'#000000'}}}%%
sequenceDiagram
participant Client
participant APIServer
participant DeploymentService
participant EventHub
participant EventListener
participant PolicyManager
participant XDSManager
rect rgb(200,220,255)
Note over Client,XDSManager: Single-replica (enableReplicaSync=false)
Client->>APIServer: CreateAPI / UpdateAPI
APIServer->>DeploymentService: UpdateAPIConfiguration
DeploymentService->>DeploymentService: BuildStoredPolicyFromAPI
DeploymentService->>PolicyManager: updatePolicyConfiguration
DeploymentService->>XDSManager: triggerXDSSnapshotUpdate (sync)
XDSManager-->>DeploymentService: snapshot updated
DeploymentService-->>APIServer: result
APIServer-->>Client: 200 OK
end
rect rgb(220,255,220)
Note over Client,EventHub: Multi-replica (enableReplicaSync=true)
Client->>APIServer: CreateAPI / UpdateAPI
APIServer->>DeploymentService: UpdateAPIConfiguration
DeploymentService->>DeploymentService: BuildStoredPolicyFromAPI
DeploymentService->>EventHub: PublishEvent (API_CREATE/UPDATE/DELETE)
EventHub-->>DeploymentService: ack/persisted
DeploymentService-->>APIServer: result
APIServer-->>Client: 200 OK
par Async replication processing
EventHub->>EventListener: deliver event (via subscription)
EventListener->>EventListener: processAPIEvents -> handleAPICreateOrUpdate / handleAPIDelete
EventListener->>PolicyManager: apply/remove policy (if configured)
EventListener->>XDSManager: triggerXDSSnapshotUpdate (async, with timeout)
end
end
%%{init: {'theme':'base', 'themeVariables': { 'actorTextColor':'#ff0000', 'noteBkgColor':'rgb(220,255,220)', 'noteTextColor':'#000000'}}}%%
sequenceDiagram
participant API as API Handler
participant DS as APIDeploymentService
participant EH as EventHub
participant EL as EventListener
participant DB as Database
rect rgb(200, 220, 255)
note over API,EL: Multi-Replica Mode (enableReplicaSync=true)
API->>DS: UpdateAPIConfiguration(params)
DS->>DS: Parse & Validate Config
DS->>DB: Save Configuration
DS->>EH: PublishEvent(API_CREATE/UPDATE)
EH->>EL: Event (buffered channel)
DS->>DS: Return Success
end
%%{init: {'theme':'base', 'themeVariables': { 'actorTextColor':'#ff0000', 'noteBkgColor':'rgb(220,255,220)', 'noteTextColor':'#000000'}}}%%
sequenceDiagram
participant DS as APIDeploymentService
participant EH as EventHub
participant EL as EventListener
participant PM as PolicyManager
participant XDS as XDS Manager
participant DB as Database
rect rgb(200, 220, 255)
note over EL: Async Event Processing
EL->>DS: handleAPICreateOrUpdate(event)
DS->>DB: Fetch API from DB
DS->>DS: Update in memory DB
DS->>PM: Build & Apply Policy
PM->>PM: Update PolicyConfig
DS->>XDS: Trigger Snapshot Update
XDS->>XDS: Update Configuration
end
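The branching shown in the first diagram can be sketched in Go as follows. This is a minimal illustration, not the controller's actual code: the `Publisher` and `LocalApplier` interfaces, the `recorder` test double, and the simplified `UpdateAPIConfiguration` signature are all assumptions introduced for the example. Only the idea that `enableReplicaSync` selects between applying locally and publishing an event comes from the diagrams.

```go
package main

import (
	"errors"
	"fmt"
)

// Publisher is a hypothetical, narrowed view of the EventHub used by the write path.
type Publisher interface {
	PublishEvent(orgID, action, entityID string) error
}

// LocalApplier is a hypothetical stand-in for the PolicyManager + XDSManager calls.
type LocalApplier interface {
	Apply(entityID string) error
}

// DeploymentService holds the flag and both collaborators (names are assumptions).
type DeploymentService struct {
	EnableReplicaSync bool
	Hub               Publisher
	Local             LocalApplier
}

// UpdateAPIConfiguration publishes an event in multi-replica mode and
// applies the change synchronously in single-replica mode.
func (s *DeploymentService) UpdateAPIConfiguration(orgID, action, apiID string) error {
	if s.EnableReplicaSync {
		if s.Hub == nil {
			return errors.New("replica sync enabled but no event hub configured")
		}
		// The listener on every replica applies the change later.
		return s.Hub.PublishEvent(orgID, action, apiID)
	}
	return s.Local.Apply(apiID)
}

// recorder is a test double that records which path was taken.
type recorder struct{ calls []string }

func (r *recorder) PublishEvent(orgID, action, entityID string) error {
	r.calls = append(r.calls, "publish:"+orgID+":"+action+":"+entityID)
	return nil
}

func (r *recorder) Apply(entityID string) error {
	r.calls = append(r.calls, "apply:"+entityID)
	return nil
}

func main() {
	rec := &recorder{}
	single := &DeploymentService{Hub: rec, Local: rec}
	multi := &DeploymentService{EnableReplicaSync: true, Hub: rec, Local: rec}
	_ = single.UpdateAPIConfiguration("org1", "UPDATE", "api-1")
	_ = multi.UpdateAPIConfiguration("org1", "UPDATE", "api-1")
	fmt.Println(rec.calls) // [apply:api-1 publish:org1:UPDATE:api-1]
}
```

In this sketch the multi-replica path returns as soon as the event is published, which mirrors the diagram: the client gets its response before any replica has refreshed its xDS snapshot.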
Update: Architecture Document.
Design Proposal: Event-Based Multi-Replica Synchronization for Gateway Controller
Section 1: Problem Statement
What is the problem?
In production deployments, the Gateway Controller often runs as multiple replicas for high availability and load distribution. Currently, when an API configuration is created or updated via one replica, other replicas have no mechanism to learn about these changes. This leads to configuration drift where different replicas serve inconsistent xDS snapshots to Envoy proxies, causing unpredictable routing behavior and potential service disruptions.
From the operator's perspective, deploying multiple Gateway Controller replicas should provide seamless high availability without requiring external synchronization tooling or manual intervention.
Why should it be solved now?
Section 2: Solution
2.1 Architecture Overview
The solution introduces an EventHub abstraction with a pluggable backend model that enables publish-subscribe event delivery between Gateway Controller replicas. The default implementation uses SQLite with a polling mechanism, making it suitable for deployments sharing a common database.
3.2 Component Design
EventHub Interface
The EventHub provides a broker-agnostic interface for event publishing and subscription. It simulates the function of a broker:

type EventHub interface {
    Initialize(ctx context.Context) error
    RegisterOrganization(organizationID string) error
    PublishEvent(ctx context.Context, organizationID string, eventType EventType,
        action, entityID, correlationID string, eventData []byte) error
    Subscribe(organizationID string, eventChan chan<- []Event) error
    CleanUpEvents(ctx context.Context, timeFrom, timeEnd time.Time) error
    Close() error
}

Event Types

const (
    EventTypeAPI         EventType = "API"
    EventTypeCertificate EventType = "CERTIFICATE"
    EventTypeLLMTemplate EventType = "LLM_TEMPLATE"
)

Backend Abstraction
The backend interface, EventhubImpl:

type EventhubImpl interface {
    Initialize(ctx context.Context) error
    RegisterOrganization(ctx context.Context, orgID string) error
    Publish(ctx context.Context, orgID string, eventType EventType,
        action, entityID, correlationID string, eventData []byte) error
    Subscribe(orgID string, eventChan chan<- []Event) error
    Unsubscribe(orgID string, eventChan chan<- []Event) error
    Cleanup(ctx context.Context, olderThan time.Time) error
    CleanupRange(ctx context.Context, from, to time.Time) error
    Close() error
}

Supported backend types (extensible):

EventSource Abstraction and EventHub Adapter
To decouple the EventListener from the specific EventHub implementation and enable testability, an EventSource interface is introduced:

type EventSource interface {
    Subscribe(ctx context.Context, organizationID string, eventChan chan<- []Event) error
    Unsubscribe(organizationID string) error
    Close() error
}

The EventHubAdapter implements the EventSource interface and acts as a bridge between the EventHub and the EventListener:

type EventHubAdapter struct {
    eventHub            eventhub.EventHub
    logger              *zap.Logger
    activeSubscriptions sync.Map // tracks active subscriptions
}

Key responsibilities of the EventHub Adapter:
This adapter pattern enables:
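One way to picture the adapter is the sketch below. It is an illustration under stated assumptions, not the proposal's implementation: the `hub` interface is a hypothetical subset of the backend (only `Subscribe` and `Unsubscribe`), `fakeHub` and `newDemo` are test scaffolding, `Event` is reduced to two fields, and `Close` and logging are omitted. What it does show is how a `sync.Map` keyed by organization lets `Unsubscribe(organizationID)` find the channel that the channel-based backend call needs.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Event is a reduced stand-in for the real event type.
type Event struct {
	EntityID string
	Action   string
}

// hub is a hypothetical subset of the backend the adapter wraps.
type hub interface {
	Subscribe(orgID string, ch chan<- []Event) error
	Unsubscribe(orgID string, ch chan<- []Event) error
}

// Adapter exposes an EventSource-style API on top of hub.
type Adapter struct {
	hub    hub
	active sync.Map // orgID -> chan<- []Event
}

// Subscribe registers one channel per organization and rejects duplicates.
func (a *Adapter) Subscribe(ctx context.Context, orgID string, ch chan<- []Event) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	if _, loaded := a.active.LoadOrStore(orgID, ch); loaded {
		return fmt.Errorf("organization %q already subscribed", orgID)
	}
	if err := a.hub.Subscribe(orgID, ch); err != nil {
		a.active.Delete(orgID) // roll back the bookkeeping on failure
		return err
	}
	return nil
}

// Unsubscribe looks up the tracked channel and removes it from the hub.
func (a *Adapter) Unsubscribe(orgID string) error {
	v, ok := a.active.LoadAndDelete(orgID)
	if !ok {
		return fmt.Errorf("organization %q is not subscribed", orgID)
	}
	return a.hub.Unsubscribe(orgID, v.(chan<- []Event))
}

// fakeHub is an in-memory hub used only for this demonstration.
type fakeHub struct {
	mu   sync.Mutex
	subs map[string]chan<- []Event
}

func (f *fakeHub) Subscribe(orgID string, ch chan<- []Event) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.subs[orgID] = ch
	return nil
}

func (f *fakeHub) Unsubscribe(orgID string, _ chan<- []Event) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.subs, orgID)
	return nil
}

// publish delivers a batch to the subscriber of orgID, if any.
func (f *fakeHub) publish(orgID string, batch []Event) {
	f.mu.Lock()
	ch := f.subs[orgID]
	f.mu.Unlock()
	if ch != nil {
		ch <- batch
	}
}

// newDemo wires an adapter to a fake hub.
func newDemo() (*Adapter, *fakeHub, context.Context) {
	h := &fakeHub{subs: map[string]chan<- []Event{}}
	return &Adapter{hub: h}, h, context.Background()
}

func main() {
	a, h, ctx := newDemo()
	ch := make(chan []Event, 1) // buffered so publish does not block
	if err := a.Subscribe(ctx, "org1", ch); err != nil {
		panic(err)
	}
	h.publish("org1", []Event{{EntityID: "api-1", Action: "UPDATE"}})
	batch := <-ch
	fmt.Println(len(batch), batch[0].EntityID)
	fmt.Println(a.Unsubscribe("org1"))
}
```

Because the listener depends only on the adapter's methods, a test can swap in something like `fakeHub` without touching a database.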
3.3 Configuration
A new configuration option, enableReplicaSync, is added under server:

server:
  enableReplicaSync: true # default: false

When enabled:

3.4 Database Schema Changes
Two new tables are added (schema version 6):

-- Tracks version per organization for efficient change detection
CREATE TABLE organization_states (
    organization TEXT PRIMARY KEY,
    version_id TEXT NOT NULL DEFAULT '',
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Stores all entity change events
CREATE TABLE events (
    organization_id TEXT NOT NULL,
    processed_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    originated_timestamp TIMESTAMP NOT NULL,
    event_type TEXT NOT NULL,
    action TEXT NOT NULL CHECK(action IN ('CREATE', 'UPDATE', 'DELETE')),
    entity_id TEXT NOT NULL,
    correlation_id TEXT NOT NULL,
    event_data TEXT NOT NULL,
    PRIMARY KEY (correlation_id)
);
CREATE INDEX idx_events_org_time ON events(organization_id, processed_timestamp);

3.5 Alternatives Considered
Section 4: Challenges and Constraints
4.1 Security Considerations
4.2 Performance Characteristics
4.3 Failure Handling
4.4 Backward Compatibility
4.5 Trade-offs
Appendix: Key Files
Problem
Gateway Controller needs to run as multiple replicas for high availability, and all replicas must hold the same xDS cache state.
Solution
Abstract
Introduce new tables that maintain state and events for each category. The data is not linked to any existing data through foreign keys. When an API deploy request is received, the Gateway Controller updates these tables and the request ends there. A separate goroutine in each Gateway Controller polls the states table. If the state it reads for APIs differs from what it already knows, it looks up the events table by timestamp to find the changes.
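The polling step described above could look roughly like this in Go. Everything here is a sketch under assumptions: the `Store` interface, `memStore` (an in-memory stand-in for the states and events tables that ignores the organization argument), the `Poller` fields, and the `at` helper are hypothetical names introduced for illustration. The only behavior taken from the description is "compare the state, and on a mismatch fetch events newer than the last seen timestamp".

```go
package main

import (
	"fmt"
	"time"
)

// Event is a reduced view of a row in the events table.
type Event struct {
	CorrelationID string
	Action        string
	EntityID      string
	ProcessedAt   time.Time
}

// Store is a hypothetical read-side view of the states and events tables.
type Store interface {
	StateVersion(org string) (string, error)
	EventsAfter(org string, after time.Time) ([]Event, error)
}

// Poller remembers the last version and timestamp this replica has applied.
type Poller struct {
	store        Store
	org          string
	knownVersion string
	lastSeen     time.Time
}

// PollOnce applies new events only when the state version has changed.
// It returns the number of events applied.
func (p *Poller) PollOnce(apply func(Event)) (int, error) {
	v, err := p.store.StateVersion(p.org)
	if err != nil {
		return 0, err
	}
	if v == p.knownVersion {
		return 0, nil // cheap path: nothing changed since the last poll
	}
	events, err := p.store.EventsAfter(p.org, p.lastSeen)
	if err != nil {
		return 0, err
	}
	for _, e := range events {
		apply(e)
		if e.ProcessedAt.After(p.lastSeen) {
			p.lastSeen = e.ProcessedAt
		}
	}
	p.knownVersion = v
	return len(events), nil
}

// memStore is a single-organization, in-memory stand-in for the database.
type memStore struct {
	version string
	events  []Event
}

func (m *memStore) StateVersion(string) (string, error) { return m.version, nil }

func (m *memStore) EventsAfter(_ string, after time.Time) ([]Event, error) {
	var out []Event
	for _, e := range m.events {
		if e.ProcessedAt.After(after) {
			out = append(out, e)
		}
	}
	return out, nil
}

// write mimics the write path: append an event and bump the state version.
func (m *memStore) write(e Event, version string) {
	m.events = append(m.events, e)
	m.version = version
}

// at builds a deterministic timestamp for the demonstration.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	store := &memStore{}
	p := &Poller{store: store, org: "org1"}
	var applied []string
	apply := func(e Event) { applied = append(applied, e.Action+":"+e.EntityID) }

	store.write(Event{CorrelationID: "c1", Action: "CREATE", EntityID: "api-1", ProcessedAt: at(100)}, "v1")
	n, _ := p.PollOnce(apply)
	fmt.Println(n, applied)

	n, _ = p.PollOnce(apply) // version unchanged, so nothing is fetched
	fmt.Println(n)
}
```

In a running controller, a goroutine would call `PollOnce` on a `time.Ticker`. Note that a strict "newer than" comparison can skip events that share the exact timestamp of the last one seen, so a real implementation would need a tie-breaker.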
Key Characteristics
DDL
Synchronization Flow
Write Path (Instance A modifies an API)
Read Path (Instance A/B detects and applies change)
Cleanup Path
Alternative Approaches