fix: improve Kafka topic partitioning and consumer resilience config #142
Open
Increase topic-notification-event, topic-transfer-position, and topic-transfer-position-batch to 4 partitions to support consumer scaling without partition starvation. Add rdkafka consumer tuning (session.timeout.ms, heartbeat.interval.ms, max.poll.interval.ms, cooperative-sticky assignment strategy) to ml-api-adapter and ml-handler-notification config-modifiers to prevent rebalance storms and session timeouts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- topic-notification-event, topic-transfer-position, and topic-transfer-position-batch are now provisioned with 4 partitions (up from 1) to support consumer scaling without partition starvation.
- Added session.timeout.ms=30000, heartbeat.interval.ms=10000, max.poll.interval.ms=300000, and partition.assignment.strategy=cooperative-sticky to all notification consumer configs (ml-api-adapter.js, ml-handler-notification.js, ml-handler-notification-kafka.js).
Context
Investigation of Kafka consumer health failures (isAssigned=false causing 502 health checks) revealed that:
- rdkafkaConf for notification consumers had zero tuning beyond the bare minimum (client.id, group.id, metadata.broker.list)
- the range partition assignment strategy causes stop-the-world rebalancing
These changes are the test harness counterpart to mojaloop/ml-api-adapter#655, which adds the same tuning to the service defaults plus a health check grace period.
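For reviewers, a minimal sketch of what the tuned consumer settings might look like as an rdkafkaConf object. Only the four tuning values and the three baseline key names come from this PR. The variable name, the placeholder values for client.id, group.id, and metadata.broker.list, and the checkConsumerTimings helper are hypothetical and not taken from the config-modifier files. The helper encodes two commonly cited rules of thumb (heartbeat interval at most one third of the session timeout, and max poll interval no smaller than the session timeout); treat both as assumptions to confirm against the librdkafka documentation.

```javascript
// Hypothetical sketch, not the actual config-modifier contents.
const notificationConsumerRdkafkaConf = {
  // Baseline keys named in this PR; the values here are placeholders.
  'client.id': 'example-client-id',
  'group.id': 'example-group-id',
  'metadata.broker.list': 'example-broker:9092',
  // Tuning values added by this PR.
  'session.timeout.ms': 30000,
  'heartbeat.interval.ms': 10000,
  'max.poll.interval.ms': 300000,
  'partition.assignment.strategy': 'cooperative-sticky'
}

// Hypothetical helper: returns a list of human-readable problems,
// or an empty array when the timing values look mutually consistent.
function checkConsumerTimings (conf) {
  const problems = []
  const session = conf['session.timeout.ms']
  const heartbeat = conf['heartbeat.interval.ms']
  const maxPoll = conf['max.poll.interval.ms']

  // Rule of thumb (assumption): heartbeat <= session timeout / 3.
  if (heartbeat * 3 > session) {
    problems.push('heartbeat.interval.ms should be at most a third of session.timeout.ms')
  }
  // Rule of thumb (assumption): max poll interval >= session timeout.
  if (maxPoll < session) {
    problems.push('max.poll.interval.ms should not be smaller than session.timeout.ms')
  }
  return problems
}

console.log(checkConsumerTimings(notificationConsumerRdkafkaConf))
```

With the values from this PR, the heartbeat interval sits exactly at one third of the session timeout, so the sketch reports no problems. Raising heartbeat.interval.ms without also raising session.timeout.ms would trip the first check.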
Changes
- docker/kafka/scripts/provision.sh
- docker/config-modifier/configs/ml-api-adapter.js
- docker/config-modifier/configs/ml-handler-notification.js
- docker/config-modifier/configs/ml-handler-notification-kafka.js
Test plan
🤖 Generated with Claude Code