Implement rescale change #142
Open
ashish47108 wants to merge 4 commits into
…exec.resource.default-parallelism and kafka partition count values
Summary
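Fix the Flink Table API source-rescale path so that the source parallelism is resolved from scan.parallelism, then table.exec.resource.default-parallelism, then the Kafka partition count.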
Motivation
The prior implementation consulted only scan.parallelism and table.exec.resource.default-parallelism. If a user set neither of those, nor a useful job parallelism, but the topic had a known partition count, there was no way to derive a sensible source parallelism. The Kafka partition count is now the final fallback in the chain.
Precedence: scan.parallelism > table.exec.resource.default-parallelism > Kafka partition count
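For reviewers, the precedence chain can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class name, the `resolve` signature, and the rule that any non-positive value counts as "unset" are all assumptions made for the example. The -1 sentinel matches the value the examples below describe when nothing can be derived.

```java
import java.util.OptionalInt;

/** Hypothetical sketch of the precedence chain; names and signature are illustrative only. */
final class ParallelismPrecedenceSketch {

    /** Sentinel meaning "no parallelism could be derived". */
    static final int UNKNOWN = -1;

    /**
     * Resolves the effective source parallelism.
     * Assumption: a missing or non-positive value at any level means "unset".
     */
    static int resolve(OptionalInt scanParallelism, int defaultParallelism, int partitionCount) {
        // 1. An explicit scan.parallelism always wins.
        if (scanParallelism.isPresent() && scanParallelism.getAsInt() > 0) {
            return scanParallelism.getAsInt();
        }
        // 2. Otherwise fall back to table.exec.resource.default-parallelism.
        if (defaultParallelism > 0) {
            return defaultParallelism;
        }
        // 3. Otherwise fall back to the Kafka partition count.
        if (partitionCount > 0) {
            return partitionCount;
        }
        // 4. Nothing usable at any level.
        return UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(resolve(OptionalInt.of(64), 8, 24));   // 64
        System.out.println(resolve(OptionalInt.empty(), 8, 24));  // 8
        System.out.println(resolve(OptionalInt.empty(), -1, 24)); // 24
        System.out.println(resolve(OptionalInt.empty(), -1, -1)); // -1
    }
}
```

The partition-count lookup itself (the PSC metadata call) is left out on purpose; the sketch only shows the ordering.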
Changes
PscTableCommonUtils.java
PscDynamicTableFactory.java / UpsertPscDynamicTableFactory.java (same pattern in both)
PscDynamicSource.java (in produceDataStream)
PscTableCommonUtilsTest.java
Removed all shouldApplyRescale-based tests (the method is gone).
Added 7 tests targeting getEffectiveSourceParallelism() (table below).
Usage examples
Example 1 — User sets scan.parallelism
Resolution: scan.parallelism = 64 is used. Source is pinned to min(64, env.getParallelism()); data is then rescale()d downstream.
Example 2 — User leaves scan.parallelism unset; table env has default parallelism
Example 3 — No parallelism configured anywhere, but topic has 24 partitions
Resolution: PSC metadata client returns 24; effective parallelism = 24. Source pinned to min(24, env.getParallelism()).
Example 4 — Metadata fetch fails / topic unreachable
Resolution: util returns -1. Factory still propagates shouldRescale = true and effectiveParallelism = -1. Inside PscDynamicSource, the warning branch fires and the source runs at env.getParallelism() (no explicit pin), so the job still starts.
Example 5 — Rescale disabled
'scan.enable-rescale' = 'false'
The factory skips getEffectiveSourceParallelism() entirely and passes -1 down. PscDynamicSource logs the "rescale disabled" branch and uses job default parallelism. (No behavioral regression from prior version.)
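The source-side behavior described in Examples 1, 3, 4, and 5 can be summarized in a small sketch. It is an assumption-laden illustration and not the actual `produceDataStream` code: the class and method names are hypothetical, and an empty `OptionalInt` stands in for "no explicit pin, run at env.getParallelism()". The environment parallelism values in `main` are made up for the example.

```java
import java.util.OptionalInt;

/** Hypothetical sketch of the source-side pinning decision; not the PR's actual code. */
final class SourcePinningSketch {

    /**
     * Returns the parallelism to pin the source to, or empty when the source
     * should simply run at the environment's parallelism.
     */
    static OptionalInt pinnedParallelism(boolean rescaleEnabled,
                                         int effectiveParallelism,
                                         int envParallelism) {
        if (!rescaleEnabled) {
            // Example 5: rescale disabled, use the job default parallelism.
            return OptionalInt.empty();
        }
        if (effectiveParallelism <= 0) {
            // Example 4: nothing could be derived (-1); warn and do not pin.
            return OptionalInt.empty();
        }
        // Examples 1 and 3: never pin above the environment's parallelism.
        return OptionalInt.of(Math.min(effectiveParallelism, envParallelism));
    }

    public static void main(String[] args) {
        System.out.println(pinnedParallelism(true, 64, 32));   // OptionalInt[32]
        System.out.println(pinnedParallelism(true, 24, 128));  // OptionalInt[24]
        System.out.println(pinnedParallelism(true, -1, 16));   // OptionalInt.empty
        System.out.println(pinnedParallelism(false, -1, 16));  // OptionalInt.empty
    }
}
```

Capping at the environment parallelism means the pinned source never asks for more subtasks than the job runs with.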
Executed all unit test cases related to this change using the following commands: