HIVE-29457: HiveSortExchangePullUpConstantsRule doesn't remove consta…#6316
soumyakanti3578 wants to merge 2 commits into apache:master
…nt column from distribution keys
keys.stream()
    .map(tmp::get)
    .filter(Objects::nonNull)
    .forEach(newKeys::add);
This change goes against the API specification of org.apache.calcite.rel.RelDistribution#apply:
* <p>If mapping eliminates one of the distribution keys, the {@link Type#ANY}
* distribution will be returned.
At this level, it is undefined what a null target means, so the transformation may not always be valid. https://issues.apache.org/jira/browse/CALCITE-3969 may contain additional insights.
It's probably safer to apply a change inside the HiveSortPullUpConstantsRule similar to what is done for collation.
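To illustrate the difference, here is a minimal sketch in plain Java collections. It does not use the Calcite or Hive APIs and is not the code from this PR: the class and method names are hypothetical, and an empty Optional merely stands in for falling back to Type#ANY. The first method follows the quoted contract (any eliminated key means giving up), while the second drops the keys the rule already knows are constant before remapping the rest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class DistributionKeySketch {

  /**
   * Mimics the documented contract of RelDistribution#apply: if the mapping
   * eliminates any key, give up. Optional.empty() stands in for Type#ANY
   * (an assumption made for this sketch only).
   */
  public static Optional<List<Integer>> remapStrict(
      List<Integer> keys, Map<Integer, Integer> mapping) {
    List<Integer> newKeys = new ArrayList<>();
    for (int key : keys) {
      Integer target = mapping.get(key);
      if (target == null) {
        return Optional.empty();
      }
      newKeys.add(target);
    }
    return Optional.of(newKeys);
  }

  /**
   * Rule-level alternative: remove the keys known to be constant first,
   * then remap the remaining keys under the strict contract.
   */
  public static Optional<List<Integer>> remapWithoutConstants(
      List<Integer> keys, Set<Integer> constants, Map<Integer, Integer> mapping) {
    List<Integer> remaining = new ArrayList<>();
    for (int key : keys) {
      if (!constants.contains(key)) {
        remaining.add(key);
      }
    }
    return remapStrict(remaining, mapping);
  }

  public static void main(String[] args) {
    // DISTRIBUTE BY col1, col2 where col2 = 'a' is constant and is pulled up.
    List<Integer> keys = List.of(0, 1);
    Set<Integer> constants = Set.of(1);
    Map<Integer, Integer> mapping = Map.of(0, 0); // only col1 survives

    System.out.println(remapStrict(keys, mapping));                      // Optional.empty
    System.out.println(remapWithoutConstants(keys, constants, mapping)); // Optional[[0]]
  }
}
```

Silently skipping null targets inside the mapping step would hide the case where a non-constant key is lost, which is why the filtering is done on the rule side, where the set of constant columns is known.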
Makes sense. I have fixed this in the latest commit.
SELECT col1 FROM test
WHERE col2 = 'a'
DISTRIBUTE BY col1, col2
SORT BY col1, col2;
Can we drop the SORT BY to minimize the repro?
SELECT col1, col2 FROM test
WHERE col2 = 'a'
DISTRIBUTE BY col1, col2
Unfortunately this fails with:
EXPLAIN CBO
SELECT col1 FROM test
WHERE col2 = 'a'
DISTRIBUTE BY col1, col2
fname=distribution_key_constant_value.q
See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs.
org.apache.hadoop.hive.ql.parse.SemanticException: Line 6:20 Invalid table alias or column reference 'col2': (possible column names are: col1)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5224)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5154)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$OrderByRelBuilder.getOrderByExpression(CalcitePlanner.java:5475)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$OrderByRelBuilder.genSortByKey(CalcitePlanner.java:5441)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$OrderByRelBuilder.addRelDistribution(CalcitePlanner.java:5507)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSBLogicalPlan(CalcitePlanner.java:3945)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4975)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1611)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1553)
at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:140)
at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:936)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:191)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:135)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1331)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:588)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13222)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:481)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:358)
at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:187)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:358)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
I think this is a bug and should be resolved in another ticket.



What changes were proposed in this pull request? & Why are the changes needed?
Explained in https://issues.apache.org/jira/browse/HIVE-29457
Does this PR introduce any user-facing change?
No
How was this patch tested?