Skip to content

HIVE-29457: HiveSortExchangePullUpConstantsRule doesn't remove consta…#6316

Open
soumyakanti3578 wants to merge 2 commits intoapache:masterfrom
soumyakanti3578:HIVE-29457
Open

HIVE-29457: HiveSortExchangePullUpConstantsRule doesn't remove consta…#6316
soumyakanti3578 wants to merge 2 commits intoapache:masterfrom
soumyakanti3578:HIVE-29457

Conversation

@soumyakanti3578
Copy link
Contributor

…nt column from distribution keys

What changes were proposed in this pull request? & Why are the changes needed?

Explained in https://issues.apache.org/jira/browse/HIVE-29457

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn test -pl itests/qtest -Pitests -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite=true -o -Dqfile="distribution_key_constant_value.q"

Comment on lines 99 to 102
keys.stream()
.map(tmp::get)
.filter(Objects::nonNull)
.forEach(newKeys::add);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change goes against the API specification of org.apache.calcite.rel.RelDistribution#apply:

   * <p>If mapping eliminates one of the distribution keys, the {@link Type#ANY}
   * distribution will be returned.

At this level it is undefined what a null target means so the transformation may not always be valid. https://issues.apache.org/jira/browse/CALCITE-3969 may contain additional insights.

It's probably safer to apply a change inside the HiveSortPullUpConstantsRule similar to what is done for collation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I have fixed this in the latest commit.

Comment on lines +4 to +7
SELECT col1 FROM test
WHERE col2 = 'a'
DISTRIBUTE BY col1, col2
SORT BY col1, col2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we drop the SORT BY to minimize the repro?

SELECT col1, col2 FROM test
WHERE col2 = 'a'
DISTRIBUTE BY col1, col2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this fails with:

EXPLAIN CBO
SELECT col1 FROM test
WHERE col2 = 'a'
DISTRIBUTE BY col1, col2
fname=distribution_key_constant_value.q

See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs.
 org.apache.hadoop.hive.ql.parse.SemanticException: Line 6:20 Invalid table alias or column reference 'col2': (possible column names are: col1)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5224)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5154)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$OrderByRelBuilder.getOrderByExpression(CalcitePlanner.java:5475)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$OrderByRelBuilder.genSortByKey(CalcitePlanner.java:5441)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$OrderByRelBuilder.addRelDistribution(CalcitePlanner.java:5507)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSBLogicalPlan(CalcitePlanner.java:3945)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4975)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1611)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1553)
	at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:140)
	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:936)
	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:191)
	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:135)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1331)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:588)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13222)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:481)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:358)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:187)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:358)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)

I think this is a bug and should be resolved in another ticket.

@sonarqubecloud
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments