-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[CALCITE-7393] Support RelDataTypeDigest #4761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
604cf22 to
04538b2
Compare
|
Thank you for your contribution! I have a few suggestions: |
| /** | ||
| * Digest of a RelDataType. | ||
| */ | ||
| public interface RelDataTypeDigest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be two interfaces, in case Digest is useful somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that the hashCode caching in RelDigest and RexCall already addresses some of the hashCode caching/comparison latency issues. Could you please clarify what you mean by “be two interfaces”?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interface HasDigestString {
String getDigestString()
}
interface RelDataTypeDigest extends HasDigestString {
RelDataType getType();
}
Other classes may implement HasDigestString.
| } | ||
|
|
||
| @Override public boolean equals(@Nullable Object obj) { | ||
| @Override public boolean deepEquals(Object obj) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll need to leave the old implementations around too, they may break users who expect them to exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we allow user to implement equals/hashCode for nested type, cache for hashCode and pointer compare for deepEquals will not work for them.
This situation is similiar to https://issues.apache.org/jira/browse/CALCITE-3786
|
|
||
| protected final @Nullable List<RelDataTypeField> fieldList; | ||
| protected @Nullable String digest; | ||
| protected RelDataTypeDigest digest; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing protected fields is a breaking change, which has to be reflected in the semantic versioning. This would require a 2.0 release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, I don't know what else you can do to make the digest lazy. Maybe other people can weigh on this dilemma. This is why you should probably file a JIRA issue just for this topic and discuss the design there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can allow user to set String digest to define their own digest. otherwise, we use the RelDataTypeDigest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is that some protected/public variables and methods must still be guaranteed to work properly. For example, this
"String digest" might eventually call
"toString" from
"RelDataTypeDigest" to get the value? Because these variables are exposed to users, and someone might currently be using them, we need to provide a compatibility fix. If you think reassigning the original variable would go against your design intention (e.g., it might increase memory usage instead of reducing it), then you need to declare that this is a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiedeyantu Thanks, it should be a breaking change. I commented in the commit message.
04538b2 to
a03c48a
Compare
1f17a42 to
955f537
Compare
955f537 to
f33c6a2
Compare
592f23b to
ac7bf74
Compare
use RelDataTypeDigest to reduce composite type's digest memory and digest relative operation's latency This is a break change, computeDigest will not set the string digest field, and innerDigest will be initialized by default. We recommend users use innerDigest to replace legacy string digest to archive better memory/latency for nested types. For users rely on set String digest for UDT, we keep the old behavior of hashCode/equals.
ac7bf74 to
078f2e4
Compare
|



Use RelDataTypeDigest to reduce composite type's digest memory and digest relative operation's latency.
test case: HepPlannerTest.testLargeTypeDigest
This is the first part of latency/memory optimization of large plan.