Implemented MultiIndex.equal_levels #1789
Codecov Report
@@            Coverage Diff             @@
##           master    #1789      +/-   ##
==========================================
- Coverage   94.64%   94.61%   -0.04%
==========================================
  Files          49       49
  Lines       10818    10724      -94
==========================================
- Hits        10239    10146      -93
+ Misses        579      578       -1
Could someone double-check this once more? It seems fine to me.

I'll take a look later.

Sure, thanks :)
ueshin left a comment:
I can see some issues:
>>> pmidx1 = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])
>>> pmidx2 = pd.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z")])
>>> pmidx1.equal_levels(pmidx2)
True
>>> kmidx1 = ks.from_pandas(pmidx1)
>>> kmidx2 = ks.from_pandas(pmidx2)
>>> kmidx1.equal_levels(kmidx2)
False

or

>>> pmidx1 = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")])
>>> pmidx2 = pd.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")])
>>> pmidx1.equal_levels(pmidx2)
True
>>> kmidx1 = ks.from_pandas(pmidx1)
>>> kmidx2 = ks.from_pandas(pmidx2)
>>> kmidx1.equal_levels(kmidx2)
False

    return False
self_frame = self.sort_values().to_frame()
other_frame = other.sort_values().to_frame()
with option_context("compute.ops_on_diff_frames", True):
We might be able to avoid force-enabling compute.ops_on_diff_frames. Let's see.
Okay, thanks for the review! Let me resolve the comments.
### What changes were proposed in this pull request?
This PR proposes implementing `MultiIndex.equal_levels`.
```python
>>> psmidx1 = ps.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])
>>> psmidx2 = ps.MultiIndex.from_tuples([("b", "y"), ("a", "x"), ("c", "z")])
>>> psmidx1.equal_levels(psmidx2)
True
>>> psmidx1 = ps.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")])
>>> psmidx2 = ps.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")])
>>> psmidx1.equal_levels(psmidx2)
True
```
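The semantics shown above can be sketched in plain Python. This is only an illustration, not the pandas-on-Spark implementation: `levels_of` and `equal_levels_sketch` are hypothetical helpers, and the sketch assumes that each level is the sorted set of unique values at that tuple position.

```python
from typing import Any, List, Sequence, Tuple


def levels_of(tuples: Sequence[Tuple[Any, ...]]) -> List[List[Any]]:
    """Hypothetical helper: sorted unique values at each tuple position."""
    if not tuples:
        return []
    nlevels = len(tuples[0])
    return [sorted({t[i] for t in tuples}) for i in range(nlevels)]


def equal_levels_sketch(
    left: Sequence[Tuple[Any, ...]], right: Sequence[Tuple[Any, ...]]
) -> bool:
    """Hypothetical helper: True when both inputs yield the same levels."""
    left_levels, right_levels = levels_of(left), levels_of(right)
    # A different number of levels can never compare equal.
    if len(left_levels) != len(right_levels):
        return False
    # Compare level by level; the order of the tuples themselves is ignored.
    return all(l == r for l, r in zip(left_levels, right_levels))


first = [("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")]
second = [("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")]
print(equal_levels_sketch(first, second))
```

The two inputs contain different tuples but the same values at each position, so only the per-level values matter, not which combinations appear.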
This was originally proposed in databricks/koalas#1789, and all review comments in the original PR have been resolved.
### Why are the changes needed?
We should support as much of the pandas API as possible in the pandas-on-Spark module.
### Does this PR introduce _any_ user-facing change?
Yes, the `MultiIndex.equal_levels` API is now available.
### How was this patch tested?
Unit tests.
Closes #34113 from itholic/SPARK-36435.
Lead-authored-by: itholic <haejoon.lee@databricks.com>
Co-authored-by: Haejoon Lee <44108233+itholic@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>