feat: Add support for explode and explode_outer for array inputs#2836
feat: Add support for explode and explode_outer for array inputs#2836andygrove merged 24 commits intoapache:mainfrom
explode and explode_outer for array inputs#2836Conversation
explode and explode_outer [WIP]explode and explode_outer
explode and explode_outerexplode and explode_outer for array inputs
|
Awesome stuff @andygrove! DataFusion has |
explode and explode_outer for array inputsexplode and explode_outer for array inputs [WIP]
Thanks! I am trying this out now. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2836 +/- ##
============================================
+ Coverage 56.12% 59.19% +3.06%
- Complexity 976 1473 +497
============================================
Files 119 167 +48
Lines 11743 15307 +3564
Branches 2251 2530 +279
============================================
+ Hits 6591 9061 +2470
- Misses 4012 4952 +940
- Partials 1140 1294 +154 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There does appear to be a different in the handling of empty arrays in in the |
I filed an issue in DataFusion: apache/datafusion#19053 |
explode and explode_outer for array inputs [WIP]explode and explode_outer for array inputs
comphead
left a comment
There was a problem hiding this comment.
Thanks @andygrove it mostly looks good
| @@ -0,0 +1,4 @@ | |||
| SELECT i_item_sk, explode(array(i_brand_id, i_class_id, i_category_id, i_manufact_id, i_manager_id)) | |||
There was a problem hiding this comment.
should we also have explode_outer?
| )); | ||
| }; | ||
|
|
||
| // Create projection expressions for other columns |
There was a problem hiding this comment.
yes, as in SELECT a, b, c, explode(d) FROM ...
native/core/src/execution/planner.rs
Outdated
| .collect(); | ||
|
|
||
| // Add the array column as the last column | ||
| let array_col_name = format!("col_{}", projections.len()); |
There was a problem hiding this comment.
can this name cause a conflict or issue if original dataset has col_* cols?
There was a problem hiding this comment.
I removed this now and preserve original names
| output_fields.push(Field::new( | ||
| array_field.name(), | ||
| element_type, | ||
| true, // Element is nullable after unnesting |
|
One Spark SQL test needs rewriting to work with Comet. I am working on it. |
comphead
left a comment
There was a problem hiding this comment.
Thanks @andygrove just one nit on: if we need to have a microbenchmark on explode_outer but it can be done in follow up PR
Thanks. I added this to the scope of #2838. |
|
The supported list is not updated |
Thanks @bjornjorgensen this is autogenerated doc thats being triggered manually, I'm actually thinking we can deprecate it |
|
ohh.. ok. I just have a look at this project and read "You may have a specific expression in mind that you’d like to add, but if not, you can review the expression coverage document to see which expressions are not yet supported." https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html |
Thanks for reporting this, the correct doc would be https://github.com/apache/datafusion-comet/blob/main/docs/spark_expressions_support.md I updated the manual #2854 |
Which issue does this PR close?
Closes #1927
Rationale for this change
explodeis widely used in Spark jobs that work with arrays.What changes are included in this PR?
Add support for
explodefor arrays, but not maps yet. I filed #2837 for adding support for maps.High level changes:
Explodeoperator in protobufCometExplodeExeccontains serde code for JVMUnnestExecCometGenerateExecSuiteHow are these changes tested?