[BUG-REPORT]: 'TypeError: Expected Array, got <class 'pyarrow.lib.ChunkedArray'>' #2457


I have code that uses `applyInPandas` with a UDF that processes two DataFrames: the one the `groupBy` is applied to, and a second one that is passed to the UDF as a parameter.
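The report does not include the code, so here is a minimal sketch of the pattern it describes. It assumes the second DataFrame is a pandas DataFrame bound into the UDF through a closure. The column names (`key`, `value`, `label`), the merge logic and the helpers `make_group_processor` / `run_on_spark` are hypothetical placeholders, not the reporter's code.

```python
import pandas as pd


def make_group_processor(lookup_pdf: pd.DataFrame):
    """Return a per-group function with `lookup_pdf` bound via a closure.

    Hypothetical example: `lookup_pdf` is a pandas DataFrame that gets
    pickled together with the function and shipped to every executor.
    """

    def process_group(group_pdf: pd.DataFrame) -> pd.DataFrame:
        # Placeholder per-group logic: enrich the group's rows from the lookup table.
        return group_pdf.merge(lookup_pdf, on="key", how="left")

    return process_group


def run_on_spark(spark_df, lookup_pdf: pd.DataFrame):
    """Apply the processor to each `key` group of a PySpark DataFrame."""
    return spark_df.groupBy("key").applyInPandas(
        make_group_processor(lookup_pdf),
        # The DDL schema must match the columns that process_group returns.
        schema="key string, value double, label string",
    )


# Local check of the per-group logic with plain pandas (no Spark needed).
groups = pd.DataFrame({"key": ["a", "a", "b"], "value": [1.0, 2.0, 3.0]})
lookup = pd.DataFrame({"key": ["a", "b"], "label": ["x", "y"]})
print(make_group_processor(lookup)(groups[groups["key"] == "a"]))
```

If the second DataFrame is a Spark DataFrame instead, it cannot be referenced inside the UDF. It would have to be collected to pandas first (as assumed above) or joined in Spark before the `groupBy`.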

When I run the function on a smaller dataset (around ~200k records), it runs smoothly and finishes within an hour.

But when the data size increases to more than 800k records, it throws the following error:

**Error Description**

"Caused by: org.apache.spark.api.python.PythonException: 'TypeError: Expected Array, got <class 'pyarrow.lib.ChunkedArray'>'. Full traceback below: Traceback (most recent call last):   File "pyarrow/array.pxi", line 2377, in pyarrow.lib.StructArray.from_arrays TypeError: Expected Array, got <class 'pyarrow.lib.ChunkedArray'>"
**Description**

I am running this code on Databricks.

**Software information**
 - Databricks Runtime 15.4 ML (includes Apache Spark 3.5.0)
 - Python 3.11


**Additional information**

The cluster has the following Spark configuration:

 - `spark.databricks.service.server.enabled true`
 - `spark.databricks.service.port 15001`
 - `spark.databricks.delta.preview.enabled true`
