Conversation
a774ead to
445de19
Compare
445de19 to
16588b8
Compare
| assert_snapshot!(distributed_plan_str, @r" | ||
| DDStageExec[0] (output_partitioning=UnknownPartitioning(1)) | ||
| ProjectionExec: expr=[1 as test_col] | ||
| DataSourceExec: partitions=1, partition_sizes=[1] |
There was a problem hiding this comment.
I noticed that even though we set datafusion.execution.target_partitions = 3 in make_state(), the plan still shows partitions = 1 in all cases. If this changes, we could force 1 partition explicitly for tests in the future.
There was a problem hiding this comment.
Maybe because there are not enough data for the plan to make make 3 partitions? This is actually anther good case for us to test. I think we can either create a mock test and data to enforce the number of partitions or use larger data set. I think the mock test is usually the choice to avoid adding more data
There was a problem hiding this comment.
Also, I think we will start having a lot of things to do soon If you cannot work on this test next, it is good to create a ticket to remind us to work on it
There was a problem hiding this comment.
I was planning on looking into integration tests with mocking after merging the planning tests, unless there was something more urgent to do. I'll create a ticket to track it still
NGA-TRAN
left a comment
There was a problem hiding this comment.
Nice.
I can't wait to see tests that verify actual results and more sophisticated plans with many number of partitions distributed to different workers
| assert_snapshot!(distributed_plan_str, @r" | ||
| DDStageExec[0] (output_partitioning=UnknownPartitioning(1)) | ||
| ProjectionExec: expr=[1 as test_col] | ||
| DataSourceExec: partitions=1, partition_sizes=[1] |
There was a problem hiding this comment.
Maybe because there are not enough data for the plan to make make 3 partitions? This is actually anther good case for us to test. I think we can either create a mock test and data to enforce the number of partitions or use larger data set. I think the mock test is usually the choice to avoid adding more data
| assert_snapshot!(distributed_plan_str, @r" | ||
| DDStageExec[0] (output_partitioning=UnknownPartitioning(1)) | ||
| ProjectionExec: expr=[1 as test_col] | ||
| DataSourceExec: partitions=1, partition_sizes=[1] |
There was a problem hiding this comment.
Also, I think we will start having a lot of things to do soon If you cannot work on this test next, it is good to create a ticket to remind us to work on it
| rand = "0.8" | ||
| uuid = { version = "1.6", features = ["v4"] } | ||
| serde_json = "1.0" | ||
| insta = { version = "1.43.1"} |
Adds some tests for distributed planning.