Spark: Fix CREATE VIEW IF NOT EXISTS failure when non-Iceberg view exists in SparkSessionCatalog #14930
@huaxingao The previous PR was automatically closed due to a force push, so I've opened a new one.
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

public class TestSparkSessionCatalogWithExtensions {
I would suggest first fixing the issue in one Spark version and backporting it later.
+1 to fixing 4.1 first and then backporting.
  }
}

public static void setUpCatalog() {
nit: private?
  spark.conf().set("spark.sql.catalog.spark_catalog.type", "hive");
}

public static void resetSparkCatalog() {
nit: private?
protected static TestHiveMetastore metastore = null;
protected static HiveConf hiveConf = null;
protected static SparkSession spark = null;
protected static JavaSparkContext sparkContext = null;
is this necessary? If not, can we remove?
spark
    .conf()
    .set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog");
spark.conf().set("spark.sql.catalog.spark_catalog.type", "hive");
Should we add spark.sessionState().catalogManager().reset() when flipping these configs (either inside the helper methods or immediately after calling them in the tests), similar to how spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/TestSparkSessionCatalog.java does it?
This is a follow-up PR. The previous PR was closed after the branch was force-reset to `apache:main`.

Purpose
This PR fixes a bug where `CREATE VIEW IF NOT EXISTS` fails with a `NoSuchIcebergViewException: Not an iceberg view` (wrapped in `QueryExecutionException`) instead of succeeding silently when a non-Iceberg view (e.g., a Hive view) already exists in the `SparkSessionCatalog`.

The Problem
When `SparkSessionCatalog` is configured with:

- `spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions`
- `spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog`
- `spark.sql.catalog.spark_catalog.type=hive`

the failure occurs as follows:

1. The statement `CREATE VIEW IF NOT EXISTS db.view_name AS ...` is run.
2. `db.view_name` already exists as a Hive view (or any non-Iceberg table/view).
3. `SparkSessionCatalog.createView` currently delegates directly to the underlying Iceberg catalog (`asViewCatalog.createView`).
4. The result is a `NoSuchIcebergViewException`.
5. Spark relies on `ViewAlreadyExistsException` to handle the `IF NOT EXISTS` logic. Because it receives a different exception, the query fails entirely.

The Fix
Before delegating the creation to the Iceberg catalog, we explicitly check if the identifier already exists in the underlying session catalog (which is the source of truth for the global namespace).
If `getSessionCatalog().tableExists(ident)` returns true, we immediately throw `ViewAlreadyExistsException`. This allows Spark's analysis rules to correctly catch the exception and ignore the operation as per `IF NOT EXISTS` semantics.

Verification
- A test in `TestSparkSessionCatalog` verifies that `CREATE VIEW IF NOT EXISTS` succeeds when a Hive view exists.
- `CREATE VIEW` (without `IF NOT EXISTS`) correctly throws `AnalysisException` (Table or view already exists).
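
For readers who want to see the control flow in isolation, here is a minimal, Spark-free sketch of the existence check described under "The Fix". It is a toy model, not the actual patch: the class `CreateViewGuardSketch`, the two in-memory sets, the `guard` flag, and the unchecked exception classes are all hypothetical stand-ins (the real Spark and Iceberg exceptions have different hierarchies and constructors), and the delegate's behaviour is modeled only as far as this PR describes it.

```java
import java.util.HashSet;
import java.util.Set;

public class CreateViewGuardSketch {

  /** Simplified, unchecked stand-in for Spark's ViewAlreadyExistsException. */
  public static class ViewAlreadyExistsException extends RuntimeException {
    public ViewAlreadyExistsException(String ident) {
      super("View already exists: " + ident);
    }
  }

  /** Simplified stand-in for Iceberg's NoSuchIcebergViewException. */
  public static class NoSuchIcebergViewException extends RuntimeException {
    public NoSuchIcebergViewException(String ident) {
      super("Not an iceberg view: " + ident);
    }
  }

  // Toy state: every name in the session catalog, and the subset that are Iceberg views.
  public static final Set<String> sessionObjects = new HashSet<>();
  public static final Set<String> icebergViews = new HashSet<>();

  /** Models getSessionCatalog().tableExists(ident). */
  public static boolean tableExists(String ident) {
    return sessionObjects.contains(ident);
  }

  /** Models the Iceberg delegate, which rejects names held by non-Iceberg objects. */
  static void icebergCreateView(String ident) {
    if (icebergViews.contains(ident)) {
      throw new ViewAlreadyExistsException(ident);
    }
    if (sessionObjects.contains(ident)) {
      throw new NoSuchIcebergViewException(ident);
    }
    icebergViews.add(ident);
    sessionObjects.add(ident);
  }

  /** createView with the existence check optionally applied before delegating. */
  public static void createView(String ident, boolean guard) {
    if (guard && tableExists(ident)) {
      throw new ViewAlreadyExistsException(ident);
    }
    icebergCreateView(ident);
  }

  /** Models IF NOT EXISTS handling: only ViewAlreadyExistsException is swallowed. */
  public static String createViewIfNotExists(String ident, boolean guard) {
    try {
      createView(ident, guard);
      return "created";
    } catch (ViewAlreadyExistsException e) {
      return "skipped";
    }
  }

  public static void main(String[] args) {
    sessionObjects.add("db.hive_view"); // pre-existing non-Iceberg view
    System.out.println(createViewIfNotExists("db.hive_view", true)); // guard on
    System.out.println(createViewIfNotExists("db.new_view", true));  // unused name
    try {
      createViewIfNotExists("db.hive_view", false);                  // guard off
    } catch (NoSuchIcebergViewException e) {
      System.out.println("failed: " + e.getMessage());
    }
  }
}
```

With the guard enabled, the pre-existing name is reported as `skipped` and a fresh name as `created`. With the guard disabled, the model reproduces the reported failure mode: an exception that the `IF NOT EXISTS` handling does not recognise escapes to the caller.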