Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
For MOR tables that have these 2 configurations enabled:
hoodie.schema.on.read.enable=true
hoodie.datasource.read.extract.partition.values.from.path=true
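For reference, a minimal sketch of how the two settings could be rendered as session-level `set` statements, which is how the sample test in this ticket applies them through spark.sql. The names ReproConfigSketch, reproConf and setStatements are hypothetical and exist only for illustration; only the two config keys come from this ticket.

```scala
// Hypothetical helper, not part of Hudi. Only the two config keys are taken from this ticket.
object ReproConfigSketch {
  // Both settings must be enabled together to hit the fallback described here.
  val reproConf: Seq[(String, String)] = Seq(
    "hoodie.schema.on.read.enable" -> "true",
    "hoodie.datasource.read.extract.partition.values.from.path" -> "true"
  )

  // Render each entry as a Spark SQL `set key=value` statement.
  def setStatements(conf: Seq[(String, String)]): Seq[String] =
    conf.map { case (k, v) => s"set $k=$v" }

  def main(args: Array[String]): Unit =
    setStatements(reproConf).foreach(println)
}
```

Each resulting string could then be passed to spark.sql(...) on an active session. Note the leading `set`: a bare `key=value` string is not a valid SQL statement.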
BaseFileReader will use a requiredSchemaReader when reading some of the parquet files. This reader has an empty internalSchemaStr, which causes Spark3XLegacyHoodieParquetFileFormat to fall back to out-of-the-box (OOB) schema evolution.
Although HUDI-5400 added safeguards that force execution onto the Hudi Full Schema Evolution path, we should still fix this so that future changes that deprecate Spark3XLegacyHoodieParquetFileFormat do not cause issues.
A sample test to invoke this:
test("Test wrong fallback to OOB schema evolution") {
  withRecordType()(withTempDir { tmp =>
    Seq("mor").foreach { tableType =>
      val tableName = generateTableName
      val tablePath = s"${new Path(tmp.getCanonicalPath, tableName).toUri.toString}"
      if (HoodieSparkUtils.gteqSpark3_1) {
        spark.sql("set " + SPARK_SQL_INSERT_INTO_OPERATION.key + "=upsert")
        spark.sql("set hoodie.schema.on.read.enable=true")
        spark.sql("set hoodie.datasource.read.extract.partition.values.from.path=true")
        // NOTE: This is required since this test uses type coercions which were only permitted in Spark 2.x
        //       and are disallowed now by default in Spark 3.x
        spark.sql("set spark.sql.storeAssignmentPolicy=legacy")
        createAndPreparePartitionTable(spark, tableName, tablePath, tableType)
        // date -> string
        spark.sql(s"alter table $tableName alter column col6 type String")
        checkAnswer(spark.sql(s"select col6 from $tableName where id = 1").collect())(
          Seq("2021-12-25")
        )
      }
    }
  })
}
Debugger snapshots:
As can be seen, requiredSchema (used as pruning input) has an internalSchema string, but requiredDataSchema has a null internalSchema string.
As a result, the internalSchemaStr that is passed into Spark3XLegacyHoodieParquetFileFormat is null, which should not be the case.
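The failure mode above can be modelled with a small, self-contained sketch. Everything in it is hypothetical and heavily simplified: none of the types are Hudi's real classes, and it assumes, purely for illustration, that requiredDataSchema is derived from requiredSchema by omitting partition columns. The "carry over" variant shows one possible shape of a fix, not necessarily the change that was merged.

```scala
// Simplified, hypothetical model of the behaviour described in this ticket.
// None of these types or methods are Hudi's real classes or APIs.
object SchemaFallbackSketch {
  // A schema paired with an optional serialized internal schema.
  final case class SchemaWithInternal(fields: Seq[String], internalSchemaStr: Option[String])

  sealed trait EvolutionMode
  case object FullSchemaEvolution extends EvolutionMode
  case object OutOfBoxEvolution extends EvolutionMode

  // Problematic derivation: omits partition columns and also loses the internal schema string.
  def deriveDataSchemaLossy(s: SchemaWithInternal, partitionCols: Set[String]): SchemaWithInternal =
    SchemaWithInternal(s.fields.filterNot(partitionCols), None)

  // Assumed remedy: omits partition columns but carries the internal schema string over.
  def deriveDataSchemaCarryOver(s: SchemaWithInternal, partitionCols: Set[String]): SchemaWithInternal =
    s.copy(fields = s.fields.filterNot(partitionCols))

  // Full schema evolution is only possible when the string is present and non-empty.
  def chooseMode(s: SchemaWithInternal): EvolutionMode =
    if (s.internalSchemaStr.exists(_.nonEmpty)) FullSchemaEvolution else OutOfBoxEvolution
}
```

In this model, a requiredSchema that carries an internal schema string still ends up on the OOB path once the lossy derivation drops it, which mirrors the null internalSchemaStr seen in the debugger.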