Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
0.9.0
-
None
Description
With 0.9.0, we added support for dropping partition columns after generating the partition path for hoodie records. but looks like we have some gaps in end to end flow.
Main issue is with not giving glob path. if proper glob path is given, it works.
hudi serializes partition path fields to table properties and re-uses when in need. So, while querying the field may not be part of table schema since hudi would have removed the field (along with trimming the schema) completely.
Stacktrace for Issue (1)
scala> val tripsSnapshotDF = spark.read.format("hudi").load(basePath) java.lang.IllegalArgumentException: Cannot find column: 'partitionId' in the schema[StructField(_hoodie_commit_time,StringType,true),StructField(_hoodie_commit_seqno,StringType,true),StructField(_hoodie_record_key,StringType,true),StructField(_hoodie_partition_path,StringType,true),StructField(_hoodie_file_name,StringType,true),StructField(rowId,StringType,true),StructField(preComb,LongType,true),StructField(name,StringType,true),StructField(versionId,StringType,true),StructField(toBeDeletedStr,StringType,true),StructField(intToLong,IntegerType,true),StructField(longToInt,LongType,true)] at org.apache.hudi.HoodieFileIndex$$anonfun$5$$anonfun$apply$1.apply(HoodieFileIndex.scala:106) at org.apache.hudi.HoodieFileIndex$$anonfun$5$$anonfun$apply$1.apply(HoodieFileIndex.scala:106) at scala.collection.MapLike$class.getOrElse(MapLike.scala:128) at scala.collection.AbstractMap.getOrElse(Map.scala:59) at org.apache.hudi.HoodieFileIndex$$anonfun$5.apply(HoodieFileIndex.scala:106) at org.apache.hudi.HoodieFileIndex$$anonfun$5.apply(HoodieFileIndex.scala:105) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) at org.apache.hudi.HoodieFileIndex._partitionSchemaFromProperties$lzycompute(HoodieFileIndex.scala:105) at org.apache.hudi.HoodieFileIndex._partitionSchemaFromProperties(HoodieFileIndex.scala:99) at org.apache.hudi.HoodieFileIndex.getAllQueryPartitionPaths(HoodieFileIndex.scala:348) at org.apache.hudi.HoodieFileIndex.loadPartitionPathFiles(HoodieFileIndex.scala:420) at org.apache.hudi.HoodieFileIndex.refresh0(HoodieFileIndex.scala:214) at org.apache.hudi.HoodieFileIndex.<init>(HoodieFileIndex.scala:149) at org.apache.hudi.DefaultSource.getBaseFileOnlyView(DefaultSource.scala:199) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:116) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:67) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178) ... 63 elided
Steps to reproduce:
https://gist.github.com/nsivabalan/570a96004e41f84565c99d8994b12d57