Description
Parquet schema discovery fails when the directory layout looks like the following, apparently because partition discovery does not skip the Hadoop job artifacts (_SUCCESS, _temporary/) sitting next to the part files:

/partitions5k/i=2/_SUCCESS
/partitions5k/i=2/_temporary/
/partitions5k/i=2/part-r-00001.gz.parquet
/partitions5k/i=2/part-r-00002.gz.parquet
/partitions5k/i=2/part-r-00003.gz.parquet
/partitions5k/i=2/part-r-00004.gz.parquet
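A minimal sketch of how to recreate this layout for testing (the /tmp base path and file names here are illustrative, not from the original report):

```shell
# Recreate the problematic directory: a partition directory (i=2) that
# mixes Hadoop job metadata (_SUCCESS, _temporary/) with the part files.
BASE=/tmp/partitions5k/i=2
mkdir -p "$BASE/_temporary"
touch "$BASE/_SUCCESS"
for n in 1 2 3 4; do
  touch "$BASE/part-r-0000$n.gz.parquet"
done
ls -a "$BASE"
```

Pointing sqlContext.read.parquet (or sqlContext.load) at /tmp/partitions5k should then exercise the same partition-discovery code path.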
java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
at scala.Predef$.assert(Predef.scala:179)
at org.apache.spark.sql.sources.PartitioningUtils$.resolvePartitions(PartitioningUtils.scala:159)
at org.apache.spark.sql.sources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:71)
at org.apache.spark.sql.sources.HadoopFsRelation.org$apache$spark$sql$sources$HadoopFsRelation$$discoverPartitions(interfaces.scala:468)
at org.apache.spark.sql.sources.HadoopFsRelation$$anonfun$partitionSpec$3.apply(interfaces.scala:424)
at org.apache.spark.sql.sources.HadoopFsRelation$$anonfun$partitionSpec$3.apply(interfaces.scala:423)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.sources.HadoopFsRelation.partitionSpec(interfaces.scala:422)
at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:482)
at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:480)
at org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:134)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:118)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1135)
Spark 1.3 works fine.