Details
- Type: Sub-task
- Status: Closed
- Priority: Critical
- Resolution: Won't Fix
- Affects Version/s: 2.1.0
- Fix Version/s: None
- Component/s: None
Description
In Spark 2.1, if we create a partitioned data source table at a specified path, querying it returns nothing. To get the data, we have to manually issue a DDL statement to repair the table.
In Spark 2.0, the same query returns the data stored in the specified path without repairing the table.
Below is the output of Spark 2.1.
scala> spark.range(5).selectExpr("id as fieldOne", "id as partCol").write.partitionBy("partCol").mode("overwrite").saveAsTable("test")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

scala> spark.sql("select * from test").show()
+--------+-------+
|fieldOne|partCol|
+--------+-------+
|       0|      0|
|       1|      1|
|       2|      2|
|       3|      3|
|       4|      4|
+--------+-------+

scala> spark.sql("desc formatted test").show(50, false)
+----------------------------+----------------------------------------------------------------------+-------+
|col_name                    |data_type                                                             |comment|
+----------------------------+----------------------------------------------------------------------+-------+
|fieldOne                    |bigint                                                                |null   |
|partCol                     |bigint                                                                |null   |
|# Partition Information     |                                                                      |       |
|# col_name                  |data_type                                                             |comment|
|partCol                     |bigint                                                                |null   |
|                            |                                                                      |       |
|# Detailed Table Information|                                                                      |       |
|Database:                   |default                                                               |       |
|Owner:                      |xiaoli                                                                |       |
|Create Time:                |Sat Dec 17 17:46:24 PST 2016                                          |       |
|Last Access Time:           |Wed Dec 31 16:00:00 PST 1969                                          |       |
|Location:                   |file:/Users/xiaoli/IdeaProjects/sparkDelivery/bin/spark-warehouse/test|       |
|Table Type:                 |MANAGED                                                               |       |
|Table Parameters:           |                                                                      |       |
|  transient_lastDdlTime     |1482025584                                                            |       |
|                            |                                                                      |       |
|# Storage Information       |                                                                      |       |
|SerDe Library:              |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe           |       |
|InputFormat:                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat         |       |
|OutputFormat:               |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat        |       |
|Compressed:                 |No                                                                    |       |
|Storage Desc Parameters:    |                                                                      |       |
|  serialization.format      |1                                                                     |       |
|Partition Provider:         |Catalog                                                               |       |
+----------------------------+----------------------------------------------------------------------+-------+

scala> spark.sql(s"create table newTab (fieldOne long, partCol int) using parquet options (path 'file:/Users/xiaoli/IdeaProjects/sparkDelivery/bin/spark-warehouse/test') partitioned by (partCol)")
res3: org.apache.spark.sql.DataFrame = []

scala> spark.table("newTab").show()
+--------+-------+
|fieldOne|partCol|
+--------+-------+
+--------+-------+
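For reference, the manual repair step mentioned in the description looks like the following. This is a sketch, not captured output; it assumes Spark 2.1's MSCK REPAIR TABLE command (equivalently, ALTER TABLE ... RECOVER PARTITIONS), which scans the table location and registers the partition directories it finds with the catalog:

// Run in the same spark-shell session as the transcript above.
// newTab was created with Partition Provider: Catalog, so partitions that
// are not yet registered in the metastore are invisible to queries.
spark.sql("MSCK REPAIR TABLE newTab")

// Expected after the repair: the five rows written by saveAsTable("test").
spark.table("newTab").show()

The test table itself needs no repair because saveAsTable registers each partition in the metastore as it writes; newTab points at the same files but starts with an empty partition list in the catalog.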