Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Invalid
Affects Version/s: 2.0.0
None
None
Environment: Mac OS X 10.11.6, Ubuntu 14, Ubuntu 16.
Spark 2.0.0, spark-catalyst 2.0.0
Description
Spark added the "_OPTIONAL_" metadata key in 2.0.0 in the following commit: https://github.com/apache/spark/commit/4637fc08a3733ec313218fb7e4d05064d9a6262d
However, merging metadata for data written by Spark 1.6.x and by Spark 2.0 fails with the following:
Exception in thread "main" java.lang.RuntimeException: could not merge metadata: key org.apache.spark.sql.parquet.row.metadata has conflicting values:
The only difference between those values is that the field metadata now has an extra "_OPTIONAL_" entry:
{
  "name": "catalog",
  "type": {
    "type": "struct",
    "fields": [
      {
        "name": "category",
        "type": "string",
        "nullable": true,
        "metadata": {}
      },
      {
        "name": "department",
        "type": "string",
        "nullable": true,
        "metadata": {}
      }
    ]
  },
  "nullable": true,
  "metadata": {
    "_OPTIONAL_": true
  }
}
vs
{
  "name": "catalog",
  "type": {
    "type": "struct",
    "fields": [
      {
        "name": "category",
        "type": "string",
        "nullable": true,
        "metadata": {}
      },
      {
        "name": "department",
        "type": "string",
        "nullable": true,
        "metadata": {}
      }
    ]
  },
  "nullable": true,
  "metadata": {}
}
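
The claim that the two values differ only in the "_OPTIONAL_" metadata entry can be checked mechanically. The following is a minimal sketch in Python using only the standard library, assuming the two "catalog" field definitions are available as JSON strings. The helper names `strip_key` and `catalog_field` are hypothetical and are not part of Spark or Parquet; the sketch only compares the JSON and does not change how the metadata merge behaves.

```python
import json


def strip_key(node, key):
    """Return a copy of a parsed JSON value with every occurrence of `key` removed."""
    if isinstance(node, dict):
        return {k: strip_key(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_key(v, key) for v in node]
    return node


def catalog_field(metadata):
    """Build the 'catalog' field JSON from the report with the given top-level metadata."""
    string_field = lambda name: {
        "name": name, "type": "string", "nullable": True, "metadata": {}
    }
    return json.dumps({
        "name": "catalog",
        "type": {
            "type": "struct",
            "fields": [string_field("category"), string_field("department")],
        },
        "nullable": True,
        "metadata": metadata,
    })


# The two conflicting values, reconstructed from the description above.
with_optional = json.loads(catalog_field({"_OPTIONAL_": True}))
without_optional = json.loads(catalog_field({}))

# Compared as-is, the values conflict.
same_raw = with_optional == without_optional

# With the "_OPTIONAL_" entry ignored, they are identical.
same_stripped = (
    strip_key(with_optional, "_OPTIONAL_") == strip_key(without_optional, "_OPTIONAL_")
)

print(same_raw, same_stripped)  # prints: False True
```

If the second comparison holds for the real values taken from the Parquet footers, the conflict comes from the "_OPTIONAL_" entry alone and not from any other schema change.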