Description
Spark SQL 1.4.x and prior versions don't follow the Parquet format spec when dealing with complex types (the spec wasn't clear about this part when Spark SQL's Parquet support was first authored). SPARK-6777 partly fixes this problem with CatalystSchemaConverter and introduces a feature flag indicating whether to stick to the Parquet format spec (standard mode) or to the behavior of older Spark versions (compatible mode). However, when dealing with LISTs (i.e., Spark SQL arrays), CatalystSchemaConverter doesn't produce exactly the same schema as before in compatible mode.
For an int array with non-nullable elements, 1.4.x and prior follow the parquet-avro style and give:

message root {
  <repetition> group <field-name> (LIST) {
    repeated int32 array;
  }
}

The innermost field name is array. However, CatalystSchemaConverter uses element.
Similarly, for an int array with nullable elements, 1.4.x and prior follow the parquet-hive style and give:

optional group <field-name> (LIST) {
  optional group bag {
    optional int32 array_element;
  }
}

The innermost field name is array_element. However, CatalystSchemaConverter still uses element.
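Taken together, the legacy inner-field name depends only on element nullability, while compatible mode currently emits element in both cases. A minimal sketch of the intended naming rule (the object and method names here are hypothetical illustrations, not Spark API):

```scala
// Hypothetical helper illustrating the legacy inner field names that
// compatible mode should reproduce (names are NOT Spark API).
object LegacyListNaming {
  // Inner field name written by Spark SQL 1.4.x and prior for array elements:
  //  - nullable elements     -> parquet-hive style, named "array_element"
  //  - non-nullable elements -> parquet-avro style, named "array"
  def legacyElementName(elementNullable: Boolean): String =
    if (elementNullable) "array_element" else "array"

  // What CatalystSchemaConverter currently emits in compatible mode,
  // which is the discrepancy being reported:
  def currentElementName: String = "element"

  def main(args: Array[String]): Unit = {
    println(legacyElementName(elementNullable = true))   // array_element
    println(legacyElementName(elementNullable = false))  // array
    println(currentElementName)                          // element
  }
}
```

Compatible mode should match legacyElementName for both nullability cases instead of unconditionally using element.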
This issue doesn't affect the Parquet read path, since all the schemas above are covered by SPARK-6776. But it does affect the write path. Ideally, Spark SQL 1.5.x should write Parquet files using either exactly the same format as 1.4.x and prior, or the most recent Parquet format spec.
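For reference, the most recent Parquet format spec represents a nullable int array with the standard 3-level LIST structure, where the repeated group is always named list and the element field is always named element:

```
optional group <field-name> (LIST) {
  repeated group list {
    optional int32 element;
  }
}
```

Standard mode should emit this form, while compatible mode should reproduce the legacy parquet-avro/parquet-hive names shown above.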