[SPARK-4523] Improve handling of serialized schema information - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: SQL
Labels: None
Target Version/s: 1.4.0

Description

There are several issues with our current handling of metadata serialization, which is especially troublesome since this is the only place that we persist information directly using Spark SQL. Moving forward we should do the following:

Relax the parsing so that it does not fail when optional fields are missing (e.g. containsNull or metadata).
Include a regression suite that attempts to read old Parquet files written by previous versions of Spark SQL.
Provide better warning messages when various forms of parsing fail (I think parsing currently fails silently, which makes tracking down bugs more difficult than it needs to be).
Deprecate the old case class schema representation (display a warning when reading data that uses it) and eventually remove it.
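
To make the first and third points concrete, here is a minimal sketch in Python of lenient parsing with explicit warnings. It is not Spark's implementation: the function names (parse_schema, parse_type, parse_field) and the logger name are hypothetical, and the defaults chosen for missing fields (containsNull and nullable default to true, metadata defaults to empty) are assumptions made for illustration.

```python
import json
import logging

# Hypothetical logger name, used only in this sketch.
logger = logging.getLogger("schema_parsing_sketch")


def parse_type(node):
    """Parse a JSON type node, tolerating missing optional fields."""
    if isinstance(node, str):
        # Primitive types are plain strings such as "integer" or "string".
        return node
    if not isinstance(node, dict):
        raise ValueError(f"unrecognized type node: {node!r}")
    kind = node.get("type")
    if kind == "struct":
        fields = [parse_field(f) for f in node.get("fields", [])]
        return {"type": "struct", "fields": fields}
    if kind == "array":
        if "containsNull" not in node:
            # Assumption: default to the most permissive value and say so.
            logger.warning("array type lacks 'containsNull'; assuming true")
        return {
            "type": "array",
            "elementType": parse_type(node["elementType"]),
            "containsNull": node.get("containsNull", True),
        }
    raise ValueError(f"unrecognized type node: {node!r}")


def parse_field(node):
    """Parse one struct field; 'nullable' and 'metadata' are optional here."""
    missing = [key for key in ("nullable", "metadata") if key not in node]
    if missing:
        logger.warning("field %r lacks %s; using defaults", node.get("name"), missing)
    return {
        "name": node["name"],
        "type": parse_type(node["type"]),
        "nullable": node.get("nullable", True),
        "metadata": node.get("metadata", {}),
    }


def parse_schema(text):
    """Parse a serialized schema string, logging a warning instead of failing silently."""
    try:
        node = json.loads(text)
    except json.JSONDecodeError as exc:
        logger.warning("schema string is not valid JSON: %s", exc)
        raise
    return parse_type(node)


# A schema string that omits both optional fields still parses.
old_schema = (
    '{"type":"struct","fields":[{"name":"xs",'
    '"type":{"type":"array","elementType":"integer"},"nullable":true}]}'
)
parsed = parse_schema(old_schema)
```

The design choice in this sketch is to fill a default and log a warning for each missing optional field, while still raising on input that cannot be interpreted at all, so that failures are never silent.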

Issue Links

relates to

SPARK-4522 Failure to read parquet schema with missing metadata. (Resolved)

People

Assignee: Unassigned

Reporter: Michael Armbrust

Votes: 0

Watchers: 2

Dates

Created: 20/Nov/14 23:25

Updated: 18/May/15 20:01

Resolved: 18/May/15 20:01