[SPARK-18760] Provide consistent format output for all file formats - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.1.0
Component/s: SQL
Labels: None
Target Version/s: 2.2.0

Description

We currently rely on FileFormat implementations overriding toString to produce readable explain output. It would be better to use shortName for this instead, so that the output is consistent across all file formats.

Before:

scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@xyz, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>

After:

scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: text, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
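
As a rough illustration of the idea (not the actual patch), the sketch below shows how a description could be derived from a registered short name instead of toString. It does not depend on Spark: `Registered` and `Format` are hypothetical stand-ins for Spark's `DataSourceRegister` and `FileFormat` traits, and `FormatDescription.describe` and the class-name fallback are assumptions made for this example.

```scala
// Hypothetical stand-ins for Spark's DataSourceRegister and FileFormat traits.
// This sketch is self-contained and does not depend on Spark.
trait Registered { def shortName(): String }
trait Format

// A format that registers a short name but does not override toString,
// so its default toString looks like "TextFormat@<hash>".
class TextFormat extends Format with Registered {
  override def shortName(): String = "text"
}

// A format with no registered short name.
class CustomFormat extends Format

object FormatDescription {
  // Prefer the registered short name. Otherwise fall back to the class name
  // (an assumption of this sketch), which, unlike the default toString,
  // carries no per-instance hash code.
  def describe(format: Format): String = format match {
    case r: Registered => r.shortName()
    case other         => other.getClass.getName
  }
}
```

With these definitions, `FormatDescription.describe(new TextFormat)` evaluates to `"text"`, whereas `(new TextFormat).toString` still contains an `@` followed by a hash code, as in the "Before" output.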

Issue Links

duplicates: SPARK-17101 Provide consistent format identifiers for TextFileFormat and ParquetFileFormat (Resolved)
links to: [Github] Pull Request #16187 (rxin)

People

Assignee: Reynold Xin
Reporter: Reynold Xin
Votes: 0
Watchers: 2

Dates

Created: 07/Dec/16 05:21
Updated: 15/Dec/16 05:05
Resolved: 08/Dec/16 20:52