[SPARK-13766] Inconsistent file extensions and omitted file extensions written by CSV, TEXT and JSON data sources - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.0.0
Fix Version/s: 2.0.0
Component/s: SQL
Labels: None

Description

Currently, the output part-files written by the CSV, TEXT and JSON data sources have no format extension such as .csv, .txt or .json. The only extension they get is a compression extension such as .gz, .deflate or .bz2, and only when a codec is set.

In addition, Parquet part-files appear to carry a codec-dependent extension such as .gz.parquet or .snappy.parquet, whereas ORC part-files carry no codec extension and are named with just .orc.

In summary, the extensions are currently set as follows:

TEXT, CSV and JSON - [.COMPRESSION_CODEC_NAME]
Parquet -  [.COMPRESSION_CODEC_NAME].parquet
ORC - .orc
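The summary above can be made concrete with a small Python sketch that models the pre-fix naming exactly as this issue describes it. `current_extension` and `CODEC_SUFFIX` are hypothetical names, not Spark APIs, and the codec-to-suffix mapping is an assumption based on the extensions mentioned in this issue.

```python
from typing import Optional

# Hypothetical mapping from a short codec name to the suffix it contributes
# to a part-file name (assumption based on the examples in this issue).
CODEC_SUFFIX = {
    "gzip": ".gz",
    "deflate": ".deflate",
    "bzip2": ".bz2",
    "snappy": ".snappy",
}


def current_extension(fmt: str, codec: Optional[str] = None) -> str:
    """Return the part-file extension described in this issue (before the fix)."""
    fmt = fmt.lower()
    suffix = CODEC_SUFFIX[codec] if codec else ""
    if fmt in ("text", "csv", "json"):
        # Only the compression suffix is written; there is no format extension.
        return suffix
    if fmt == "parquet":
        # Compression suffix followed by the format extension.
        return suffix + ".parquet"
    if fmt == "orc":
        # Format extension only; the codec is not reflected in the name.
        return ".orc"
    raise ValueError(f"unknown format: {fmt}")


if __name__ == "__main__":
    for fmt in ("csv", "json", "text", "parquet", "orc"):
        print(fmt, repr(current_extension(fmt)), repr(current_extension(fmt, "snappy")))
```

Passing the same codec to every format shows the inconsistency: the text-based sources produce only `.snappy`, Parquet produces `.snappy.parquet`, and ORC produces `.orc`.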

It would be better to have a consistent naming scheme across these data sources.

Issue Links

links to

[Github] Pull Request #11604 (HyukjinKwon)

People

Assignee: Hyukjin Kwon

Reporter: Hyukjin Kwon

Votes: 0

Watchers: 2

Dates

Created: 09/Mar/16 05:55

Updated: 12/Dec/22 18:11

Resolved: 10/Mar/16 03:12