[IMPALA-522] Allow shorthand in CREATE TABLE for CSV and TSV - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Trivial
Resolution: Won't Do
Affects Version/s: Impala 1.1, Impala 2.5.0
Fix Version/s: None
Component/s: Frontend
Labels:
None

Target Version:

Product Backlog

Description

Currently, the precise way to create a CSV or TSV table is to specify:

row format delimited fields terminated by ',' stored as textfile
row format delimited fields terminated by '\t' stored as textfile

Some parts are optional (STORED AS TEXTFILE), others are redundant (is there any TEXTFILE that is not ROW FORMAT DELIMITED? could you have FIELDS TERMINATED BY without ROW FORMAT DELIMITED?) or confusing (could STORED AS or FIELDS TERMINATED BY go ahead of ROW FORMAT?).

Since the 3 clauses STORED AS TEXTFILE, ROW FORMAT DELIMITED, and FIELDS TERMINATED BY always go hand-in-hand, how about some shorthand notation for the most common cases, for example:

STORED AS CSV
STORED AS TSV

to allow creating tables for comma-separated and tab-separated data without the verbose and redundant clauses?

I did a little research online and all the other common separators do not have well-known acronyms. Sometimes space is a separator, but that is a recipe for trouble with most kinds of string data. Pipe is popular, but there is no tradition of a .psv file extension or PSV acronym.* Colon is sometimes used, but any acronym would conflict with CSV for comma.

Cf: http://en.wikipedia.org/wiki/Delimiter-separated_values

Now that I've had some more experience with choosing delimiter characters, I'm voting for STORED AS PSV to make a table with pipe-separated values. Sometimes you need a character that's visible, so tab won't do. So pipe is the logical fallback choice if the data contains any commas.

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: John Russell

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 08/Aug/13 19:00

Updated:: 12/Apr/17 00:26

Resolved:: 12/Apr/17 00:26