Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-522

Allow shorthand in CREATE TABLE for CSV and TSV

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Trivial
    • Resolution: Won't Do
    • Impala 1.1, Impala 2.5.0
    • None
    • Frontend
    • None

    Description

      Currently, the precise way to create a CSV or TSV table is to specify:

      row format delimited fields terminated by ',' stored as textfile
      row format delimited fields terminated by '\t' stored as textfile

      Some parts are optional (STORED AS TEXTFILE), others are redundant (is there any TEXTFILE that is not ROW FORMAT DELIMITED? could you have FIELDS TERMINATED BY without ROW FORMAT DELIMITED?) or confusing (could STORED AS or FIELDS TERMINATED BY go ahead of ROW FORMAT?).

      Since the 3 clauses STORED AS TEXTFILE, ROW FORMAT DELIMITED, and FIELDS TERMINATED BY always go hand-in-hand, how about some shorthand notation for the most common cases, for example:

      STORED AS CSV
      STORED AS TSV

      to allow creating tables for comma-separated and tab-separated data without the verbose and redundant clauses?

      I did a little research online and all the other common separators do not have well-known acronyms. Sometimes space is a separator, but that is a recipe for trouble with most kinds of string data. Pipe is popular, but there is no tradition of a .psv file extension or PSV acronym.* Colon is sometimes used, but any acronym would conflict with CSV for comma.

      Cf: http://en.wikipedia.org/wiki/Delimiter-separated_values

      • Now that I've had some more experience with choosing delimiter characters, I'm voting for STORED AS PSV to make a table with pipe-separated values. Sometimes you need a character that's visible, so tab won't do. So pipe is the logical fallback choice if the data contains any commas.

      Attachments

        Activity

          People

            Unassigned Unassigned
            jrussell John Russell
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: