Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-522

Allow shorthand in CREATE TABLE for CSV and TSV

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Won't Do
    • Affects Version/s: Impala 1.1, Impala 2.5.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels:
      None

      Description

      Currently, the precise way to create a CSV or TSV table is to specify:

      row format delimited fields terminated by ',' stored as textfile
      row format delimited fields terminated by '\t' stored as textfile

      Some parts are optional (STORED AS TEXTFILE), others are redundant (is there any TEXTFILE that is not ROW FORMAT DELIMITED? could you have FIELDS TERMINATED BY without ROW FORMAT DELIMITED?) or confusing (could STORED AS or FIELDS TERMINATED BY go ahead of ROW FORMAT?).

      Since the 3 clauses STORED AS TEXTFILE, ROW FORMAT DELIMITED, and FIELDS TERMINATED BY always go hand-in-hand, how about some shorthand notation for the most common cases, for example:

      STORED AS CSV
      STORED AS TSV

      to allow creating tables for comma-separated and tab-separated data without the verbose and redundant clauses?

      I did a little research online and all the other common separators do not have well-known acronyms. Sometimes space is a separator, but that is a recipe for trouble with most kinds of string data. Pipe is popular, but there is no tradition of a .psv file extension or PSV acronym.* Colon is sometimes used, but any acronym would conflict with CSV for comma.

      Cf: http://en.wikipedia.org/wiki/Delimiter-separated_values

      • Now that I've had some more experience with choosing delimiter characters, I'm voting for STORED AS PSV to make a table with pipe-separated values. Sometimes you need a character that's visible, so tab won't do. So pipe is the logical fallback choice if the data contains any commas.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              jrussell John Russell
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: