IMPALA-2210

Make Parquet the default file format

Details

    Description

      I expect that by far the most common use case for CREATE TABLE LIKE PARQUET is to make a table where the specified Parquet file will be queried. That is, either:

      CREATE TABLE foo LIKE PARQUET '/blah/blah/file.parq' STORED AS PARQUET;
      LOAD DATA INPATH '/blah/blah/file.parq' INTO TABLE foo;

      or

      CREATE EXTERNAL TABLE foo LIKE PARQUET '/blah/blah/file.parq' STORED AS PARQUET LOCATION '/blah/blah';

      I have difficulty imagining a case where someone would do CREATE TABLE LIKE PARQUET and want the result to be a text table. Even if someone planned to convert Parquet -> text, they would need to have a Parquet table to begin with, in which case they would do CREATE TABLE text_table LIKE parquet_table, not CREATE TABLE LIKE PARQUET.
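
      For comparison, the Parquet -> text conversion mentioned above might look roughly like this. This is only a sketch: it assumes parquet_table already exists, and the INSERT step is my assumption about how the data would be copied, not something this issue depends on.

      ```sql
      -- Clone the column definitions of the existing Parquet table,
      -- but store the new table as text.
      CREATE TABLE text_table LIKE parquet_table STORED AS TEXTFILE;

      -- Copy the rows; they are rewritten in the new table's text format.
      INSERT INTO text_table SELECT * FROM parquet_table;
      ```

      Nothing in this path involves CREATE TABLE LIKE PARQUET, which is why a text default seems to serve no one there.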

      It is easy to leave the STORED AS PARQUET clause off a CREATE TABLE LIKE PARQUET statement by mistake, because PARQUET already occurs earlier in the statement. The result is a text table that throws conversion errors when queried. How about making Parquet the default format in this case, and requiring the STORED AS clause only to use a different file format? (Then if Impala implemented a CREATE TABLE LIKE AVRO syntax, the default in that case would be Avro.)
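
      A sketch of the pitfall and of the proposed behavior, reusing the placeholder path from above. The table names foo_oops and foo_text are hypothetical, and the last statement shows what the proposal would require, not what Impala does today.

      ```sql
      -- Current behavior: STORED AS is omitted, so the column definitions come
      -- from the Parquet file but the table is created in the default text format.
      CREATE TABLE foo_oops LIKE PARQUET '/blah/blah/file.parq';

      -- One way to check which format a table ended up with:
      -- look at the InputFormat row in the output.
      DESCRIBE FORMATTED foo_oops;

      -- What has to be written today to get a Parquet table.
      CREATE TABLE foo LIKE PARQUET '/blah/blah/file.parq' STORED AS PARQUET;

      -- Under the proposal, the first statement would produce a Parquet table,
      -- and an explicit clause would be needed only for some other format.
      CREATE TABLE foo_text LIKE PARQUET '/blah/blah/file.parq' STORED AS TEXTFILE;
      ```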

      Since I guess this would qualify as an incompatible change, we would need to think through the appropriate release vehicle.

            People

              Assignee: Unassigned
              Reporter: John Russell (jrussell)