IMPALA-2210

Consider making Parquet the default file format for CREATE TABLE LIKE PARQUET

      Description

      I expect that by far the most common use case for CREATE TABLE LIKE PARQUET is to create a table for querying the very Parquet file named in the statement. That is, either:

      CREATE TABLE foo LIKE PARQUET '/blah/blah/file.parq' STORED AS PARQUET;
      LOAD DATA INPATH '/blah/blah/file.parq' INTO TABLE foo;

      or

      CREATE EXTERNAL TABLE foo LIKE PARQUET '/blah/blah/file.parq' STORED AS PARQUET LOCATION '/blah/blah';

      I have difficulty imagining a case where someone would do CREATE TABLE LIKE PARQUET and want the result to be a text table. Even if someone planned to convert Parquet -> text, they would need to have a Parquet table to begin with, in which case they would do CREATE TABLE text_table LIKE parquet_table, not CREATE TABLE LIKE PARQUET.

      It is easy to leave the STORED AS PARQUET clause off a CREATE TABLE LIKE PARQUET statement by mistake, because PARQUET already appears earlier in the statement. The result is a text table that throws conversion errors when queried. How about making Parquet the default format in this case, and requiring the STORED AS clause only to use a different file format? (Then, if Impala implemented a CREATE TABLE LIKE AVRO syntax, the default in that case would be Avro.)
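
      To make the failure mode concrete, here is a rough sketch that reuses the placeholder paths from the statements above. The table name foo_text is hypothetical, and the comments on the last statement describe the behaviour proposed here, not what Impala does today.

      ```sql
      -- Easy to write by mistake: PARQUET appears in the LIKE clause, but there is
      -- no STORED AS clause, so the table is created with the default text format.
      CREATE EXTERNAL TABLE foo LIKE PARQUET '/blah/blah/file.parq' LOCATION '/blah/blah';

      -- Querying the Parquet data through the text table is where the
      -- conversion errors show up.
      SELECT * FROM foo LIMIT 10;

      -- Under the proposal, the first statement would create a Parquet table,
      -- and only a non-Parquet result would need an explicit clause:
      CREATE TABLE foo_text LIKE PARQUET '/blah/blah/file.parq' STORED AS TEXTFILE;
      ```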

      Since this would probably qualify as an incompatible change, we would need to think through the appropriate release vehicle.

              People

              • Assignee:
                Unassigned
                Reporter:
                jrussell John Russell