IMPALA-4534

Not all of the data load files follow the accepted format when partitioning test data tables

Details

    Description

      When loading Impala test data, we "generally" partition tables in our data load process by adding PARTITION_COLUMNS and ALTER sections to the schema template files, e.g. from functional_schema_template.sql:

      ---- DATASET
      functional
      ---- BASE_TABLE_NAME
      alltypessmall
      ---- COLUMNS
      id int
      bool_col boolean
      tinyint_col tinyint
      smallint_col smallint
      int_col int
      bigint_col bigint
      float_col float
      double_col double
      date_string_col string
      string_col string
      timestamp_col timestamp
      ---- PARTITION_COLUMNS
      year int
      month int
      ---- ALTER
      ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION(year=2009, month=1);
      ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION(year=2009, month=2);
      ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION(year=2009, month=3);
      ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION(year=2009, month=4);
      

      However, some tables forgo this and instead put the PARTITION BY clause directly in the CREATE TABLE statement; they may or may not also include an ALTER section. This sidesteps logic in generate-schema-statements.py that branches specifically on whether the PARTITION_COLUMNS and/or ALTER sections have been defined.

      We should investigate what effect the omission of these sections has on our data load process for those tables.
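
      As a starting point for that investigation, the following is a minimal sketch of a script that flags template entries declaring partitioning inline rather than through a PARTITION_COLUMNS section. It is not the parser used by generate-schema-statements.py. It assumes that each section starts with a `---- NAME` line (as in the excerpt above), that table entries are separated by lines beginning with `====`, and that the inline DDL lives in a section whose name starts with CREATE; the last two are assumptions, not facts taken from this issue. The function names and the `hypothetical_inline` table mentioned below are illustrative only.

```python
import re
from collections import OrderedDict

# A section header looks like "---- PARTITION_COLUMNS" in the excerpt above.
SECTION_RE = re.compile(r"^----\s*(\w+)\s*$")
# Match either spelling ("PARTITION BY" or "PARTITIONED BY"); which one the
# templates actually use is an assumption to verify against the real files.
INLINE_PARTITION_RE = re.compile(r"\bPARTITION(?:ED)?\s+BY\b", re.IGNORECASE)


def parse_table_blocks(text, block_separator="===="):
    """Split template text into one {section_name: body} dict per table.

    Assumes table entries are separated by lines starting with
    `block_separator` (an assumption, not confirmed by the issue text).
    """
    blocks = []
    current = OrderedDict()
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(block_separator):
            if current:
                blocks.append(current)
            current, section = OrderedDict(), None
            continue
        match = SECTION_RE.match(stripped)
        if match:
            section = match.group(1)
            current[section] = []
        elif section is not None:
            current[section].append(stripped)
    if current:
        blocks.append(current)
    # Join each section's lines into a single string body.
    return [
        {name: "\n".join(lines).strip() for name, lines in block.items()}
        for block in blocks
    ]


def find_inline_partitioned_tables(text):
    """Return names of tables that declare partitioning inside a CREATE*
    section but have no non-empty PARTITION_COLUMNS section."""
    flagged = []
    for block in parse_table_blocks(text):
        has_partition_section = bool(block.get("PARTITION_COLUMNS"))
        create_sql = "\n".join(
            body for name, body in block.items() if name.startswith("CREATE")
        )
        if INLINE_PARTITION_RE.search(create_sql) and not has_partition_section:
            flagged.append(block.get("BASE_TABLE_NAME", "<unknown>"))
    return flagged
```

      Running `find_inline_partitioned_tables(open(path).read())` over each schema template file would give a list of tables to inspect by hand. An entry such as alltypessmall above, which defines PARTITION_COLUMNS, would not be flagged, whereas a hypothetical entry whose only partitioning information is a PARTITION BY clause in its CREATE section would be.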

            People

              Assignee: Unassigned
              Reporter: David Knupp (dknupp)