IMPALA-8821

Dataload for remote clusters should use recover partitions

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: Impala 3.3.0
    • Fix Version/s: Not Applicable
    • Component/s: Infrastructure
    • Labels: None

    Description

      Some test setups have data already in place and only need to run the DDLs to sync up the metadata. This corresponds to running testdata/bin/create-load-data.sh using a data snapshot but without skip_metadata_load.

      Right now, for partitioned tables where the partitions are created dynamically as part of the insert, generate-schema-statements.py forces a reload:

      # Force reloading of the table if the user specified the --force option or
      # if the table is partitioned and there was no ALTER section specified. This is to
      # ensure the partition metadata is always properly created. The ALTER section is
      # used to create partitions, so if that section exists there is no need to force
      # reload.
      # IMPALA-6579: Also force reload all Kudu tables. The Kudu entity referenced
      # by the table may or may not exist, so requiring a force reload guarantees
      # that the Kudu entity is always created correctly.
      # TODO: Rename the ALTER section to ALTER_TABLE_ADD_PARTITION
      force_reload = options.force_reload or (partition_columns and not alter) or \
          file_format == 'kudu'

      When the data is already in place, this forced reload drops the existing data and loads it again. Instead, the dataload should run "recover partitions" (ALTER TABLE ... RECOVER PARTITIONS) on that table to pick up all the partition information.
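
      The proposed change can be sketched roughly as follows. This is only an illustration, not the actual code in generate-schema-statements.py: the function plan_table_load and the data_in_place flag are hypothetical names introduced here, and the real script would have to decide for itself how to detect that a snapshot is already loaded. The sketch keeps the existing rules (an explicit force option and Kudu tables always reload) and swaps the partition-driven reload for a RECOVER PARTITIONS statement when the data already exists.

```python
def plan_table_load(table_name, partition_columns, alter, file_format,
                    force_reload=False, data_in_place=False):
    """Return (force_reload, extra_statements) for one table.

    Hypothetical helper: `data_in_place` means the table's files already
    exist on the target filesystem (for example, from a data snapshot).
    """
    # Existing behavior: an explicit force option and Kudu tables always reload.
    if force_reload or file_format == 'kudu':
        return True, []

    # Partitioned table with no ALTER section: partition metadata is missing.
    needs_partition_metadata = bool(partition_columns) and not alter

    if needs_partition_metadata and data_in_place:
        # Proposed behavior: discover the existing partition directories
        # instead of dropping and reloading the data.
        stmt = 'ALTER TABLE {0} RECOVER PARTITIONS'.format(table_name)
        return False, [stmt]

    # Otherwise fall back to the current rule.
    return needs_partition_metadata, []
```

      For a partitioned table with no ALTER section, calling plan_table_load('functional.alltypes', 'year int, month int', '', 'text', data_in_place=True) yields no forced reload and a single RECOVER PARTITIONS statement; the same call without data_in_place still forces a reload, matching the current behavior.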

          People

            Assignee: Unassigned
            Reporter: Joe McDonnell