IMPALA-6386

Dataload can fail due to "invalidate metadata" concurrent with DDLs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.11.0
    • Fix Version/s: Impala 2.12.0
    • Component/s: Infrastructure
    • Labels: None

      Description

      testdata/bin/create-load-data.sh runs bin/load-data.py on TPC-H, TPC-DS, and functional-query in parallel. One of the final steps of bin/load-data.py is to run a universal "invalidate metadata" (one with no table name). However, a universal "invalidate metadata" is an error-prone operation in a concurrent system. When it runs during the DDL statements for another dataset (e.g. TPC-H finishes and runs "invalidate metadata" while TPC-DS is still creating tables and adding partitions), those DDL statements can fail:

      Thread 1: create external table foo ... ;
      Thread 2: invalidate metadata;
      Thread 1: alter table foo add partition bar; <-- Hits error because it can't find foo

      This is a known issue: IMPALA-5087. This has been seen in my development environment and one automated build, but it is relatively rare.

      Dataload needs to switch to using "invalidate metadata {table_name}" to avoid this issue. This is also a good time to consider using "refresh {table_name}".

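      A minimal sketch of the table-scoped approach proposed above, in Python since bin/load-data.py is a Python script. It is not the actual change made to load-data.py: the helper name table_metadata_statements and the database and table names in the usage note are hypothetical, chosen only for illustration. It builds one statement per table rather than one universal statement.

```python
import re

# Hypothetical helper, not taken from bin/load-data.py.
_IDENT = re.compile(r"^[A-Za-z0-9_]+$")


def table_metadata_statements(tables, use_refresh=False):
    """Build one table-scoped statement per (db, table) pair.

    With use_refresh=False this produces "invalidate metadata db.table";
    with use_refresh=True it produces "refresh db.table".
    """
    verb = "refresh" if use_refresh else "invalidate metadata"
    statements = []
    for db, table in tables:
        # Reject anything that is not a plain identifier so the
        # generated statement cannot contain stray SQL.
        if not (_IDENT.match(db) and _IDENT.match(table)):
            raise ValueError("unexpected identifier: %s.%s" % (db, table))
        statements.append("%s %s.%s" % (verb, db, table))
    return statements
```

      For example, table_metadata_statements([("tpch", "lineitem"), ("tpch", "orders")]) yields one "invalidate metadata" statement for each of the two tables, so a dataset that finishes early only touches its own tables and leaves tables that another dataset is still creating alone.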
              People

              • Assignee: Joe McDonnell (joemcdonnell)
              • Reporter: Joe McDonnell (joemcdonnell)