IMPALA-6386

Dataload can fail due to "invalidate metadata" concurrent with DDLs

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.11.0
    • Fix Version/s: Impala 2.12.0
    • Component/s: Infrastructure
    • Labels: None

    Description

      testdata/bin/create-load-data.sh runs bin/load-data.py on TPC-H, TPC-DS, and functional-query in parallel. One of the final steps of bin/load-data.py is to run a universal "invalidate metadata" (one with no table name). However, a universal "invalidate metadata" is error-prone in a concurrent system. When it runs during the DDL statements for another dataset (e.g. TPC-H finishes and runs "invalidate metadata" while TPC-DS is still creating tables and adding partitions), those DDL statements can fail.

      Thread 1: create external table foo ... ;
      Thread 2: invalidate metadata;
      Thread 1: alter table foo add partition bar; <-- Hits error because it can't find foo

      This is a known issue (IMPALA-5087). I have seen it in my development environment and in one automated build, but it is relatively rare.

      Dataload needs to switch to using "invalidate metadata {table_name}" to avoid this issue. This is also a good time to consider using "refresh {table_name}".
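
      As a rough illustration of the proposed change, dataload could emit one statement per loaded table rather than a single universal "invalidate metadata". This is only a sketch and not the actual bin/load-data.py code: the function name, its parameters, and the example database and table names are assumptions made for illustration.

```python
def scoped_metadata_statements(db_name, table_names, use_refresh=False):
    """Build one metadata statement per table (hypothetical helper).

    Each statement names a single table, so it does not touch tables that
    another dataset's load may still be creating or altering.
    """
    # "refresh" is the alternative the report suggests considering.
    command = "refresh" if use_refresh else "invalidate metadata"
    return ["{0} {1}.{2}".format(command, db_name, table)
            for table in table_names]


# Illustrative database and table names only.
statements = scoped_metadata_statements("tpch", ["lineitem", "orders"])
for stmt in statements:
    print(stmt)
```

      Each generated statement would then be issued through whatever client dataload already uses to run SQL against Impala.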

            People

              Assignee: Joe McDonnell (joemcdonnell)
              Reporter: Joe McDonnell (joemcdonnell)