IMPALA-5952

Query waiting indefinitely for table metadata to arrive

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Bug
    • Affects Version/s: Impala 2.4.0, Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
    • Fix Version/s: None
    • Component/s: Catalog, Frontend

    Description

      Impala queries may hang indefinitely while waiting for the metadata of a deleted table to arrive through a statestore topic update. You will see many messages like this in the log of the impalad coordinating the hung query:

      Missing tables were not received in 120000ms. Load request will be retried. <list of tables>
      

      If one of the tables mentioned in those log messages has been deleted, then you may be hitting this issue.

      This code in Frontend#getMissingTbls() clearly shows the bug:

        private Set<TableName> getMissingTbls(Set<TableName> tableNames) {
          Set<TableName> missingTbls = new HashSet<TableName>();
          for (TableName tblName: tableNames) {
            Db db = getCatalog().getDb(tblName.getDb());
            // <--- wrong! database has been dropped and may never arrive
            if (db == null) continue;
            Table tbl = db.getTable(tblName.getTbl());
            // <--- wrong! table has been dropped and may never arrive
            if (tbl == null) continue;
            if (!tbl.isLoaded()) missingTbls.add(tblName);
          }
          return missingTbls;
        }
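
      To make the three possible outcomes of that lookup explicit, here is a minimal, self-contained sketch. It is not Impala's code and not a fix adopted by the project: `LocalCatalog`, `TableState`, `classify`, and `shouldKeepWaiting` are hypothetical names, and the catalog cache is modeled as a plain map from a "db.table" string to a loaded flag. It only illustrates how a caller could tell "present but not yet loaded" apart from "absent locally", so that a wait loop has the information it needs to stop waiting for a table that is no longer there.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MissingTableSketch {
  /** Hypothetical classification of a requested table against the local cache. */
  public enum TableState { LOADED, NOT_LOADED, ABSENT }

  /**
   * Hypothetical stand-in for an impalad's local catalog cache.
   * Maps "db.table" to true (metadata loaded) or false (known, but not loaded yet).
   */
  public static final class LocalCatalog {
    private final Map<String, Boolean> tables = new HashMap<>();
    public void put(String name, boolean loaded) { tables.put(name, loaded); }
    public void drop(String name) { tables.remove(name); }
    Boolean lookup(String name) { return tables.get(name); }
  }

  /** Classifies every requested table instead of silently skipping absent ones. */
  public static Map<String, TableState> classify(LocalCatalog catalog,
                                                 List<String> requested) {
    Map<String, TableState> states = new LinkedHashMap<>();
    for (String name : requested) {
      Boolean loaded = catalog.lookup(name);
      if (loaded == null) {
        // Not in the local cache: possibly dropped, so its metadata may never arrive.
        states.put(name, TableState.ABSENT);
      } else if (loaded) {
        states.put(name, TableState.LOADED);
      } else {
        states.put(name, TableState.NOT_LOADED);
      }
    }
    return states;
  }

  /** Waiting only makes sense while some table is known but not yet loaded. */
  public static boolean shouldKeepWaiting(Map<String, TableState> states) {
    return states.containsValue(TableState.NOT_LOADED);
  }

  public static void main(String[] args) {
    LocalCatalog catalog = new LocalCatalog();
    catalog.put("db.a", true);
    catalog.put("db.t", false);
    System.out.println(classify(catalog, List.of("db.a", "db.t")));
    catalog.drop("db.t"); // simulate the deletion arriving in a topic update
    System.out.println(classify(catalog, List.of("db.a", "db.t")));
  }
}
```

      In this sketch the caller decides what to do with `ABSENT` tables, for example failing the query with an error rather than retrying the load request.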
      

      Getting into this hung state requires an elaborate series of events, for example:

      • impalad A requests table T to be loaded and gets into the wait loop
      • impalad B issues a "DROP TABLE T"
      • catalogd loads the metadata for table T
      • statestored requests topic update from catalogd; update includes T
      • statestored sends update to impalad B
      • impalad B completes the "DROP TABLE T" operation
      • statestored requests topic update from catalogd; update includes deletion of T
      • statestored sends update to impalad A which includes the deletion of table T
      • impalad A is still in the wait loop; the metadata for T will never arrive because T has been dropped

      Notice how impalad A may "skip" the first update for T which includes the metadata for T. This typically only happens on very busy clusters where the statestore has trouble sending all catalog snapshots to all subscribers in a timely fashion (i.e. some subscribers skip some snapshots).
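
      The skipped-update scenario can be modeled with a toy simulation. This is a simplified sketch of the sequence described above, not of the actual statestore protocol: the `Snapshot` enum, the `everSeesLoaded` helper, and the idea of representing delivery as a list of snapshot indexes are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class SkippedSnapshotSketch {
  /** What one catalog topic update says about table T (reduced to a single table). */
  public enum Snapshot { T_UNLOADED, T_LOADED, T_DELETED }

  /**
   * Returns true if any update actually delivered to a subscriber carried
   * the loaded metadata for T. deliveredIndexes lists which published
   * snapshots the subscriber received; skipped ones are simply left out.
   */
  public static boolean everSeesLoaded(List<Snapshot> published,
                                       List<Integer> deliveredIndexes) {
    for (int i : deliveredIndexes) {
      if (published.get(i) == Snapshot.T_LOADED) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    // Catalog history for T: requested, then loaded, then dropped.
    List<Snapshot> published =
        Arrays.asList(Snapshot.T_UNLOADED, Snapshot.T_LOADED, Snapshot.T_DELETED);

    List<Integer> impaladB = Arrays.asList(0, 1, 2); // receives every update
    List<Integer> impaladA = Arrays.asList(0, 2);    // skips the update that carries T

    System.out.println("impalad B sees loaded T: " + everSeesLoaded(published, impaladB));
    System.out.println("impalad A sees loaded T: " + everSeesLoaded(published, impaladA));
  }
}
```

      In this model impalad A goes straight from "T not loaded" to "T deleted", which is the state in which its wait loop can no longer be satisfied.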

      Workaround

      • Re-create tables with the same names as the deleted ones (schema and format do not matter; only the database and table names must match)
      • You might need to run "invalidate metadata <table>" on the re-created tables
      • Once the hung queries have finished (failed or succeeded), the re-created tables can be dropped again

          People

            alex.behm Alexander Behm