IMPALA-4799

Long running metadata load for large tables blocks queries/loading all other tables for long time

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: None
    • Component/s: Catalog
    • Labels: None
    • Environment: Version: 2.7.0-cdh5.9.0
      OS: CentOS 6.6

    Description

      If you have a big table such as history.big_table_with_many_partitions and run a refresh on it, the refresh may take a long time.
      But this seems to block loading metadata for all other tables, which shouldn't be the case.
      (I am guessing it has something to do with ensuring the correct version of the catalog topic update?)

      For example, the following sequence can be used to reproduce the issue:

      impala-shell -k -i hdp-dn01 -q "invalidate metadata history.empty_table;" # invalidate metadata to force a PrioritizeLoad of the table the next time it is queried
      impala-shell -k -i hdp-dn02 -q "refresh history.big_table_with_many_partitions;" # from a different node
      impala-shell -k -i hdp-dn01 -q "select count(1) from history.empty_table;" # started a second or so after the previous command; even though the table is empty, it always finishes a second or two after refresh history.big... finishes, which may be minutes later
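
      The timing in the sequence above could be measured with a small wrapper along the following lines. This is only a sketch: the helper names (run_stmt, timed, reproduce) and the IMPALA_SHELL override variable are made up for illustration, the hosts and tables are the ones from this report, and elapsed time is measured with whole-second resolution only.

```shell
#!/usr/bin/env bash
# Sketch: time a cheap statement issued while a long refresh is in flight.
# run_stmt, timed, reproduce and IMPALA_SHELL are hypothetical names.

# Run one statement through impala-shell with the same flags as in the report
# (-k, -i <host>, -q <statement>). IMPALA_SHELL can point at another binary.
run_stmt() {
  local host="$1" stmt="$2"
  "${IMPALA_SHELL:-impala-shell}" -k -i "$host" -q "$stmt"
}

# Run a command and print "<label> took <N>s" (whole seconds) afterwards.
timed() {
  local label="$1" start end rc
  shift
  start=$(date +%s)
  "$@"
  rc=$?
  end=$(date +%s)
  echo "$label took $((end - start))s"
  return "$rc"
}

# Reproduce the sequence: invalidate the small table, start the long refresh
# in the background from another node, then query the small table.
reproduce() {
  run_stmt hdp-dn01 "invalidate metadata history.empty_table;"
  timed "refresh of big table" \
    run_stmt hdp-dn02 "refresh history.big_table_with_many_partitions;" &
  sleep 1
  timed "count on empty table" \
    run_stmt hdp-dn01 "select count(1) from history.empty_table;"
  wait
}
```

      After sourcing the file, calling reproduce prints one elapsed time per statement. If the behaviour described here is present, the time for the count on the empty table should be close to the time for the refresh rather than a second or two.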

      See the attached logs (/tmp/catalogd-invalidate-metadata-example-logs.log). In that case, empty_table is "history.sa_issue_rating" and big_table_with_many_partitions is "history.bundle".
      You may notice in the logs multiple Publish updates after the refresh (ResetMetadata for history.bundle, i.e. big_table_with_many_partitions) finishes.

      Exactly the same thing happens with a drop table statement:

      impala-shell -k -i hdp-dn02 -q "refresh history.big_table_with_many_partitions;" # from a different node
      impala-shell -k -i hdp-dn01 -q "drop table default.test_table;" # started a second or so after the previous command; even though the table is empty, it always finishes a second or two after refresh history.big... finishes, which may be minutes later

      Please see impalad-drop-test-table-logs.log and /tmp/catalogd-drop-test-table.log for logs about this.

      Attachments

        1. test-catalogd.threads.post2
          55 kB
          Antoni
        2. test-catalogd.threads.post1
          50 kB
          Antoni
        3. test-catalogd.threads.baseline
          49 kB
          Antoni
        4. test-catalogd.threads.7
          51 kB
          Antoni
        5. test-catalogd.threads.6
          61 kB
          Antoni
        6. test-catalogd.threads.5
          50 kB
          Antoni
        7. test-catalogd.threads.4
          55 kB
          Antoni
        8. test-catalogd.threads.3
          54 kB
          Antoni
        9. test-catalogd.threads.2
          52 kB
          Antoni
        10. test-catalogd.threads.1
          55 kB
          Antoni
        11. impalad-drop-test-table-logs.log
          4 kB
          Antoni
        12. catalogd-invalidate-metadata-example-logs.log
          1.10 MB
          Antoni
        13. catalogd-drop-test-table.log
          359 kB
          Antoni

            People

              Assignee: dtsirogiannis Dimitris Tsirogiannis
              Reporter: aivanov_impala_e71b Antoni
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: