IMPALA-8606

GET_TABLES performance in local catalog mode

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Version/s: Impala 3.2.0, Impala 3.3.0
    • Component/s: Catalog

    Description

      With local catalog mode enabled, GET_TABLES JDBC requests return more than the always-available table information. Any request for more metadata about a table triggers a full load of that table on the catalogd side, so a single GET_TABLES request triggers a load of the entire catalog. Also, as far as I can see, the requests for more metadata are made one table at a time.
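      To make the cost concrete, here is a toy Java model. It is not Impala code: `FakeCatalogd`, `fetchMetadata`, and `getTables` are hypothetical names. The only behaviour it assumes is what the description states (any request for extra metadata fully loads a table, and requests go one table at a time), plus the ~3 round trips per table reported for a warm catalog and the rough table/DB counts from the test case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GetTablesCostModel {
    // Reported in the issue: ~3 coordinator-to-catalogd round trips per table.
    static final int ROUND_TRIPS_PER_TABLE = 3;

    // Hypothetical stand-in for catalogd (NOT Impala's real API). Table names
    // are assumed to be cheap; any request for more metadata fully loads the table.
    static class FakeCatalogd {
        private final Set<String> loadedTables = new HashSet<>();
        int roundTrips = 0;

        String fetchMetadata(String table) {
            roundTrips++;            // every call counts as one RPC
            loadedTables.add(table); // first access forces a full table load
            return "metadata:" + table;
        }

        int loadedCount() {
            return loadedTables.size();
        }
    }

    // Naive GET_TABLES: asks for extra metadata one table at a time,
    // so every table in the catalog ends up loaded.
    static Map<String, String> getTables(FakeCatalogd catalogd, List<String> tables) {
        Map<String, String> result = new HashMap<>();
        for (String table : tables) {
            String metadata = null;
            for (int i = 0; i < ROUND_TRIPS_PER_TABLE; i++) {
                metadata = catalogd.fetchMetadata(table);
            }
            result.put(table, metadata);
        }
        return result;
    }

    public static void main(String[] args) {
        // Roughly the test case from the issue: ~57k tables spread over ~1700 DBs.
        List<String> tables = new ArrayList<>();
        for (int i = 0; i < 57_000; i++) {
            tables.add("db" + (i % 1_700) + ".t" + i);
        }
        FakeCatalogd catalogd = new FakeCatalogd();
        getTables(catalogd, tables);
        System.out.println("tables loaded: " + catalogd.loadedCount());
        System.out.println("round trips:   " + catalogd.roundTrips);
    }
}
```

      Running `main` prints `tables loaded: 57000` and `round trips:   171000`: in this model every table is loaded just to answer one listing request, and the number of serial round trips grows linearly with the table count.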

      Once the tables are loaded on the catalogd side, a coordinator needs 3 round trips to the catalog to fetch all the details of a single table. My test case had around 57k tables, 1,700 DBs, and ~120k partitions.
      GET_TABLES on a cold catalog takes 18 minutes. With a warm catalog but a cold impalad, it still takes ~70 seconds.
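      A back-of-the-envelope check on these numbers, assuming (as an upper bound) that the warm-catalog time is spent entirely on the per-table round trips, and spreading the cold-catalog time evenly across tables:

```latex
\begin{aligned}
57{,}000\ \text{tables} \times 3\ \text{round trips/table} &\approx 171{,}000\ \text{round trips} \\
\frac{70\ \text{s}}{171{,}000\ \text{round trips}} &\approx 0.41\ \text{ms per round trip} \\
\frac{18 \times 60\ \text{s}}{57{,}000\ \text{tables}} &\approx 19\ \text{ms per table (cold catalog)}
\end{aligned}
```

      Even at well under a millisecond each, round trips issued serially, one table at a time, add up to more than a minute for a single GET_TABLES call.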

      Many tools use GET_TABLES to populate dropdowns, etc., so this hurts both the end-user experience and catalog memory usage.

            People

              Assignee: Quanlong Huang (stigahuang)
              Reporter: Balazs Jeszenszky (jeszyb)
              Votes: 0
              Watchers: 7
