IMPALA-4765

Catalog loading threads can be wasted waiting for a large table to load

    Details

      Description

      When there are multiple requests to the catalogd to prioritize loading the same table, then several catalog loading threads may end up waiting for that single table to be loaded, effectively reducing the number of catalog loading threads. In extreme examples, this might degrade to serial loading of tables.

      Note that a single query may issue multiple table-loading requests for the same table if the table is very large. After issuing a load request, an impalad waits 2 minutes for the metadata to arrive, and then resends the request every 2 minutes. So if a large table takes 20 minutes to load, a single query could issue 10 table-loading requests, which ultimately hog 10 table-loading threads in the catalogd.
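
      The arithmetic behind that estimate can be written down as a small sketch. The class and method names are hypothetical, and it assumes the first request goes out at time 0 with one resend per interval until the metadata arrives:

```java
final class RetryEstimate {
  /**
   * Estimates how many load requests one query sends for a single table.
   * Assumption: the first request goes out at time 0 and one more follows
   * at every retry interval until the metadata arrives.
   */
  static long requestsIssued(long loadMinutes, long retryIntervalMinutes) {
    if (loadMinutes <= 0 || retryIntervalMinutes <= 0) {
      throw new IllegalArgumentException("durations must be positive");
    }
    // Ceiling division: requests at t = 0, interval, 2*interval, ... < loadMinutes.
    return (loadMinutes + retryIntervalMinutes - 1) / retryIntervalMinutes;
  }

  public static void main(String[] args) {
    // The example from the description: a 20-minute load, resend every 2 minutes.
    System.out.println(requestsIssued(20, 2));
  }
}
```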

      The simplest way to diagnose the issue is to take a jstack of the catalogd and look for several stacks like the following:

         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x0000000502e8c998> (a java.util.concurrent.FutureTask) <--- see if several threads are waiting on the same FutureTask
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
      	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:187)
      	at org.apache.impala.catalog.TableLoadingMgr$LoadRequest.get(TableLoadingMgr.java:72)
      	at org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:738)
      	at org.apache.impala.catalog.TableLoadingMgr.loadNextTable(TableLoadingMgr.java:288)
      	at org.apache.impala.catalog.TableLoadingMgr.access$600(TableLoadingMgr.java:50)
      	at org.apache.impala.catalog.TableLoadingMgr$3.run(TableLoadingMgr.java:259)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
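
      Checking a long jstack by eye is tedious, so here is a rough sketch of a helper that counts how many threads are parked on each FutureTask address. The class name and the dump-file argument are hypothetical; the only assumption about the input is that it contains "parking to wait for" lines in the format shown above:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParkedThreadCounter {
  // Matches lines such as:
  //   - parking to wait for  <0x0000000502e8c998> (a java.util.concurrent.FutureTask)
  private static final Pattern PARKED =
      Pattern.compile("parking to wait for\\s+<(0x[0-9a-fA-F]+)> \\(a ([\\w.$]+)\\)");

  /** Maps each FutureTask address to the number of threads parked on it. */
  static Map<String, Integer> countWaiters(List<String> jstackLines) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String line : jstackLines) {
      Matcher m = PARKED.matcher(line);
      // Idle pool threads park on other object types, so only FutureTask counts.
      if (m.find() && m.group(2).equals("java.util.concurrent.FutureTask")) {
        counts.merge(m.group(1), 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) throws Exception {
    if (args.length == 0) {
      System.err.println("usage: ParkedThreadCounter <jstack-dump-file>");
      return;
    }
    List<String> lines = Files.readAllLines(Paths.get(args[0]));
    countWaiters(lines).forEach((address, waiters) -> {
      // More than one waiter on the same address is the pattern described here.
      if (waiters > 1) {
        System.out.println(address + " has " + waiters + " waiting threads");
      }
    });
  }
}
```

      An address reported with more than one waiter is a candidate for this bug, but the stacks should still be checked for TableLoadingMgr.loadNextTable frames.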
      

      The buggy code can be found in TableLoadingMgr.java:

        private void loadNextTable() throws InterruptedException {
          // Always get the next table from the head of the deque.
          final TTableName tblName = tableLoadingDeque_.takeFirst();
          tableLoadingSet_.remove(tblName);
          if (LOG.isTraceEnabled()) {
            LOG.trace("Loading next table. Remaining items in queue: "
                + tableLoadingDeque_.size());
          }
          try {
            // TODO: Instead of calling "getOrLoad" here we could call "loadAsync". We would
            // just need to add a mechanism for moving loaded tables into the Catalog.
            catalog_.getOrLoadTable(tblName.getDb_name(), tblName.getTable_name());
          } catch (CatalogException e) {
            // Ignore.
          }
        }
      

      Notice that the first few lines are intended to avoid loading the same table multiple times. However, the code does not prevent multiple threads from entering CatalogServiceCatalog.getOrLoadTable(), where they all block on the same future for the same table.
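
      One possible way to close that gap is to track which tables are currently being loaded and let a worker skip a table that another worker already holds. The following is only a sketch of that idea with hypothetical names; it is not the code of the actual patch, and it assumes requesters learn about the finished load through some other channel:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class InFlightLoads {
  // Names of tables that some worker thread is loading right now.
  private final Set<String> loading = ConcurrentHashMap.newKeySet();

  /**
   * Runs the loader unless another thread is already loading the table.
   * Returns true if this call ran the load, false if it was skipped.
   */
  boolean loadOnce(String tableName, Runnable loader) {
    // add() is atomic, so exactly one concurrent caller wins for a given table.
    if (!loading.add(tableName)) {
      return false;
    }
    try {
      loader.run();
      return true;
    } finally {
      // Always clear the marker so a later request can load the table again.
      loading.remove(tableName);
    }
  }

  public static void main(String[] args) {
    InFlightLoads guard = new InFlightLoads();
    boolean ran = guard.loadOnce("db.t", () -> {
      // A duplicate request arriving mid-load is skipped instead of blocking.
      System.out.println("duplicate ran: " + guard.loadOnce("db.t", () -> { }));
    });
    System.out.println("first ran: " + ran);
  }
}
```

      A skipped worker returns immediately and is free to pick up the next table from the queue.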

      Reproduction:
      The issue is easy to reproduce by simulating a long table load and doing several concurrent loads of the same table from an impalad. For example, you can first "invalidate metadata t" and then "desc t" several times concurrently.

      A slow table load can be simulated by adding a sleep inside the call() function of the FutureTask created in TableLoadingMgr.loadAsync().
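
      The same effect can be seen outside Impala with plain java.util.concurrent classes. The sketch below is a standalone simulation, not catalogd code, and all names in it are hypothetical: every worker in a fixed pool blocks on one slow FutureTask, so a task for a different table cannot start until the slow load finishes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;

final class SlowLoadRepro {
  /** Returns true if the task for another table only started after the slow load was done. */
  static boolean otherTableWaitedForSlowLoad(int poolSize, long loadMillis) {
    // The sleep stands in for a table whose metadata takes a long time to load.
    FutureTask<String> slowLoad = new FutureTask<>(() -> {
      Thread.sleep(loadMillis);
      return "metadata";
    });
    ExecutorService workers = Executors.newFixedThreadPool(poolSize);
    try {
      CountDownLatch allWorkersBusy = new CountDownLatch(poolSize);
      for (int i = 0; i < poolSize; i++) {
        // Each worker handles a request for the same table and blocks on its future.
        workers.submit(() -> {
          allWorkersBusy.countDown();
          return slowLoad.get();
        });
      }
      // A request for a different table is queued behind the blocked workers.
      Callable<Boolean> otherTable = slowLoad::isDone;
      Future<Boolean> startedAfterLoad = workers.submit(otherTable);
      allWorkersBusy.await();
      slowLoad.run();  // perform the slow load on the calling thread
      return startedAfterLoad.get(30, TimeUnit.SECONDS);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      workers.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("other table waited: " + otherTableWaitedForSlowLoad(4, 500));
  }
}
```

      Taking a jstack of this program while the sleep is running should show the pool threads parked on a single FutureTask, similar to the stacks above.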

        Activity

        jbapple Jim Apple added a comment -

        This is a bulk comment on all issues with Fix Version 2.8.0 that were resolved on or after 2016-12-09.

        2.8.0 was branched on December 9, with only two changes to master cherry-picked to the 2.8.0 release branch after that:

        https://github.com/apache/incubator-impala/commits/2.8.0

        Issues fixed after December 9 might not be fixed in 2.8.0. If you are the one who marked this issue Resolved, can you check to see if the patch is in 2.8.0 by using the link above? If the patch is not in 2.8.0, can you change the Fix Version to 2.9.0?

        Thank you!

        alex.behm Alexander Behm added a comment -

        Antoni, hitting this issue means you have a workload that has many concurrent requests to load the same table. By increasing the number of loading threads you can decrease the chance of running out of loading threads, but configuring the catalogd with dramatically more loading threads has far worse implications so I would not recommend that workaround. Think about it this way: increasing the number of threads might give you better catalogd utilization and loading speed when you are hitting this bug, but you are increasing the chance of overloading the catalogd and other components it talks to during normal operations (when you are not hitting this bug). Underutilization seems like the stabler bet.

        aivanov_impala_e71b Antoni added a comment -

        Would increasing the catalogd "num_metadata_loading_threads" setting prevent this problem, or at least decrease the chance of it happening?

        alex.behm Alexander Behm added a comment -

        commit fa4a054cde012fc0cfc74b79cbdb7008491226bb
        Author: Alex Behm <alex.behm@cloudera.com>
        Date: Thu Jan 12 17:51:51 2017 -0800

        IMPALA-4765: Avoid using several loading threads on one table.

        When there are multiple concurrent requests to the catalogd to
        prioritize loading the same table, then several catalog loading
        threads may end up waiting for that single table to be loaded,
        effectively reducing the number of catalog loading threads. In
        extreme examples, this might degrade to serial loading of tables.

        This patch augments the existing data structures and code to
        prevent using several loading threads for the same table.
        Some of the existing data structures and code could be
        consolidated/simplified but this patch does not try to address
        that issue to minimize the risk of this change.

        Testing: I could easily reproduce the bug locally with the steps
        described in the JIRA. After this patch, I could not observe threads
        being wasted anymore.

        Change-Id: Idba5f1808e0b9cbbcf46245834d8ad38d01231cb
        Reviewed-on: http://gerrit.cloudera.org:8080/5707
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Tested-by: Impala Public Jenkins


          People

          • Assignee:
            alex.behm Alexander Behm
            Reporter:
            alex.behm Alexander Behm