Impalad crash under stress due to too many Kudu client threads

    Details

      Description

      In stress testing on physical machines (80 cores, 200 GB RAM), we discovered that the Kudu Java client creates a large number of threads per client (2x the number of cores). Impala creates a Kudu client in the catalog (table loading and DDL), during planning (fetching tablet locations for scans), and in the BE. The client created during planning is particularly problematic with respect to thread count, especially on machines with many cores and under load. In the stress tests, the impalad process eventually crashed when the JVM could not create more threads.

      The attached hs_err.log shows this crash.

      While we should explore sharing a single Kudu client (one client is needed per master), in the meantime we should reduce the number of threads each client creates. The Kudu API exposes a setting for this.
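
      To make the scale of the problem concrete, here is a minimal sketch of the thread arithmetic. It only models the counts and does not call Kudu. The number of concurrently live clients and the per-client cap are hypothetical values chosen for illustration; the cap actually used in the fix is not stated in this issue. In the real client, the cap is assumed to be set through the client builder's worker-count option.

```java
/** Back-of-the-envelope model of Kudu client worker threads; does not call Kudu. */
final class KuduThreadBudget {
    /** Threads created when every client uses the default of 2 * cores workers. */
    static int defaultWorkerThreads(int cores, int clients) {
        return 2 * cores * clients;
    }

    /** Threads created when each client is capped at a fixed worker count. */
    static int boundedWorkerThreads(int workersPerClient, int clients) {
        return workersPerClient * clients;
    }

    public static void main(String[] args) {
        int cores = 80;              // core count of the stress machines described above
        int concurrentClients = 50;  // hypothetical: clients alive at once during planning
        int cap = 5;                 // hypothetical per-client cap, for illustration only
        System.out.println("default: " + defaultWorkerThreads(cores, concurrentClients));
        System.out.println("bounded: " + boundedWorkerThreads(cap, concurrentClients));
    }
}
```

      With the default, the thread count grows with both the core count and the number of live clients, which is why large machines under load are hit hardest. A fixed cap removes the dependence on core count.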

      1. hs_err.log
        3.47 MB
        Matthew Jacobs

        Activity

        mjacobs Matthew Jacobs added a comment -

        commit 1fea9973d2cd4fd61d9377ef9ce4f5accafb41b0
        Author: Matthew Jacobs <mj@cloudera.com>
        Date: Wed Nov 23 11:39:25 2016 -0800

        IMPALA-4522: Bound Kudu client threads to avoid stress crash

        In stress testing on physical boxes (80 cores, 200gb ram) we
        discovered that the Kudu Java client creates a huge number
        of threads (2x the #cores) per Kudu client, and this was
        causing the impalad to crash when the JVM couldn't create
        more threads.

        This addresses the issue by setting the number of Kudu
        client worker threads rather than letting the Kudu client
        pick the default (2 * #cores). The number set here was
        suggested by the Kudu team as being sufficient for Impala's
        FE usage and this has been tested for 8+ hours on the stress
        cluster where the crash was previously observed quickly.

        In the future, Impala should probably be sharing a single
        Kudu client (it is multithreaded), but additional support
        from Kudu may be needed to ensure this usage is correct
        (e.g. client metadata may need invalidation after some
        operations).

        Change-Id: I3940df776eaa5ad22e1bbb572559afcc8990bf1d
        Reviewed-on: http://gerrit.cloudera.org:8080/5205
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Tested-by: Internal Jenkins
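
        The longer-term idea mentioned in the commit message, sharing one client per master, could look roughly like the sketch below. `StubClient` is a hypothetical stand-in for a thread-safe Kudu client, not the real API, and the sketch ignores the metadata invalidation concern raised above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of a process-wide cache that hands out one shared client per master. */
final class SharedClientCache {
    /** Hypothetical stand-in for a thread-safe Kudu client; not the real API. */
    static final class StubClient {
        static final AtomicInteger created = new AtomicInteger();
        final String masterAddresses;

        StubClient(String masterAddresses) {
            this.masterAddresses = masterAddresses;
            created.incrementAndGet();  // count constructions to show the reuse
        }
    }

    private static final ConcurrentHashMap<String, StubClient> CLIENTS =
            new ConcurrentHashMap<>();

    /** Returns the shared client for these masters, creating it on first use. */
    static StubClient getClient(String masterAddresses) {
        // computeIfAbsent runs the constructor at most once per key,
        // even when many planner threads ask for the same master at once.
        return CLIENTS.computeIfAbsent(masterAddresses, StubClient::new);
    }
}
```

        Keying the cache by master address matches the "one client needed per master" note in the description. A real version would also need a way to close clients and refresh stale metadata.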


          People

          • Assignee:
            mjacobs Matthew Jacobs
            Reporter:
            mjacobs Matthew Jacobs
          • Votes:
            0
            Watchers:
            2
