Spark / SPARK-21176

Master UI hangs with spark.ui.reverseProxy=true if the master node has many CPUs


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0, 2.1.1, 2.2.0, 2.2.1
    • Fix Version/s: 2.1.2, 2.2.0
    • Component/s: Web UI
    • Environment: ppc64le GNU/Linux, POWER8; only the master node is reachable externally, the other nodes are on an internal network

    Description

      In reverse proxy mode, Spark exhausts the Jetty thread pool if the master node has too many CPUs or the cluster has too many executors.

      For each ProxyServlet, Jetty creates selector threads; by default, their number is half the number of available CPUs, with a minimum of 1:
      this(Math.max(1, Runtime.getRuntime().availableProcessors() / 2));
      (see https://github.com/eclipse/jetty.project/blob/0c8273f2ca1f9bf2064cd9c4c939d2546443f759/jetty-client/src/main/java/org/eclipse/jetty/client/http/HttpClientTransportOverHTTP.java)
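A minimal Java sketch of the default quoted above, useful for trying other CPU counts. It reproduces only the formula from the HttpClientTransportOverHTTP default constructor; the class name SelectorDefaults is hypothetical, and the CPU count is passed in as a parameter instead of being read from Runtime so that the 88-CPU case can be checked on any machine.

```java
// Hypothetical helper: mirrors the default selector count used by Jetty's
// HttpClientTransportOverHTTP() constructor (half the CPUs, never fewer than 1).
final class SelectorDefaults {
    static int defaultSelectors(int availableProcessors) {
        // Integer division floors, so 3 CPUs -> 1 and 88 CPUs -> 44.
        return Math.max(1, availableProcessors / 2);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " CPUs -> " + defaultSelectors(cpus) + " selector threads per HttpClient");
        System.out.println("88 CPUs -> " + defaultSelectors(88) + " selector threads per HttpClient");
    }
}
```

On the 88-CPU master described in this report, defaultSelectors(88) yields 44 selector threads for every proxy servlet.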

      In reverse proxy mode, a proxy servlet is set up for each executor.
      I have a system with 7 executors and 88 CPUs on the master node. Jetty tries to instantiate 7 * 44 = 308 selector threads just for the reverse proxy servlets, but since the QueuedThreadPool is initialized with a maximum of 200 threads by default, the UI hangs.
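The same arithmetic as a worked equation, writing E for the number of proxied executors and C for the number of CPUs on the master node (both symbols introduced here for illustration), and assuming, as this report does, that all selector threads draw on the default 200-thread QueuedThreadPool:

```latex
N_{\text{sel}} = E \cdot \max\!\left(1, \left\lfloor \tfrac{C}{2} \right\rfloor\right)
             = 7 \cdot \max\!\left(1, \left\lfloor \tfrac{88}{2} \right\rfloor\right)
             = 7 \cdot 44 = 308 > 200
```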

      I have patched JettyUtils.scala to enlarge the thread pool (val pool = new QueuedThreadPool(400)). With this hack, the UI works.
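The following hedged Java sketch shows why a fixed pool size such as 400 only moves the limit rather than removing it. It assumes, as the report does, that every proxy selector thread is taken from the server's QueuedThreadPool; the class PoolSizing and the 10-executor case are hypothetical illustrations, not measurements.

```java
// Hypothetical sizing check. Assumption: each proxy servlet's selector threads
// are taken from the same server thread pool.
final class PoolSizing {
    static int selectorsPerProxy(int cpus) {
        return Math.max(1, cpus / 2); // Jetty's default selector count
    }

    // Threads left for everything else once all proxy selectors have started.
    // A negative result means the pool cannot even hold the selectors.
    static int spareThreads(int poolSize, int proxies, int cpus) {
        return poolSize - proxies * selectorsPerProxy(cpus);
    }

    public static void main(String[] args) {
        System.out.println("pool 200, 7 proxies:  " + spareThreads(200, 7, 88));
        System.out.println("pool 400, 7 proxies:  " + spareThreads(400, 7, 88));
        System.out.println("pool 400, 10 proxies: " + spareThreads(400, 10, 88));
    }
}
```

With 88 CPUs, a 400-thread pool leaves 92 spare threads for 7 proxies, but would already be short by 40 threads with 10.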

      The Jetty defaults are clearly meant for a real web server: a server with 88 CPUs can certainly expect a lot of traffic.
      For the Spark admin UI, however, there will rarely be concurrent accesses to the same application or the same executor.
      I therefore propose to dramatically reduce the number of selector threads that are instantiated, at least by default.
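One possible shape for such a reduced default, sketched in Java: keep Jetty's half-the-CPUs rule but cap it. The cap values and the class name ReducedSelectors are assumptions for illustration, not the fix that was merged. Because the quoted default constructor delegates via this(...) to a constructor that takes the selector count, a capped value like this could be passed to HttpClientTransportOverHTTP instead of relying on the default.

```java
// Hypothetical policy: Jetty's default (half the CPUs, at least 1), capped at `cap`.
final class ReducedSelectors {
    static int cappedSelectors(int cpus, int cap) {
        return Math.max(1, Math.min(cap, cpus / 2));
    }

    public static void main(String[] args) {
        int proxies = 7; // proxy servlets in the reported setup
        int cpus = 88;   // CPUs on the reported master node
        for (int cap : new int[] {1, 2, 4}) {
            int total = proxies * cappedSelectors(cpus, cap);
            System.out.println("cap=" + cap + " -> " + total + " selector threads");
        }
    }
}
```

With a cap of 1, the seven proxy servlets in this report would need 7 selector threads instead of 308.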

      I will propose a fix in a pull request.


People

    • Assignee: Ingo Schuster
    • Reporter: Ingo Schuster
    • Votes: 1
    • Watchers: 5
