Hive / HIVE-16854

SparkClientFactory is locked too aggressively


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 3.0.0
    • Component/s: Spark
    • Labels: None

      Description

      Most methods in SparkClientFactory are synchronized on a single class-wide lock (the SparkClientFactory class object). Some of these methods are very expensive. For example, createClient() returns a SparkClientImpl instance, and creating a SparkClientImpl requires starting a remote driver that connects back to the RPCServer. This can take a long time, for example when the YARN queue is busy. While it is in progress, every other call into SparkClientFactory has to wait for the lock.
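      The effect of one class-wide lock around a slow operation can be reproduced in isolation. The sketch below is a hypothetical stand-in, not the Hive source: SerializedFactoryDemo, createClient(id, launchMillis) and runConcurrently() are illustrative names, and Thread.sleep() stands in for waiting on a remote driver. Because createClient() is static synchronized, it locks the class object, so concurrent callers finish one after another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for SparkClientFactory, not the Hive source.
final class SerializedFactoryDemo {

    // A static synchronized method locks the Class object, so at most one
    // thread at a time can be inside createClient().
    static synchronized String createClient(String id, long launchMillis)
            throws InterruptedException {
        // Simulates waiting for a remote driver to connect back, e.g. on a busy YARN queue.
        Thread.sleep(launchMillis);
        return "client-" + id;
    }

    // Starts `callers` threads that call createClient() at the same time and
    // returns the wall-clock time in milliseconds until all of them are done.
    static long runConcurrently(int callers, long launchMillis) {
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < callers; i++) {
            String id = Integer.toString(i);
            Thread t = new Thread(() -> {
                try {
                    createClient(id, launchMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Four concurrent 200 ms "launches": the class lock makes them run one
        // after another, so the elapsed time is at least about 800 ms.
        System.out.println("elapsed ms: " + runConcurrently(4, 200));
    }
}
```

      The total time grows with the number of callers even though each "launch" is independent, which is the serialization this issue describes.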

      In our case, hive.spark.client.server.connect.timeout is set to 1 hour, so some queries wait for hours before they start.
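      A rough upper bound shows why the wait can reach hours. Assume launches are fully serialized and each earlier launch can run for up to T, the value of hive.spark.client.server.connect.timeout. Then the k-th request in line may wait for the k - 1 launches ahead of it before its own launch even starts (k = 4 below is only an illustrative value):

```latex
W_k \le (k - 1)\,T, \qquad T = 1\,\mathrm{h},\; k = 4 \;\Rightarrow\; W_4 \le 3\,\mathrm{h}
```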

      The current implementation effectively serializes all remote driver launches: if one of them is slow, all the following ones have to wait.
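      One general way to avoid serializing launches is to hold the lock only around cheap shared-state setup and to perform the slow launch outside it. The sketch below illustrates that idea under stated assumptions; it is not the attached HIVE-16854 patch. NarrowLockFactory, sharedServer, getOrCreateServer() and serverCreations() are hypothetical names, with a plain Object standing in for shared state such as an RPC server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the HIVE-16854 patch: hold the class lock only for
// cheap shared-state setup and do the slow driver launch outside of it.
final class NarrowLockFactory {

    // Stand-in for shared state such as an RPC server; guarded by the class lock.
    private static Object sharedServer;
    private static int creations = 0;

    // Short critical section: lazily create the shared server exactly once.
    private static synchronized Object getOrCreateServer() {
        if (sharedServer == null) {
            sharedServer = new Object();
            creations++;
        }
        return sharedServer;
    }

    static synchronized int serverCreations() {
        return creations;
    }

    // Not synchronized: each caller waits for its own driver in parallel.
    static String createClient(String id, long launchMillis) throws InterruptedException {
        Object server = getOrCreateServer();
        Thread.sleep(launchMillis); // simulated remote driver launch
        return "client-" + id + "@" + System.identityHashCode(server);
    }

    // Starts `callers` concurrent createClient() calls; returns elapsed wall-clock ms.
    static long runConcurrently(int callers, long launchMillis) {
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < callers; i++) {
            String id = Integer.toString(i);
            Thread t = new Thread(() -> {
                try {
                    createClient(id, launchMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Four concurrent 200 ms launches now overlap, so the elapsed time
        // should be well under the ~800 ms a serialized version needs.
        System.out.println("elapsed ms: " + runConcurrently(4, 200));
        System.out.println("server created " + serverCreations() + " time(s)");
    }
}
```

      The design choice is to keep correctness for the shared state (it is still created exactly once under the lock) while letting independent, slow launches proceed concurrently.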

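      An HS2 stack trace (15763.jstack) is attached for reference. It was taken from an earlier version of Hive, so the line numbers may be slightly off. The following excerpt shows the locking effect: one thread holds the SparkClientFactory lock while the others wait to acquire it.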

      xuefu@hadoopservice20-sjc1:~$ grep org.apache.hive.spark.client.SparkClientFactory 15763.jstack 
      	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
      	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
      	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
      	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
      	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
      	- locked <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
      	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
      	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
      	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
      	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
      	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
      	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
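      The same "locked" / "waiting to lock" pattern can also be observed from inside a JVM with the standard java.lang.management API. The sketch below uses ThreadMXBean.dumpAllThreads() and ThreadInfo.getLockInfo() to count threads that are BLOCKED on a class's monitor, which is the lock that static synchronized methods use. SlowFactory, countBlockedOn() and demo() are hypothetical names for illustration; this is a complement to jstack, not a replacement.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class ClassLockInspector {

    // Hypothetical stand-in for a class whose static synchronized method is slow.
    static final class SlowFactory {
        static synchronized void createClient(CountDownLatch entered, long millis)
                throws InterruptedException {
            entered.countDown(); // signal that this thread now holds the class lock
            Thread.sleep(millis); // simulated slow driver launch
        }
    }

    // Counts threads that are BLOCKED waiting for the monitor of `lockClass`.
    static int countBlockedOn(Class<?> lockClass) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int target = System.identityHashCode(lockClass);
        int blocked = 0;
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            LockInfo lock = info.getLockInfo();
            if (info.getThreadState() == Thread.State.BLOCKED
                    && lock != null
                    && lock.getIdentityHashCode() == target) {
                blocked++;
            }
        }
        return blocked;
    }

    // Starts one lock holder plus `waiters` contending threads and returns how
    // many threads are observed blocked on SlowFactory.class.
    static int demo(int waiters) {
        CountDownLatch entered = new CountDownLatch(1);
        Runnable call = () -> {
            try {
                SlowFactory.createClient(entered, 2_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            startDaemon(call);
            entered.await(); // the first thread now holds SlowFactory.class
            for (int i = 0; i < waiters; i++) {
                startDaemon(call);
            }
            // Poll for up to one second until every waiter is BLOCKED on the lock.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(1);
            int seen = countBlockedOn(SlowFactory.class);
            while (seen < waiters && System.nanoTime() < deadline) {
                Thread.sleep(10);
                seen = countBlockedOn(SlowFactory.class);
            }
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void startDaemon(Runnable r) {
        Thread t = new Thread(r);
        t.setDaemon(true); // let the JVM exit without waiting for the slow calls
        t.start();
    }

    public static void main(String[] args) {
        // One thread holds the lock and three wait for it, mirroring the
        // "locked" / "waiting to lock" lines in the jstack excerpt; should print 3.
        System.out.println("threads blocked on SlowFactory.class: " + demo(3));
    }
}
```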
      

        Attachments

        1. HIVE-16854.patch (2 kB, Xuefu Zhang)
        2. HIVE-16854.2.patch (8 kB, Rui Li)
        3. 15763.jstack (320 kB, Xuefu Zhang)


              People

              • Assignee: Rui Li (lirui)
              • Reporter: Xuefu Zhang (xuefuz)
              • Votes: 0
              • Watchers: 6
