Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels:
      None

      Description

      I've been having super consistent memory leaks in the java process of python spark streaming scripts on my driver. A heap profile analysis showed MILLIONS of Finalizer objects.

      The spark web interface under Executor Thread Dump shows:
      Thread 3: Finalizer (WAITING):
      java.net.SocketInputStream.socketRead0(Native Method)
      java.net.SocketInputStream.read(SocketInputStream.java:152)
      java.net.SocketInputStream.read(SocketInputStream.java:122)
      sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
      sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
      sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
      java.io.InputStreamReader.read(InputStreamReader.java:184)
      java.io.BufferedReader.fill(BufferedReader.java:154)
      java.io.BufferedReader.readLine(BufferedReader.java:317)
      java.io.BufferedReader.readLine(BufferedReader.java:382)
      py4j.CallbackConnection.sendCommand(CallbackConnection.java:82)
      py4j.CallbackClient.sendCommand(CallbackClient.java:236)
      py4j.reflection.PythonProxyHandler.finalize(PythonProxyHandler.java:81)
      java.lang.System$2.invokeFinalize(System.java:1213)
      java.lang.ref.Finalizer.runFinalizer(Finalizer.java:98)
      java.lang.ref.Finalizer.access$100(Finalizer.java:34)
      java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:210)

      It appears the problem is with py4j. I don't have a patch because the bug is inside the python/lib/py4j-0.8.2.1-src.zip zip file. I've monkey patched and it appears to fix the problem.

      in py4j.java_gateway.CallbackConnection:1186 run():
      """
      elif command == GARBAGE_COLLECT_PROXY_COMMAND_NAME:
      self.input.readline()
      del(self.pool[obj_id])
      """
      NOTE: it doesn't write a response to the socket!

      and on the java side, CallbackConnection.java:82 sendCommand():
      """
      returnCommand = reader.readLine();
      """
      I don't know what the protocol is supposed to be, but the java side wants a response but the python side isn't sending it. As you can see from the stack trace, this jams up the java FinalizerThread which keeps anything from getting finalized, spark related or not.

      My monkey patch to py4j.java_gateway.CallbackConnection:1186 run():
      """
      elif command == GARBAGE_COLLECT_PROXY_COMMAND_NAME:
      self.input.readline()
      del(self.pool[obj_id])
      + ## PATCH: send an empty response!
      + self.socket.sendall("\n")
      + ##
      """

      This bug appears to exist in the current py4j, but I can't find the repository for the 0.8.2.1 version embedded in spark.

      I'm not entirely sure, but I suspect that (at least on the driver) this doesn't normally get triggered because the java object references held by python are long lived so it wouldn't get triggered (thus jamming up the FinalizerThread) until the program ends.

      My code is peeking at checkpoint file (code below) before starting the script, which looks like it's jamming things up at the beginning, and any other finalized objects (scala? java?) are piling up behind it.

      """
      def loadCheckpoint(checkpointPath):
      StreamingContext._ensure_initialized()
      gw = SparkContext._gateway

      1. Check whether valid checkpoint information exists in the given path
        cp = gw.jvm.CheckpointReader.read(checkpointPath)
        assert cp is not None, "Couldn't load %s" % checkpointPath
        return cp.get()
        """

      At any rate, I can confirm that the same situation exists on the worker nodes as well as the driver and this fixes both.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                david.watson@gmail.com David Watson
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: