Flink / FLINK-12547

Deadlock when the task thread downloads jars using BlobClient

      Description

      The jstack output is shown below. It was captured on an older Flink version, but the master branch has the same problem.

      "Source: Custom Source (76/400)" #68 prio=5 os_prio=0 tid=0x00007f8139cd3000 nid=0xe2 runnable [0x00007f80da5fd000]
      java.lang.Thread.State: RUNNABLE
      at java.net.SocketInputStream.socketRead0(Native Method)
      at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
      at java.net.SocketInputStream.read(SocketInputStream.java:170)
      at java.net.SocketInputStream.read(SocketInputStream.java:141)
      at org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:152)
      at org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:140)
      at org.apache.flink.runtime.blob.BlobClient.downloadFromBlobServer(BlobClient.java:164)
      at org.apache.flink.runtime.blob.AbstractBlobCache.getFileInternal(AbstractBlobCache.java:181)
      at org.apache.flink.runtime.blob.PermanentBlobCache.getFile(PermanentBlobCache.java:206)
      at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerTask(BlobLibraryCacheManager.java:120)
      - locked <0x000000062cf2a188> (a java.lang.Object)
      at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:968)
      at org.apache.flink.runtime.taskmanager.Task.run(Task.java:604)
      at java.lang.Thread.run(Thread.java:834)
      
      Locked ownable synchronizers:
      - None
      

       

      The cause is that SO_TIMEOUT is not set on the blob client's socket connection. When network packet loss is severe because of high CPU load on the machine, the blob client cannot detect that the server has disconnected, so the read blocks indefinitely in the native method (SocketInputStream.socketRead0). As the trace shows, the task thread blocks while still holding the lock acquired in BlobLibraryCacheManager.registerTask.
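
The effect of SO_TIMEOUT on a blocking read can be illustrated with a minimal, self-contained sketch. This is not Flink's BlobClient code and not necessarily how the issue was fixed. The class name SoTimeoutSketch, the method readTimesOut, and the 200 ms value are hypothetical and chosen only for illustration. A local server accepts the connection but never sends data, which stands in for a peer that has silently gone away.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SoTimeoutSketch {

    /**
     * Connects to a local server that never sends data and attempts one read.
     * Returns true if the read was aborted by SO_TIMEOUT.
     * (Hypothetical helper for illustration; not part of Flink.)
     */
    public static boolean readTimesOut(int soTimeoutMillis) throws IOException {
        // Bind to an ephemeral loopback port; this server never writes anything.
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {

            client.connect(
                    new InetSocketAddress(silentServer.getInetAddress(), silentServer.getLocalPort()),
                    1000);

            // Without this call SO_TIMEOUT is 0, meaning read() may block forever.
            client.setSoTimeout(soTimeoutMillis);

            // Keep the accepted connection open and silent while the client reads.
            try (Socket peer = silentServer.accept()) {
                InputStream in = client.getInputStream();
                try {
                    in.read();
                    return false; // data or end-of-stream arrived; not expected here
                } catch (SocketTimeoutException e) {
                    return true; // the blocking read was aborted after the timeout
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

With a positive timeout the read fails with SocketTimeoutException instead of staying in socketRead0, which gives the caller a chance to release its lock and retry or fail the download. How the timeout should be made configurable in Flink is outside the scope of this sketch.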

      People

      Assignee: Haibo Sun (sunhaibotb)
      Reporter: Haibo Sun (sunhaibotb)

      Time Tracking

      Original Estimate: Not Specified
      Remaining Estimate: 0h
      Time Spent: 20m