Kudu / KUDU-468

[java client] Deadlock between removing a server and adding metadata about it

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • M4.5
    • None
    • client
    • None

    Description

      Saw this because of the current issues we're having with the halxg machines, IOPS-946.

      When a client detects that a node has disconnected and, at the same time, tries to refresh the tablets that node hosts, it deadlocks because the two code paths acquire the same two locks in opposite order.

      Stack traces:

      "New I/O worker #11":
      	at kudu.rpc.KuduClient$RemoteTablet.addTabletClient(KuduClient.java:1576)
      	- waiting to lock <0x00000006e32a5ad8> (a kudu.rpc.TabletClient)
      	at kudu.rpc.KuduClient$RemoteTablet.refreshServers(KuduClient.java:1553)
      	- locked <0x0000000523ad1690> (a java.util.ArrayList)
      	at kudu.rpc.KuduClient.discoverTablets(KuduClient.java:1038)
      ...
      
      "New I/O worker #12":
      	at kudu.rpc.KuduClient$RemoteTablet.removeTabletServer(KuduClient.java:1588)
      	- waiting to lock <0x0000000523ad1690> (a java.util.ArrayList)
      	at kudu.rpc.KuduClient.removeClientFromCache(KuduClient.java:1332)
      	at kudu.rpc.KuduClient.access$1500(KuduClient.java:110)
      	at kudu.rpc.KuduClient$TabletClientPipeline.handleDisconnect(KuduClient.java:1441)
      	- locked <0x00000006e32a5ad8> (a kudu.rpc.TabletClient)
      	at kudu.rpc.KuduClient$TabletClientPipeline.sendUpstream(KuduClient.java:1403)
      

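      refreshServers() locks tabletServers, and then addTabletClient() tries to lock a TabletClient. Meanwhile, handleDisconnect() locks that same TabletClient, and then removeClientFromCache(), through removeTabletServer(), tries to lock tabletServers. Each thread ends up holding the lock the other one is waiting for.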
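
The lock-order inversion described above can be illustrated with a minimal sketch. The issue does not say how the fix was implemented, so this is only one common remedy, not necessarily what the actual patch does: the disconnect path releases the client lock before it takes the list lock, so no thread ever holds the client lock while waiting for the list. The class and method names (LockOrderSketch, TabletClientStub, RemoteTabletStub, race) are hypothetical stand-ins, not the real Kudu client classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-ins for the classes in the stack traces; not the real Kudu code.
class LockOrderSketch {

    static final class TabletClientStub {
        boolean connected = true; // guarded by this stub's own monitor
    }

    static final class RemoteTabletStub {
        private final List<TabletClientStub> tabletServers = new ArrayList<>();

        // Refresh path: list lock first, then client lock (the order seen in worker #11).
        void addTabletClient(TabletClientStub client) {
            synchronized (tabletServers) {
                synchronized (client) {
                    if (client.connected) {
                        tabletServers.add(client);
                    }
                }
            }
        }

        // Disconnect path, reordered: the client lock is released before the
        // list lock is requested, so the two locks are never nested in the
        // opposite order to addTabletClient().
        void handleDisconnect(TabletClientStub client) {
            synchronized (client) {
                client.connected = false;
            }
            synchronized (tabletServers) {
                tabletServers.remove(client);
            }
        }

        int size() {
            synchronized (tabletServers) {
                return tabletServers.size();
            }
        }
    }

    // Runs both paths concurrently for the given number of rounds. Returns true
    // if every round finished within the timeout and left the list empty.
    static boolean race(int rounds) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            for (int i = 0; i < rounds; i++) {
                RemoteTabletStub tablet = new RemoteTabletStub();
                TabletClientStub client = new TabletClientStub();
                CountDownLatch done = new CountDownLatch(2);
                pool.execute(() -> { tablet.addTabletClient(client); done.countDown(); });
                pool.execute(() -> { tablet.handleDisconnect(client); done.countDown(); });
                if (!done.await(5, TimeUnit.SECONDS) || tablet.size() != 0) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Calling LockOrderSketch.race(1000) is expected to return true: only the refresh path nests the two locks, so no cycle can form, and a disconnected client is either never added or removed afterwards. Nesting synchronized (tabletServers) inside synchronized (client) in handleDisconnect() would bring back the inversion from the stack traces.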

          People

            jdcryans Jean-Daniel Cryans