IMPALA-5729

Kudu may crash in minicluster if clock becomes unsynchronized

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: Impala 2.10.0
    • Fix Version/s: None
    • Component/s: Infrastructure

      Description

      See e.g. https://jenkins.impala.io/job/gerrit-verify-dryrun/937/consoleFull.

      00:44:28 ] E   HiveServer2Error: AnalysisException: Error opening Kudu table 'impala::tpch_kudu.lineitem', Kudu error: can not complete before timeout: KuduRpc(method=GetTableSchema, tablet=null, attempt=94, DeadlineTracker(timeout=180000, elapsed=179403), Traces: [0ms] querying master, [0ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [0ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [1ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [22ms] querying master, [22ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [22ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [23ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [42ms] querying master, [42ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [42ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [43ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [62ms] querying master, [63ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [63ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [63ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. 
Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [82ms] querying master, [82ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [82ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [83ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [102ms] querying master, [102ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [103ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [103ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [162ms] querying master, [162ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [162ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [163ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [242ms] querying master, [242ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [242ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [243ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. 
Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [362ms] querying master, [362ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [362ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [363ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [763ms] querying master, [763ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [763ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [764ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [962ms] querying master, [962ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [963ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [963ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [2862ms] querying master, [2863ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [2863ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [2864ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. 
Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [5842ms] querying master, [5843ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [5843ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [5844ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [9102ms] querying master, [9103ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [9103ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [9104ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [9342ms] querying master, [9343ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [9343ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [9344ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [13142ms] querying master, [13142ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [13142ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [13143ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. 
Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [16082ms] querying master, [16083ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [16083ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [16084ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [19702ms] querying master, [19702ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [19703ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [19703ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [21282ms] querying master, [21282ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [21282ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [21283ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [22702ms] querying master, [22702ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [22702ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [22703ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. 
Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [24702ms] querying master, [24702ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [24703ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [24703ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [24822ms] querying master, [24822ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [24823ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [24824ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [27362ms] querying master, [27362ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [27363ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [27363ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [28862ms] querying master, [28863ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [28863ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [28864ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. 
Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [30023ms] querying master, [30023ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [30023ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] connection closed, [30024ms] delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] connection closed, [30402ms] trace too long, truncated)
      

      In another example, one tablet server's logs say:

      W0726 21:13:15.084012 21047 heartbeater.cc:499] Failed to heartbeat to 127.0.0.1:7051: Network error: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: connect: Connection refused (error 111)
      W0726 21:13:15.084311 21047 heartbeater.cc:499] Failed to heartbeat to 127.0.0.1:7051: Network error: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: connect: Connection refused (error 111)
      W0726 21:13:15.084451 21047 heartbeater.cc:499] Failed to heartbeat to 127.0.0.1:7051: Network error: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: connect: Connection refused (error 111)
      W0726 21:13:15.084461 21047 heartbeater.cc:326] Failed 3 heartbeats in a row: no longer allowing fast heartbeat attempts.
      W0726 22:32:04.053184 115410 log.cc:665] Time spent T b1dc3548d9994642b0237fa267835d7d P ea1677566c84417781c6470977bf5ab0: Append to log took a long time: real 0.078s  user 0.026s     sys 0.050s
      W0726 22:32:08.173146 115415 log.cc:665] Time spent T d244f5c3ae53407688176aecdba8fc97 P ea1677566c84417781c6470977bf5ab0: Append to log took a long time: real 0.067s  user 0.028s     sys 0.037s
      W0726 22:32:09.778363 115410 log.cc:665] Time spent T b1dc3548d9994642b0237fa267835d7d P ea1677566c84417781c6470977bf5ab0: Append to log took a long time: real 0.101s  user 0.024s     sys 0.073s
      W0726 22:32:12.515143 115410 log.cc:665] Time spent T b1dc3548d9994642b0237fa267835d7d P ea1677566c84417781c6470977bf5ab0: Append to log took a long time: real 0.083s  user 0.032s     sys 0.047s
      W0726 22:32:20.965201 115615 log.cc:665] Time spent T e601566f7a3a46c4aff9d4e5a3681929 P ea1677566c84417781c6470977bf5ab0: Append to log took a long time: real 0.066s  user 0.025s     sys 0.039s
      W0726 22:32:21.965332 115561 log.cc:665] Time spent T 8f219cda1bf7401c9b58884b4f7f7d5f P ea1677566c84417781c6470977bf5ab0: Append to log took a long time: real 0.148s  user 0.023s     sys 0.122s
      W0726 22:32:21.974016 115615 log.cc:665] Time spent T e601566f7a3a46c4aff9d4e5a3681929 P ea1677566c84417781c6470977bf5ab0: Append to log took a long time: real 0.072s  user 0.026s     sys 0.042s
      W0726 22:32:23.912673 115561 log.cc:665] Time spent T 8f219cda1bf7401c9b58884b4f7f7d5f P ea1677566c84417781c6470977bf5ab0: Append to log took a long time: real 0.088s  user 0.027s     sys 0.060s
      W0726 22:32:26.854365 115615 log.cc:665] Time spent T e601566f7a3a46c4aff9d4e5a3681929 P ea1677566c84417781c6470977bf5ab0: Append to log took a long time: real 0.111s  user 0.029s     sys 0.009s
      W0726 22:33:16.664842 116458 log.cc:665] Time spent T 7c8ce06d9df9491fbae6844a44c2e8a4 P ea1677566c84417781c6470977bf5ab0: Append to log took a long time: real 0.181s  user 0.019s     sys 0.004s
      W0726 23:35:14.737128 20820 thread.cc:506] raft [worker] (thread pool) Time spent starting thread: real 0.977s  user 0.000s     sys 0.000s
      W0726 23:35:15.670353 20820 thread.cc:512] raft [worker] (thread pool) Time spent creating pthread: real 0.852s user 0.000s     sys 0.000s
      W0726 23:35:15.670446 20829 connection.cc:625] client connection to 127.0.0.1:31201 send error: Network error: sendmsg error: Connection reset by peer (error 104)
      W0726 23:35:15.670465 20829 consensus_peers.cc:378] T cbaaf0212ddb48e297ff5668a233d97b P ea1677566c84417781c6470977bf5ab0 -> Peer 7f763fa32d9a4f828eba4361d90c3830 (ip-172-31-13-116:31201): Couldn't send request to peer 7f763fa32d9a4f828eba4361d90c3830 for tablet cbaaf0212ddb48e297ff5668a233d97b. Status: Network error: sendmsg error: Connection reset by peer (error 104). Retrying in the next heartbeat period. Already tried 1 times.
      W0726 23:35:15.670476 20829 consensus_peers.cc:378] T 594f54d9fed34d23aaef264b462df5e9 P ea1677566c84417781c6470977bf5ab0 -> Peer 7f763fa32d9a4f828eba4361d90c3830 (ip-172-31-13-116:31201): Couldn't send request to peer 7f763fa32d9a4f828eba4361d90c3830 for tablet 594f54d9fed34d23aaef264b462df5e9. Status: Network error: sendmsg error: Connection reset by peer (error 104). Retrying in the next heartbeat period. Already tried 1 times.
      W0726 23:35:15.670486 20829 consensus_peers.cc:378] T 95cf23a6ac0d47ce90c1ef84defa64ea P ea1677566c84417781c6470977bf5ab0 -> Peer 7f763fa32d9a4f828eba4361d90c3830 (ip-172-31-13-116:31201): Couldn't send request to peer 7f763fa32d9a4f828eba4361d90c3830 for tablet 95cf23a6ac0d47ce90c1ef84defa64ea. Status: Network error: sendmsg error: Connection reset by peer (error 104). Retrying in the next heartbeat period. Already tried 1 times.
      W0726 23:35:15.670704 20820 thread.cc:506] raft [worker] (thread pool) Time spent starting thread: real 0.852s  user 0.000s     sys 0.000s
      W0726 23:35:15.847403 20829 consensus_peers.cc:378] T 095cab3197494b54bf4498ca10df8087 P ea1677566c84417781c6470977bf5ab0 -> Peer 7f763fa32d9a4f828eba4361d90c3830 (ip-172-31-13-116:31201): Couldn't send request to peer 7f763fa32d9a4f828eba4361d90c3830 for tablet 095cab3197494b54bf4498ca10df8087. Status: Network error: Client connection negotiation failed: client connection to 127.0.0.1:31201: connect: Connection refused (error 111). Retrying in the next heartbeat period. Already tried 1 times
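
For triaging a tablet server log like the excerpt above, a small script can group the warnings by source file and flag long stretches with no log output. The following is only a sketch: it assumes the line prefix seen in the excerpt (level letter, MMDD, time, thread id, file:line), it only compares lines from the same day, and the function names `parse_glog_line`, `count_by_source`, and `quiet_periods` are made up for illustration and are not part of Kudu or Impala.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt above:
#   <level><MMDD> <HH:MM:SS.ffffff> <thread id> <file>:<line>] <message>
GLOG_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<hh>\d{2}):(?P<mm>\d{2}):(?P<ss>\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<source>[\w.\-]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_glog_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = GLOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = "{hh}:{mm}:{ss}".format(**rec)
    # Seconds since midnight, used only to measure gaps within one day.
    rec["seconds"] = int(rec["hh"]) * 3600 + int(rec["mm"]) * 60 + float(rec["ss"])
    return rec


def count_by_source(lines):
    """Count parsed lines per (level, source file)."""
    counts = Counter()
    for line in lines:
        rec = parse_glog_line(line)
        if rec is not None:
            counts[(rec["level"], rec["source"])] += 1
    return counts


def quiet_periods(lines, min_gap_seconds=600.0):
    """Return (start_time, end_time, gap_seconds) for consecutive parsed
    lines on the same day that are at least min_gap_seconds apart."""
    records = [r for r in map(parse_glog_line, lines) if r is not None]
    gaps = []
    for prev, cur in zip(records, records[1:]):
        gap = cur["seconds"] - prev["seconds"]
        if prev["mmdd"] == cur["mmdd"] and gap >= min_gap_seconds:
            gaps.append((prev["time"], cur["time"], gap))
    return gaps


# Three lines copied verbatim from the tablet server excerpt above.
SAMPLE = [
    "W0726 21:13:15.084461 21047 heartbeater.cc:326] Failed 3 heartbeats in a row: no longer allowing fast heartbeat attempts.",
    "W0726 22:33:16.664842 116458 log.cc:665] Time spent T 7c8ce06d9df9491fbae6844a44c2e8a4 P ea1677566c84417781c6470977bf5ab0: Append to log took a long time: real 0.181s  user 0.019s     sys 0.004s",
    "W0726 23:35:14.737128 20820 thread.cc:506] raft [worker] (thread pool) Time spent starting thread: real 0.977s  user 0.000s     sys 0.000s",
]

if __name__ == "__main__":
    for key, n in sorted(count_by_source(SAMPLE).items()):
        print(key, n)
    for start, end, gap in quiet_periods(SAMPLE):
        print("no output between %s and %s (%.0f s)" % (start, end, gap))
```

A gap reported by `quiet_periods` only shows that nothing was logged in that window. It does not by itself say whether the process was idle, stalled, or affected by a clock problem.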
      

              People

              • Assignee: Unassigned
              • Reporter: Henry Robinson (henryr)