Kudu / KUDU-3460

RPC error from VoteRequest() call to peer **: Timed out: RequestConsensusVote RPC to ** timed out after 1.713s (SENT)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • 1.16.0
    • 1.16.0
    • None
    • None

    Description

      We have 3 kudu_master and 6 kudu_tserver nodes. After we created 20,000 (2W) tables in Kudu, we started getting errors and can no longer read any data from Kudu. It throws many errors.

      Here are the errors from the client:

      Job aborted due to stage failure: Task 0 in stage 35.0 failed 4 times, most recent failure: Lost task 0.3 in stage 35.0 (TID 9601) (prod-bigdata-mw-159 executor 3): java.lang.RuntimeException: org.apache.kudu.client.NonRecoverableException: tablet hasn't heard from leader or there hasn't been a stable leader fo..
      
      2023-03-08 09:59:49,198 INFO  org.apache.kudu.client.AsyncKuduClient                      [] - Invalidating location master-10.0.2.33:7051(10.0.2.33:7051) for tablet Kudu Master: Service unavailable: ListTables request on kudu.master.MasterService from 10.0.3.82:8764 dropped due to backpressure. The service queue is full; it has 100 items. 

      The Kudu tablet servers also log many errors like:

      W0307 14:36:57.368008 14759 leader_election.cc:334] T fa2a3b405a87466da7a6b1a962f35d99 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1640 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 2.206s (SENT)
      W0307 14:36:57.368801 14759 leader_election.cc:334] T 5f8d377660aa46f29e3f1595a33d086c P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 2 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.725s (SENT)
      W0307 14:36:57.368917 14759 leader_election.cc:334] T a32af7dd8af44b47b4b26d7a222c2f6b P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 344 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.713s (SENT)
      W0307 14:36:57.369045 14759 leader_election.cc:334] T 15e9b550c3274243a5ee923ceda67dc5 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1509 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 3.056s (SENT)
      W0307 14:36:57.369563 14759 leader_election.cc:334] T e5e49b443f71478984162a2eb65d3607 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1575 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.553s (SENT)
      W0307 14:36:57.371872 14759 leader_election.cc:334] T 2ec17c9dd68e47ceb7f572efb9f18fe3 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1633 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 2.010s (SENT)
      W0307 14:36:57.372673 14759 leader_election.cc:334] T a91cf24cc4c943cbbd041c7e6726d7aa P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1610 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.970s (SENT)
      W0307 14:36:57.372789 14759 leader_election.cc:334] T cd667f33abb74afba4b9c510b8f6dfaa P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 3 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.674s (SENT)
      W0307 14:36:57.373358 14759 leader_election.cc:334] T 39709b52ffe34f81b08d0562e45a7a13 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 44 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.636s (SENT)
      W0307 14:36:57.373525 14759 leader_election.cc:334] T 00da9e2c20814ac88e18f7d7220f01c9 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 2 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.524s (SENT)
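      To check whether the vote timeouts concentrate on particular peers, the warning lines can be tallied by destination address. This is a minimal sketch, assuming only the message format visible in the log lines above; the regex and the function name tally_vote_timeouts are hypothetical helpers, not part of Kudu. It scans every match per line because copied logs may fuse several records onto one physical line.

```python
import re
from collections import Counter

# Hypothetical pattern, derived only from the sample warning lines in this report.
VOTE_TIMEOUT = re.compile(
    r"RequestConsensusVote RPC to (?P<peer>[\d.]+:\d+) timed out after (?P<secs>[\d.]+)s"
)


def tally_vote_timeouts(lines):
    """Return (timeouts per peer address, worst timeout in seconds per peer)."""
    counts = Counter()
    worst = {}
    for line in lines:
        # One physical line may contain several fused log records,
        # so iterate over every match instead of taking only the first.
        for m in VOTE_TIMEOUT.finditer(line):
            peer = m.group("peer")
            secs = float(m.group("secs"))
            counts[peer] += 1
            worst[peer] = max(worst.get(peer, 0.0), secs)
    return counts, worst


if __name__ == "__main__":
    sample = [
        "... RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 2.206s (SENT)",
        "... RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.725s (SENT)",
        "... RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 3.056s (SENT)",
    ]
    counts, worst = tally_vote_timeouts(sample)
    for peer, n in counts.most_common():
        print(f"{peer}: {n} timeouts, worst {worst[peer]:.3f}s")
```

      Applied to the ten warning lines quoted in this report, it would count six timeouts to 10.0.2.19:7050 (worst 3.056s) and four to 10.0.2.21:7050 (worst 1.725s). On a live server it could be fed a tablet server's warning log file, for example via open(...) on that file; the exact log path depends on your --log_dir setting.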

      The disk where the WAL directory is located is also behaving abnormally.

      Here is what the WAL file looks like:

      schema_version: 0
      compression_codec: LZ4
      1.1@6873507535186497536 REPLICATE NO_OP
          id { term: 1 index: 1 } timestamp: 6873507535186497536 op_type: NO_OP noop_request { }
      COMMIT 1.1
          op_type: NO_OP commited_op_id { term: 1 index: 1 }
      1.2@6873839930165628928 REPLICATE CHANGE_CONFIG_OP
          id { term: 1 index: 2 } timestamp: 6873839930165628928 op_type: CHANGE_CONFIG_OP change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6" old_config { opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } } new_config { opid_index: 2 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: NON_VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: true } } } }
      COMMIT 1.2
          op_type: CHANGE_CONFIG_OP commited_op_id { term: 1 index: 2 }
      1.3@6873841023495979008 REPLICATE CHANGE_CONFIG_OP
          id { term: 1 index: 3 } timestamp: 6873841023495979008 op_type: CHANGE_CONFIG_OP change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6" old_config { opid_index: 2 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: NON_VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: true } } } new_config { opid_index: 3 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: false } } } }
      COMMIT 1.3
          op_type: CHANGE_CONFIG_OP commited_op_id { term: 1 index: 3 }
      1.4@6873841038243381248 REPLICATE CHANGE_CONFIG_OP
          id { term: 1 index: 4 } timestamp: 6873841038243381248 op_type: CHANGE_CONFIG_OP change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6" old_config { opid_index: 3 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: false } } } new_config { opid_index: 4 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: false } } } }
      COMMIT 1.4

      There are also many Raft worker threads running.

      It seems like the system is busy handling consensus votes, but I could not find any more helpful error logs in Kudu. Can anyone explain what happened?
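      A rough worked estimate may help explain why the tablet servers spend so much time on consensus: every tablet replica runs its own Raft group, with leader heartbeats and, whenever a follower stops hearing from its leader, (pre-)elections like the ones logged above. This is a minimal sketch under stated assumptions: replicas are spread evenly, the replication factor is 3 (the WAL dump shows three voters per tablet), and the per-table tablet counts used (1 as a lower bound, 4 as an example) are hypothetical, since the report does not say how the tables are partitioned.

```python
def replicas_per_tserver(num_tables, tablets_per_table, replication_factor, num_tservers):
    """Average number of tablet replicas per tablet server, assuming even placement."""
    total_replicas = num_tables * tablets_per_table * replication_factor
    return total_replicas / num_tservers


if __name__ == "__main__":
    # 20,000 tables and 6 tablet servers come from the report; RF=3 matches the
    # three voters in the WAL dump; tablets per table are hypothetical values.
    for tablets in (1, 4):
        avg = replicas_per_tserver(20_000, tablets, 3, 6)
        print(f"{tablets} tablet(s) per table -> {avg:,.0f} replicas per tablet server")
```

      Even the one-tablet-per-table lower bound gives about 10,000 replicas (and as many Raft groups) per tablet server. That is well above the per-server replica counts suggested in Kudu's scaling guidance, which is in the low thousands at most; check the known limitations documented for your version (1.16.0 here).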

       

People

    • Assignee: Unassigned
    • Reporter: dachn daicheng