Kudu / KUDU-3460

RPC error from VoteRequest() call to peer **: Timed out: RequestConsensusVote RPC to ** timed out after 1.713s [SENT]

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.16.0
    • Fix Version/s: 1.16.0
    • Component/s: None
    • Labels: None

    Description

      We have 3 kudu_master and 6 kudu_tserver instances. When I created 20,000 ("2W") tables in Kudu, we started getting errors and could no longer read any data from Kudu. It throws many errors:

      Here are the errors from the client:

      Job aborted due to stage failure: Task 0 in stage 35.0 failed 4 times, most recent failure: Lost task 0.3 in stage 35.0 (TID 9601) (prod-bigdata-mw-159 executor 3): java.lang.RuntimeException: org.apache.kudu.client.NonRecoverableException: tablet hasn't heard from leader or there hasn't been a stable leader fo..
      
      2023-03-08 09:59:49,198 INFO  org.apache.kudu.client.AsyncKuduClient                      [] - Invalidating location master-10.0.2.33:7051(10.0.2.33:7051) for tablet Kudu Master: Service unavailable: ListTables request on kudu.master.MasterService from 10.0.3.82:8764 dropped due to backpressure. The service queue is full; it has 100 items. 

      I also found many errors like these in the Kudu tserver log:

      W0307 14:36:57.368008 14759 leader_election.cc:334] T fa2a3b405a87466da7a6b1a962f35d99 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1640 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 2.206s (SENT)
      W0307 14:36:57.368801 14759 leader_election.cc:334] T 5f8d377660aa46f29e3f1595a33d086c P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 2 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.725s (SENT)
      W0307 14:36:57.368917 14759 leader_election.cc:334] T a32af7dd8af44b47b4b26d7a222c2f6b P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 344 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.713s (SENT)
      W0307 14:36:57.369045 14759 leader_election.cc:334] T 15e9b550c3274243a5ee923ceda67dc5 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1509 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 3.056s (SENT)
      W0307 14:36:57.369563 14759 leader_election.cc:334] T e5e49b443f71478984162a2eb65d3607 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1575 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.553s (SENT)
      W0307 14:36:57.371872 14759 leader_election.cc:334] T 2ec17c9dd68e47ceb7f572efb9f18fe3 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1633 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 2.010s (SENT)
      W0307 14:36:57.372673 14759 leader_election.cc:334] T a91cf24cc4c943cbbd041c7e6726d7aa P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1610 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.970s (SENT)
      W0307 14:36:57.372789 14759 leader_election.cc:334] T cd667f33abb74afba4b9c510b8f6dfaa P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 3 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.674s (SENT)
      W0307 14:36:57.373358 14759 leader_election.cc:334] T 39709b52ffe34f81b08d0562e45a7a13 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 44 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.636s (SENT)
      W0307 14:36:57.373525 14759 leader_election.cc:334] T 00da9e2c20814ac88e18f7d7220f01c9 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 2 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.524s (SENT)
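To see whether the vote timeouts are spread evenly or concentrated on a few peers, the warnings can be tallied per target address. The following is a minimal sketch, not a Kudu tool: the function name `summarize_vote_timeouts` and the regular expression are my own assumptions, written only against the log format shown in this report. In the ten warnings quoted here, six target 10.0.2.19:7050 and four target 10.0.2.21:7050.

```python
import re
from collections import defaultdict

# Matches the target peer and elapsed time in a leader_election.cc warning.
# The pattern is an assumption based only on the log lines quoted in this issue.
VOTE_TIMEOUT = re.compile(
    r"VoteRequest\(\) call to ?peer (?P<uuid>[0-9a-f]{32}) "
    r"\((?P<addr>[\d.]+:\d+)\): Timed out: .*?timed out after\s+(?P<secs>\d+\.\d+)s"
)


def summarize_vote_timeouts(log_text):
    """Return {peer_addr: (count, mean_seconds)} for vote-request timeouts."""
    durations = defaultdict(list)
    # finditer scans the whole text, so it also works when entries are fused
    # onto a single line or wrapped in the middle of an entry.
    for match in VOTE_TIMEOUT.finditer(log_text):
        durations[match.group("addr")].append(float(match.group("secs")))
    return {addr: (len(v), sum(v) / len(v)) for addr, v in durations.items()}


# Three of the warnings from this report, used as sample input.
sample = (
    "W0307 14:36:57.368008 14759 leader_election.cc:334] "
    "T fa2a3b405a87466da7a6b1a962f35d99 P 5ac35cfccaf84228bf6d589501ec533e "
    "[CANDIDATE]: Term 1640 pre-election: RPC error from VoteRequest() call to peer "
    "d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: "
    "RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 2.206s (SENT)\n"
    "W0307 14:36:57.368801 14759 leader_election.cc:334] "
    "T 5f8d377660aa46f29e3f1595a33d086c P 5ac35cfccaf84228bf6d589501ec533e "
    "[CANDIDATE]: Term 2 pre-election: RPC error from VoteRequest() call to peer "
    "dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: "
    "RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.725s (SENT)\n"
    "W0307 14:36:57.369045 14759 leader_election.cc:334] "
    "T 15e9b550c3274243a5ee923ceda67dc5 P 5ac35cfccaf84228bf6d589501ec533e "
    "[CANDIDATE]: Term 1509 pre-election: RPC error from VoteRequest() call to peer "
    "d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: "
    "RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 3.056s (SENT)\n"
)

summary = summarize_vote_timeouts(sample)
for addr, (count, mean_secs) in sorted(summary.items()):
    print(f"{addr}: {count} timeouts, mean {mean_secs:.3f}s")
```

Running the same function over a full tserver log would show whether a small set of peers accounts for most of the timeouts, which may help narrow down which hosts to inspect first.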

      The disk that hosts the WAL directory is also behaving abnormally.

      Here is what the WAL file looks like:

      schema_version: 0
      compression_codec: LZ4
      1.1@6873507535186497536 REPLICATE NO_OP
        id { term: 1 index: 1 } timestamp: 6873507535186497536 op_type: NO_OP noop_request { }
      COMMIT 1.1
        op_type: NO_OP commited_op_id { term: 1 index: 1 }
      1.2@6873839930165628928 REPLICATE CHANGE_CONFIG_OP
        id { term: 1 index: 2 } timestamp: 6873839930165628928 op_type: CHANGE_CONFIG_OP change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6" old_config { opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } } new_config { opid_index: 2 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: NON_VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: true } } } }
      COMMIT 1.2
        op_type: CHANGE_CONFIG_OP commited_op_id { term: 1 index: 2 }
      1.3@6873841023495979008 REPLICATE CHANGE_CONFIG_OP
        id { term: 1 index: 3 } timestamp: 6873841023495979008 op_type: CHANGE_CONFIG_OP change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6" old_config { opid_index: 2 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: NON_VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: true } } } new_config { opid_index: 3 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: false } } } }
      COMMIT 1.3
        op_type: CHANGE_CONFIG_OP commited_op_id { term: 1 index: 3 }
      1.4@6873841038243381248 REPLICATE CHANGE_CONFIG_OP
        id { term: 1 index: 4 } timestamp: 6873841038243381248 op_type: CHANGE_CONFIG_OP change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6" old_config { opid_index: 3 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: false } } } new_config { opid_index: 4 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: false } } } }
      COMMIT 1.4

      There are also many raft worker threads running.

      It seems the system is busy handling consensus votes, and I did not find any more helpful error logs in Kudu. Can anyone explain what happened?
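A rough back-of-the-envelope estimate may help put the vote traffic in context. The sketch below is only arithmetic under stated assumptions: 20,000 tables, at least one tablet per table (a lower bound, since partitioned tables have more), a replication factor of 3 (the WAL above lists three voters per config), and replicas spread evenly over the 6 tablet servers. The function name `replicas_per_tserver` is hypothetical.

```python
def replicas_per_tserver(tables, tablets_per_table, replication_factor, tservers):
    """Estimate tablet replicas hosted per tablet server, assuming an even spread."""
    total_replicas = tables * tablets_per_table * replication_factor
    return total_replicas / tservers


# Assumed inputs: 20,000 tables, 1 tablet each (lower bound), 3 replicas, 6 tservers.
estimate = replicas_per_tserver(20_000, 1, 3, 6)
print(estimate)  # 10000.0
```

Each tablet replica takes part in its own Raft group, with its own heartbeats and elections, so around 10,000 replicas per server would mean a very large number of concurrent consensus RPCs. That may be consistent with the vote timeouts and the full master service queue shown above, but it is an estimate, not a confirmed diagnosis.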

       

      Attachments

        1. image-2023-03-17-15-38-51-218.png
          406 kB
          daicheng
        2. image-2023-03-17-15-28-40-361.png
          80 kB
          daicheng
        3. image-2023-03-17-15-28-13-480.png
          88 kB
          daicheng
        4. image-2023-03-17-15-27-45-755.png
          52 kB
          daicheng

          People

            Assignee: Unassigned
            Reporter: daicheng
            Votes: 0
            Watchers: 2