Kudu / KUDU-2900

Master crash reported in disk_failure-itest


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: n/a
    • Component/s: None
    • Labels: None

      Description

      When getting table locations immediately following a tablet copy (though the timing may be a red herring), the master hit a DCHECK in GetParticipantRole() (quorum_util.cc:167) while running GetTableLocations(). The relevant log output and stack trace:

       

      I0720 00:42:48.285115 234 cluster_verifier.cc:82] Check not successful yet, sleeping and retrying: Runtime error: ksck discovered errors
      I0720 00:42:48.667629 2277 raft_consensus.cc:1184] T f8dcedcbd6bb47ce9de0a37cc84ebca5 P d9d5e137df31450caa9dc831972c1f5e [term 2 FOLLOWER]: Refusing update from remote peer 526eadb7abb3438790a38e0d9973a6a5: Log matching property violated. Preceding OpId in replica: term: 1 index: 1. Preceding OpId from leader: term: 2 index: 2. (index mismatch)
      I0720 00:42:48.668309 2961 consensus_queue.cc:984] T f8dcedcbd6bb47ce9de0a37cc84ebca5 P 526eadb7abb3438790a38e0d9973a6a5 [LEADER]: Connected to new peer: Peer: permanent_uuid: "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host: "127.0.58.129" port: 37035 }, Status: LMP_MISMATCH, Last received: 0.0, Next index: 2, Last known committed idx: 1, Time since last communication: 0.000s
      W0720 00:42:48.673665 2471 consensus_peers.cc:458] T f8dcedcbd6bb47ce9de0a37cc84ebca5 P 526eadb7abb3438790a38e0d9973a6a5 -> Peer bc957945380a4e119d7ac829148eede6 (127.0.58.130:37359): Couldn't send request to peer bc957945380a4e119d7ac829148eede6. Error code: TABLET_FAILED (20). Status: Illegal state: Tablet not RUNNING: FAILED: IO error: some tablet data is in a failed directory. This is attempt 1: this message will repeat every 5th retry.
      I0720 00:42:48.674023 2961 raft_consensus.cc:922] T f8dcedcbd6bb47ce9de0a37cc84ebca5 P 526eadb7abb3438790a38e0d9973a6a5: Attempting to remove follower bc957945380a4e119d7ac829148eede6 from the Raft config. Reason: The tablet replica hosted on peer bc957945380a4e119d7ac829148eede6 has failed
      I0720 00:42:48.674721 2961 consensus_queue.cc:206] T f8dcedcbd6bb47ce9de0a37cc84ebca5 P 526eadb7abb3438790a38e0d9973a6a5 [LEADER]: Queue going to LEADER mode. State: All replicated index: 0, Majority replicated index: 2, Committed index: 2, Last appended: 2.2, Last appended by leader: 1, Current term: 2, Majority size: 2, State: 0, Mode: LEADER, active raft config: opid_index: 3 OBSOLETE_local: false peers { permanent_uuid: "526eadb7abb3438790a38e0d9973a6a5" member_type: VOTER last_known_addr { host: "127.0.58.131" port: 33941 } } peers { permanent_uuid: "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host: "127.0.58.129" port: 37035 } }
      I0720 00:42:48.675925 2277 raft_consensus.cc:1184] T f8dcedcbd6bb47ce9de0a37cc84ebca5 P d9d5e137df31450caa9dc831972c1f5e [term 2 FOLLOWER]: Refusing update from remote peer 526eadb7abb3438790a38e0d9973a6a5: Log matching property violated. Preceding OpId in replica: term: 2 index: 2. Preceding OpId from leader: term: 2 index: 3. (index mismatch)
      I0720 00:42:48.676373 2965 consensus_queue.cc:984] T f8dcedcbd6bb47ce9de0a37cc84ebca5 P 526eadb7abb3438790a38e0d9973a6a5 [LEADER]: Connected to new peer: Peer: permanent_uuid: "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host: "127.0.58.129" port: 37035 }, Status: LMP_MISMATCH, Last received: 0.0, Next index: 3, Last known committed idx: 2, Time since last communication: 0.000s
      I0720 00:42:48.678308 2961 raft_consensus.cc:2792] T f8dcedcbd6bb47ce9de0a37cc84ebca5 P 526eadb7abb3438790a38e0d9973a6a5 [term 2 LEADER]: Committing config change with OpId 2.3: config changed from index -1 to 3, VOTER bc957945380a4e119d7ac829148eede6 (127.0.58.130) evicted. New config: { opid_index: 3 OBSOLETE_local: false peers { permanent_uuid: "526eadb7abb3438790a38e0d9973a6a5" member_type: VOTER last_known_addr { host: "127.0.58.131" port: 33941 } } peers { permanent_uuid: "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host: "127.0.58.129" port: 37035 } } }
      I0720 00:42:48.678822 2277 raft_consensus.cc:2792] T f8dcedcbd6bb47ce9de0a37cc84ebca5 P d9d5e137df31450caa9dc831972c1f5e [term 2 FOLLOWER]: Committing config change with OpId 2.3: config changed from index -1 to 3, VOTER bc957945380a4e119d7ac829148eede6 (127.0.58.130) evicted. New config: { opid_index: 3 OBSOLETE_local: false peers { permanent_uuid: "526eadb7abb3438790a38e0d9973a6a5" member_type: VOTER last_known_addr { host: "127.0.58.131" port: 33941 } } peers { permanent_uuid: "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host: "127.0.58.129" port: 37035 } } }
      I0720 00:42:48.787096 2969 ts_tablet_manager.cc:682] T ade4b6ce17504569a6d13d9018288194 P bc957945380a4e119d7ac829148eede6: Initiating tablet copy from peer d9d5e137df31450caa9dc831972c1f5e (127.0.58.129:37035)
      I0720 00:42:48.787333 2969 tablet_copy_client.cc:204] T ade4b6ce17504569a6d13d9018288194 P bc957945380a4e119d7ac829148eede6: tablet copy: overwriting existing tombstoned replica with an unknown last-logged opid
      I0720 00:42:48.787505 2969 tablet_copy_client.cc:241] T ade4b6ce17504569a6d13d9018288194 P bc957945380a4e119d7ac829148eede6: tablet copy: Beginning tablet copy session from remote peer at address 127.0.58.129:37035
      I0720 00:42:48.793766 2297 tablet_copy_service.cc:135] P d9d5e137df31450caa9dc831972c1f5e: Received BeginTabletCopySession request for tablet ade4b6ce17504569a6d13d9018288194 from peer bc957945380a4e119d7ac829148eede6 ({username='slave'} at 127.0.58.130:35599)
      I0720 00:42:48.793942 2297 tablet_copy_service.cc:156] P d9d5e137df31450caa9dc831972c1f5e: Beginning new tablet copy session on tablet ade4b6ce17504569a6d13d9018288194 from peer bc957945380a4e119d7ac829148eede6 at {username='slave'} at 127.0.58.130:35599: session id = bc957945380a4e119d7ac829148eede6-ade4b6ce17504569a6d13d9018288194
      F0720 00:42:48.836412 2162 quorum_util.cc:167] Check failed: RaftPeerPB::NON_PARTICIPANT != GetConsensusRole(peer_uuid, cstate) (3 vs. 3) Peer bc957945380a4e119d7ac829148eede6 << not a participant in current_term: 1 leader_uuid: "d9d5e137df31450caa9dc831972c1f5e" committed_config { opid_index: 3 OBSOLETE_local: false peers { permanent_uuid: "526eadb7abb3438790a38e0d9973a6a5" member_type: VOTER last_known_addr { host: "127.0.58.131" port: 33941 } } peers { permanent_uuid: "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host: "127.0.58.129" port: 37035 } } peers { permanent_uuid: "bc957945380a4e119d7ac829148eede6" member_type: VOTER last_known_addr { host: "127.0.58.130" port: 37359 } } } pending_config { opid_index: 4 OBSOLETE_local: false peers { permanent_uuid: "526eadb7abb3438790a38e0d9973a6a5" member_type: VOTER last_known_addr { host: "127.0.58.131" port: 33941 } } peers { permanent_uuid: "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host: "127.0.58.129" port: 37035 } } }
      *** Check failure stack trace: ***
      @ 0x7f527aaea62d google::LogMessage::Fail() at ??:0
      @ 0x7f527aaec64c google::LogMessage::SendToLog() at ??:0
      @ 0x7f527aaea189 google::LogMessage::Flush() at ??:0
      @ 0x7f527aaecfdf google::LogMessageFatal::~LogMessageFatal() at ??:0
      @ 0x7f52828cbc3c kudu::consensus::GetParticipantRole() at ??:0
      @ 0x7f528d3f4e93 kudu::master::CatalogManager::BuildLocationsForTablet() at ??:0
      @ 0x7f528d3f9e27 kudu::master::CatalogManager::GetTableLocations() at ??:0
      @ 0x7f528d543fe6 kudu::master::MasterServiceImpl::GetTableLocations() at ??:0
      @ 0x7f5288b49caa std::_Function_handler<>::_M_invoke() at ??:0
      @ 0x7f527f34431c std::function<>::operator()() at ??:0
      @ 0x7f527f342e8b kudu::rpc::GeneratedServiceIf::Handle() at ??:0
      @ 0x7f527f346988 kudu::rpc::ServicePool::RunThread() at ??:0
      @ 0x7f527f34beb3 boost::_bi::bind_t<>::operator()() at ??:0
      @ 0x7f527f2a680c boost::function0<>::operator()() at ??:0
      @ 0x7f527bcf1e0b kudu::Thread::SuperviseThread() at ??:0
      @ 0x7f52841a7184 start_thread at ??:0
      @ 0x7f5277703ffd clone at ??:0
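
The consensus state printed in the fatal line lists peer bc957945380a4e119d7ac829148eede6 in committed_config (opid_index 3) but not in pending_config (opid_index 4). The following is a minimal sketch of how that combination could yield NON_PARTICIPANT. It assumes the role lookup resolves against the pending config whenever one is present. The types and function names here (Role, RaftConfig, ConsensusState, GetRole, StateFromFatalLog) are hypothetical stand-ins for illustration, not Kudu's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the protobuf messages printed in the
// fatal log line. These are not Kudu's real types.
enum class Role {
  FOLLOWER,
  LEADER,
  NON_PARTICIPANT,
};

struct RaftConfig {
  long opid_index;
  std::vector<std::string> voter_uuids;
};

struct ConsensusState {
  std::string leader_uuid;
  RaftConfig committed_config;
  bool has_pending_config;
  RaftConfig pending_config;
};

// Assumption: the "active" config is the pending config when one exists,
// otherwise the committed config. A peer missing from the active config is
// reported as NON_PARTICIPANT even if the committed config still lists it.
Role GetRole(const std::string& peer_uuid, const ConsensusState& cstate) {
  assert(!peer_uuid.empty());
  const RaftConfig& active =
      cstate.has_pending_config ? cstate.pending_config : cstate.committed_config;
  const std::vector<std::string>& voters = active.voter_uuids;
  if (std::find(voters.begin(), voters.end(), peer_uuid) == voters.end()) {
    return Role::NON_PARTICIPANT;
  }
  return peer_uuid == cstate.leader_uuid ? Role::LEADER : Role::FOLLOWER;
}

// Rebuilds the state shown in the F0720 00:42:48.836412 log line.
ConsensusState StateFromFatalLog() {
  const std::string a = "526eadb7abb3438790a38e0d9973a6a5";
  const std::string b = "d9d5e137df31450caa9dc831972c1f5e";
  const std::string c = "bc957945380a4e119d7ac829148eede6";
  ConsensusState cs;
  cs.leader_uuid = b;
  cs.committed_config = RaftConfig{3, {a, b, c}};  // still lists the failed replica
  cs.has_pending_config = true;
  cs.pending_config = RaftConfig{4, {a, b}};       // failed replica already removed
  return cs;
}
```

In this sketch, GetRole("bc957945380a4e119d7ac829148eede6", StateFromFatalLog()) returns Role::NON_PARTICIPANT, which is the condition the failed check in the log rejects. Whether the master should tolerate such a replica while building tablet locations is not settled by this ticket.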

        Attachments

        1. disk_failure-itest.txt
          3.64 MB
          Andrew Wong


              People

              • Assignee: Unassigned
              • Reporter: Andrew Wong (awong)
