Kudu / KUDU-1169

SIGILL when aborting a replaced operation from previous leader

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Private Beta
    • Fix Version/s: 0.5.0
    • Component/s: consensus, tserver

    Description

      We saw a SIGILL crash with the following stack:

      kudu::rpc::InboundCall::Respond(google::protobuf::MessageLite const&, bool) + 79 in section .text of /opt/cloudera/parcels/KUDU-0.1.0-1.kudu0.1.0.p0.195/lib/kudu/sbin-release/kudu-tserver
      (gdb) info symbol 0x959d8b
      kudu::rpc::InboundCall::RespondSuccess(google::protobuf::MessageLite const&) + 75 in section .text of /opt/cloudera/parcels/KUDU-0.1.0-1.kudu0.1.0.p0.195/lib/kudu/sbin-release/kudu-tserver
      (gdb) info symbol 0x94d24c
      kudu::rpc::RpcContext::RespondSuccess() + 524 in section .text of /opt/cloudera/parcels/KUDU-0.1.0-1.kudu0.1.0.p0.195/lib/kudu/sbin-release/kudu-tserver
      (gdb) info symbol 0x8bcb6f
      kudu::consensus::RaftConsensus::NonTxRoundReplicationFinished(kudu::consensus::ConsensusRound*, kudu::Callback<void ()(kudu::Status const&)> const&, kudu::Status const&) + 367 in section .text of /opt/cloudera/parcels/KUDU-0.1.0-1.kudu0.1.0.p0.195/lib/kudu/sbin-release/kudu-tserver
      (gdb) info symbol 0x8ca675
      kudu::consensus::ReplicaState::AbortOpsAfterUnlocked(long) + 629 in section .text of /opt/cloudera/parcels/KUDU-0.1.0-1.kudu0.1.0.p0.195/lib/kudu/sbin-release/kudu-tserver
      (gdb) info symbol 0x8b9b49
      kudu::consensus::RaftConsensus::EnforceLogMatchingPropertyMatchesUnlocked(kudu::consensus::RaftConsensus::LeaderRequest const&, kudu::consensus::ConsensusResponsePB*) + 713 in section .text of /opt/cloudera/parcels/KUDU-0.1.0-1.kudu0.1.0.p0.195/lib/kudu/sbin-release/kudu-tserver
      (gdb) info symbol 0x8c304f
      kudu::consensus::RaftConsensus::CheckLeaderRequestUnlocked(kudu::consensus::ConsensusRequestPB const*, kudu::consensus::ConsensusResponsePB*, kudu::consensus::RaftConsensus::LeaderRequest*) + 815 in section .text of /opt/cloudera/parcels/KUDU-0.1.0-1.kudu0.1.0.p0.195/lib/kudu/sbin-release/kudu-tserver
      (gdb) info symbol 0x8c4850
      kudu::consensus::RaftConsensus::UpdateReplica(kudu::consensus::ConsensusRequestPB const*, kudu::consensus::ConsensusResponsePB*) + 624 in section .text of /opt/cloudera/parcels/KUDU-0.1.0-1.kudu0.1.0.p0.195/lib/kudu/sbin-release/kudu-tserver
      (gdb) info symbol 0x8c6b61
      kudu::consensus::RaftConsensus::Update(kudu::consensus::ConsensusRequestPB const*, kudu::consensus::ConsensusResponsePB*) + 417 in section .text of /opt/cloudera/parcels/KUDU-0.1.0-1.kudu0.1.0.p0.195/lib/kudu/sbin-release/kudu-tserver
      (gdb) info symbol 0x6f9086
      kudu::tserver::ConsensusServiceImpl::UpdateConsensus(kudu::consensus::ConsensusRequestPB const*, kudu::consensus::ConsensusResponsePB*, kudu::rpc::RpcContext*) + 710 in section .text of /opt/cloudera/parcels/KUDU-0.1.0-1.kudu0.1.0.p0.195/lib/kudu/sbin-release/kudu-tserver
      

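      The stack shows the replica aborting pending operations (ReplicaState::AbortOpsAfterUnlocked) while handling an UpdateConsensus request, and the abort path running the completion callback of a non-transactional round (RaftConsensus::NonTxRoundReplicationFinished), which in turn calls RpcContext::RespondSuccess(). My guess is that we somehow ended up responding twice to the same transaction.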
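      To make the double-response hypothesis concrete, here is a minimal C++ sketch, assuming the bug is that a round's completion callback runs once on the normal path and again on the abort path. OneShotCall and MakeReplicationFinishedCallback are hypothetical names, not Kudu APIs, and this is not the actual fix; it only shows how a respond-once guard turns a second response into a detectable no-op.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for an RPC call object; NOT Kudu's rpc::InboundCall.
// It models the invariant that a call may be responded to exactly once.
class OneShotCall {
 public:
  // Returns true for the first response. A second response is rejected and
  // counted instead of touching state that may already have been released.
  bool RespondSuccess(const std::string& payload) {
    if (responded_.exchange(true)) {
      duplicate_responses_.fetch_add(1);
      return false;
    }
    payload_ = payload;
    return true;
  }

  int duplicate_responses() const { return duplicate_responses_.load(); }
  const std::string& payload() const { return payload_; }

 private:
  std::atomic<bool> responded_{false};
  std::atomic<int> duplicate_responses_{0};
  std::string payload_;
};

// Hypothetical completion callback for a non-transactional round. The
// suspected bug is that both the normal path and the abort path (taken when
// the operation is replaced) invoke it for the same call.
std::function<void(bool)> MakeReplicationFinishedCallback(OneShotCall* call) {
  return [call](bool ok) {
    call->RespondSuccess(ok ? "ok" : "aborted");
  };
}
```

      For example, invoking the callback once with true and then again with false (simulating the abort path) leaves payload() as "ok" and duplicate_responses() at 1, rather than responding to the call a second time.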

People

    • Assignee: Todd Lipcon
    • Reporter: Todd Lipcon