Kudu / KUDU-3291

Crash when performing a diff scan after delta flush races with a batch of ops that update the same row

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0
    • Fix Version/s: 1.15.0
    • Component/s: None
    • Labels: None

    Description

      A diff scan can hit the following fatal check failure:

      F0604 23:20:50.032124 35483072 delta_store.h:153] Check failed: a.delta_store_id == b.delta_store_id (4445773336 vs. 4445771896)
      *** Check failure stack trace: ***
      *** Aborted at 1622874050 (unix time) try "date -d @1622874050" if you are using GNU date ***
      PC: @     0x7fff724b033a __pthread_kill
      *** SIGABRT (@0x7fff724b033a) received by PID 69138 (TID 0x1021d6dc0) stack trace: ***
          @     0x7fff725615fd _sigtramp
          @     0x7ffeef948568 (unknown)
          @     0x7fff72437808 abort
          @        0x107920599 google::logging_fail()
          @        0x10791f4cf google::LogMessage::SendToLog()
          @        0x10791fb95 google::LogMessage::Flush()
          @        0x107923c9f google::LogMessageFatal::~LogMessageFatal()
          @        0x107920b29 google::LogMessageFatal::~LogMessageFatal()
          @        0x1009ae07e kudu::tablet::SelectedDeltas::DeltaLessThanFunctor::operator()()
          @        0x1009aa561 std::__1::max<>()
          @        0x10099c740 kudu::tablet::SelectedDeltas::ProcessDelta()
          @        0x10099e719 kudu::tablet::SelectedDeltas::MergeFrom()
          @        0x1009a2b30 kudu::tablet::DeltaPreparer<>::SelectDeltas()
          @        0x10094a545 kudu::tablet::DeltaFileIterator<>::SelectDeltas()
          @        0x10098b10c kudu::tablet::DeltaIteratorMerger::SelectDeltas()
          @        0x10097133f kudu::tablet::DeltaApplier::InitializeSelectionVector()
          @        0x1056df4fb kudu::MaterializingIterator::MaterializeBlock()
          @        0x1056df2d8 kudu::MaterializingIterator::NextBlock()
          @        0x1056d1c5b kudu::MergeIterState::PullNextBlock()
          @        0x1056d5e62 kudu::MergeIterator::RefillHotHeap()
          @        0x1056d4f0b kudu::MergeIterator::Init()
          @        0x1006a413d kudu::tablet::Tablet::Iterator::Init()
          @        0x1002cb3b9 kudu::tablet::DiffScanTest_TestDiffScanAfterDeltaFlush_Test::TestBody()
          @        0x1005f1b88 testing::internal::HandleExceptionsInMethodIfSupported<>()
          @        0x1005f1add testing::Test::Run()
          @        0x1005f2dd0 testing::TestInfo::Run()
          @        0x1005f3807 testing::TestSuite::Run()
          @        0x100601b57 testing::internal::UnitTestImpl::RunAllTests()
          @        0x100601418 testing::internal::HandleExceptionsInMethodIfSupported<>()
          @        0x10060139c testing::UnitTest::Run()
          @        0x100476201 RUN_ALL_TESTS()
          @        0x100475fa8 main
      

      The failing check (delta_store.h:153) assumes that all deltas for a given row that share a timestamp belong to the same delta store, and it relies on this assumption to order the deltas in a diff scan.
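
      As a rough illustration of that ordering assumption, here is a minimal sketch. It is not Kudu's actual code: `DeltaKey`, `Order`, `CompareDeltas`, and the `seq_in_store` field are hypothetical names, and the sketch returns `Order::kUndefined` where the real comparator aborts on the failed check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a delta selected by a diff scan.
// These fields are assumptions for illustration, not Kudu's real layout.
struct DeltaKey {
  uint64_t timestamp;      // timestamp of the op that produced the delta
  int64_t delta_store_id;  // identifies the delta store the delta came from
  int64_t seq_in_store;    // position of the delta within that store
};

enum class Order { kLess, kNotLess, kUndefined };

// Orders deltas by timestamp. On a timestamp tie, the tie-breaker is only
// meaningful when both deltas come from the same store; otherwise the
// ordering is undefined (the crash in this issue corresponds to that case).
Order CompareDeltas(const DeltaKey& a, const DeltaKey& b) {
  if (a.timestamp != b.timestamp) {
    return a.timestamp < b.timestamp ? Order::kLess : Order::kNotLess;
  }
  if (a.delta_store_id != b.delta_store_id) {
    return Order::kUndefined;  // same timestamp, different delta stores
  }
  return a.seq_in_store < b.seq_in_store ? Order::kLess : Order::kNotLess;
}
```

      In this model, two deltas with equal timestamps but different `delta_store_id` values have no defined order, which mirrors the `a.delta_store_id == b.delta_store_id` check in the trace above.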

      However, this assumption does not hold. Unlike a MemRowSet (MRS) flush, a DeltaMemStore (DMS) flush does not wait for all ops to finish applying. If a delta flush runs while a batch containing multiple updates to the same row is being applied, those updates may be spread across multiple DMSs, so deltas that share a row and a timestamp can carry different delta store IDs.
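
      The interleaving can be modeled with a small single-threaded sketch. This is a toy model under stated assumptions, not Kudu's implementation: `ToyTablet`, `ApplyUpdate`, `FlushDms`, and `CountStoresFor` are hypothetical names, and the flush is simulated by simply switching to a new store ID between two updates of the same batch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical record of where an update landed.
struct AppliedDelta {
  int64_t row;
  uint64_t timestamp;
  int64_t delta_store_id;
};

// Toy tablet: every update goes to whichever DMS is active at apply time.
class ToyTablet {
 public:
  // Applies one update from a batch; all updates in a batch share a timestamp.
  void ApplyUpdate(int64_t row, uint64_t timestamp) {
    applied_.push_back({row, timestamp, active_dms_id_});
  }

  // Simulates a delta flush that does not wait for in-flight batches:
  // later updates land in a new DMS with a different ID.
  void FlushDms() { ++active_dms_id_; }

  // Counts the distinct stores holding deltas for (row, timestamp).
  std::size_t CountStoresFor(int64_t row, uint64_t timestamp) const {
    std::set<int64_t> stores;
    for (const AppliedDelta& d : applied_) {
      if (d.row == row && d.timestamp == timestamp) {
        stores.insert(d.delta_store_id);
      }
    }
    return stores.size();
  }

 private:
  int64_t active_dms_id_ = 0;
  std::vector<AppliedDelta> applied_;
};
```

      Calling `ApplyUpdate(7, 100)`, then `FlushDms()`, then `ApplyUpdate(7, 100)` leaves `CountStoresFor(7, 100)` at 2 in this model, which is the condition the check does not expect. Without the intervening flush the count stays at 1.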

          People

            Assignee: Andrew Wong (awong)
            Reporter: Andrew Wong (awong)