Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0
None
None
Description
It's possible to run into the following crash:
F0604 23:20:50.032124 35483072 delta_store.h:153] Check failed: a.delta_store_id == b.delta_store_id (4445773336 vs. 4445771896)
*** Check failure stack trace: ***
*** Aborted at 1622874050 (unix time) try "date -d @1622874050" if you are using GNU date ***
PC: @ 0x7fff724b033a __pthread_kill
*** SIGABRT (@0x7fff724b033a) received by PID 69138 (TID 0x1021d6dc0) stack trace: ***
    @ 0x7fff725615fd _sigtramp
    @ 0x7ffeef948568 (unknown)
    @ 0x7fff72437808 abort
    @ 0x107920599 google::logging_fail()
    @ 0x10791f4cf google::LogMessage::SendToLog()
    @ 0x10791fb95 google::LogMessage::Flush()
    @ 0x107923c9f google::LogMessageFatal::~LogMessageFatal()
    @ 0x107920b29 google::LogMessageFatal::~LogMessageFatal()
    @ 0x1009ae07e kudu::tablet::SelectedDeltas::DeltaLessThanFunctor::operator()()
    @ 0x1009aa561 std::__1::max<>()
    @ 0x10099c740 kudu::tablet::SelectedDeltas::ProcessDelta()
    @ 0x10099e719 kudu::tablet::SelectedDeltas::MergeFrom()
    @ 0x1009a2b30 kudu::tablet::DeltaPreparer<>::SelectDeltas()
    @ 0x10094a545 kudu::tablet::DeltaFileIterator<>::SelectDeltas()
    @ 0x10098b10c kudu::tablet::DeltaIteratorMerger::SelectDeltas()
    @ 0x10097133f kudu::tablet::DeltaApplier::InitializeSelectionVector()
    @ 0x1056df4fb kudu::MaterializingIterator::MaterializeBlock()
    @ 0x1056df2d8 kudu::MaterializingIterator::NextBlock()
    @ 0x1056d1c5b kudu::MergeIterState::PullNextBlock()
    @ 0x1056d5e62 kudu::MergeIterator::RefillHotHeap()
    @ 0x1056d4f0b kudu::MergeIterator::Init()
    @ 0x1006a413d kudu::tablet::Tablet::Iterator::Init()
    @ 0x1002cb3b9 kudu::tablet::DiffScanTest_TestDiffScanAfterDeltaFlush_Test::TestBody()
    @ 0x1005f1b88 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0x1005f1add testing::Test::Run()
    @ 0x1005f2dd0 testing::TestInfo::Run()
    @ 0x1005f3807 testing::TestSuite::Run()
    @ 0x100601b57 testing::internal::UnitTestImpl::RunAllTests()
    @ 0x100601418 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0x10060139c testing::UnitTest::Run()
    @ 0x100476201 RUN_ALL_TESTS()
    @ 0x100475fa8 main
The failing check (delta_store.h:153, reached from SelectedDeltas::DeltaLessThanFunctor) assumes that all deltas for a given row with the same timestamp belong to the same delta store. The comparator relies on this assumption to order the deltas in a diff scan.
However, this assumption does not hold. Unlike MemRowSet (MRS) flushes, DeltaMemStore (DMS) flushes do not wait for all in-flight ops to finish applying. As a result, if a delta flush happens while a batch of updates is being applied, a batch containing multiple updates to the same row (all sharing one timestamp) may be split across multiple DMSs.
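A minimal C++ sketch of the failure mode follows. It assumes a simplified model in which each delta is identified only by its timestamp, the ID of the store holding it, and its position within that store. `DeltaKey`, `ApplyBatch`, `SameTimestampImpliesSameStore`, and `AssumptionViolated` are hypothetical names, not Kudu's actual types or APIs. The real check lives in delta_store.h and runs inside the diff-scan delta selection. The sketch only shows that a DMS flush in the middle of a batch leaves same-timestamp updates for one row in two different stores, which is exactly the situation the check rules out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a row delta; these are NOT Kudu's types.
struct DeltaKey {
  uint64_t timestamp;      // op timestamp; every update in one batch shares it
  int64_t delta_store_id;  // which delta store (DMS or flushed file) holds it
  int64_t position;        // order of the delta within its own store
};

// Mirrors the assumption behind the failed CHECK: two deltas with the same
// timestamp must come from the same store, because only the within-store
// position is used to break the tie. Returns false instead of aborting.
bool SameTimestampImpliesSameStore(const DeltaKey& a, const DeltaKey& b) {
  return a.timestamp != b.timestamp || a.delta_store_id == b.delta_store_id;
}

// Simulates applying `num_updates` updates to one row in a single batch (one
// timestamp). If `flush_after` is in [0, num_updates), the current DMS is
// flushed after that many updates and the rest land in a fresh DMS.
std::vector<DeltaKey> ApplyBatch(uint64_t ts, int num_updates, int flush_after) {
  assert(num_updates >= 0);
  std::vector<DeltaKey> deltas;
  int64_t store_id = 1;
  int64_t position = 0;
  for (int i = 0; i < num_updates; ++i) {
    if (i == flush_after) {
      ++store_id;    // the flush swaps in a new, empty DMS
      position = 0;  // positions restart in the new store
    }
    deltas.push_back(DeltaKey{ts, store_id, position++});
  }
  return deltas;
}

// Returns true if any pair of deltas violates the assumption, i.e. the kind
// of pair that would make the real comparator's CHECK abort the process.
bool AssumptionViolated(const std::vector<DeltaKey>& deltas) {
  for (std::size_t i = 0; i < deltas.size(); ++i) {
    for (std::size_t j = i + 1; j < deltas.size(); ++j) {
      if (!SameTimestampImpliesSameStore(deltas[i], deltas[j])) return true;
    }
  }
  return false;
}
```

For example, `ApplyBatch(10, 3, 1)` places the first update in store 1 and the next two in store 2, so `AssumptionViolated` returns true. With no mid-batch flush (a `flush_after` of -1), all three updates stay in one store and it returns false.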