Uploaded image for project: 'Kudu'
  1. Kudu
  2. KUDU-3570

Use-after-free and data race in MajorDeltaCompactionOp when AlterTablet is running

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.18.0, 1.17.1
    • tserver
    • None

    Description

      Running alter_table-randomized-test under TSAN produced heap-use-after-free and data race warnings like below, indicating corresponding conditions might hit when a major delta compaction (MajorDeltaCompactionOp) maintenance operation is run when a table being altered.

      In addition to TSAN warnings, running the alter_table-randomized-test for DEBUG/ASAN/TSAN builds would crash with SIGABRT and fatal messages like below due to a triggered DCHECK constraint. In RELEASE build that condition might lead to an unexpected behavior (e.g., silent data corruption, crash, etc.) when Schema::num_columns() or other methods called on a corrupted Schema object.

      DCHECK triggers a crash with SIGABRT with funny size numbers:

      F20240426 14:25:15.006683 245509 schema.h:584] Check failed: cols_.size() == name_to_index_.size() (5270498306772959232 vs. 643461730718517486) 
      *** Check failure stack trace: ***
          @     0x7f2006677390  google::LogMessage::Flush()
          @     0x7f200667c4cb  google::LogMessageFatal::~LogMessageFatal()
          @           0x4eefff  kudu::Schema::num_columns()
          @     0x7f200dd18529  kudu::tablet::DeltaPreparer<>::Start()
          @     0x7f200dcde94f  kudu::tablet::DeltaFileIterator<>::PrepareBatch()
          @     0x7f200dd07d81  kudu::tablet::DeltaIteratorMerger::PrepareBatch()
          @     0x7f200dd012b1  kudu::tablet::MajorDeltaCompaction::FlushRowSetAndDeltas()
          @     0x7f200dd02fa1  kudu::tablet::MajorDeltaCompaction::Compact()
          @     0x7f200dc1f85d  kudu::tablet::DiskRowSet::MajorCompactDeltaStoresWithColumnIds()
          @     0x7f200dc1f504  kudu::tablet::DiskRowSet::MajorCompactDeltaStores()
          @     0x7f200dae1cc3  kudu::tablet::Tablet::CompactWorstDeltas()
          @     0x7f200db74cd7  kudu::tablet::MajorDeltaCompactionOp::Perform()
          @     0x7f2007498827  kudu::MaintenanceManager::LaunchOp()
          @     0x7f200749c773  kudu::MaintenanceManager::RunSchedulerThread()::$_3::operator()()
      

      TSAN warning on use-after-free:

      WARNING: ThreadSanitizer: heap-use-after-free (pid=3392364)
        Read of size 8 at 0x7b4400100060 by thread T23 (mutexes: write M236855935862339888, write M206456689917755456):
          #0 std::__1::vector<kudu::ColumnSchema, std::__1::allocator<kudu::ColumnSchema> >::size() const thirdparty/installed/tsan/include/c++/v1/vector:658:46 (kudu+0x4ee25b)
          #1 kudu::Schema::num_columns() const src/kudu/common/schema.h:584:5 (kudu+0x4eef50)
          #2 kudu::tablet::DeltaPreparer<kudu::tablet::DeltaFilePreparerTraits<(kudu::tablet::DeltaType)0> >::Start(unsigned long, int) src/kudu/tablet/delta_store.cc:204:46 (libtablet.so+0x578488)
          #3 kudu::tablet::DeltaFileIterator<(kudu::tablet::DeltaType)0>::PrepareBatch(unsigned long, int) src/kudu/tablet/deltafile.cc:608:13 (libtablet.so+0x53e8ae)
          #4 kudu::tablet::DeltaIteratorMerger::PrepareBatch(unsigned long, int) src/kudu/tablet/delta_iterator_merger.cc:66:5 (libtablet.so+0x567ce0)
          #5 kudu::tablet::MajorDeltaCompaction::FlushRowSetAndDeltas(kudu::fs::IOContext const*) src/kudu/tablet/delta_compaction.cc:155:5 (libtablet.so+0x561210)
          #6 kudu::tablet::MajorDeltaCompaction::Compact(kudu::fs::IOContext const*) src/kudu/tablet/delta_compaction.cc:340:3 (libtablet.so+0x562f00)
          #7 kudu::tablet::DiskRowSet::MajorCompactDeltaStoresWithColumnIds(std::__1::vector<kudu::ColumnId, std::__1::allocator<kudu::ColumnId> > const&, kudu::fs::IOContext const*, kudu::tablet::HistoryGcOpts) src/kudu/tablet/diskrowset.cc:588:3 (libtablet.so+0x47f7bc)
          #8 kudu::tablet::DiskRowSet::MajorCompactDeltaStores(kudu::fs::IOContext const*, kudu::tablet::HistoryGcOpts) src/kudu/tablet/diskrowset.cc:572:10 (libtablet.so+0x47f463)
          #9 kudu::tablet::Tablet::CompactWorstDeltas(kudu::tablet::RowSet::DeltaCompactionType) src/kudu/tablet/tablet.cc:2881:5 (libtablet.so+0x341c92)
          #10 kudu::tablet::MajorDeltaCompactionOp::Perform() src/kudu/tablet/tablet_mm_ops.cc:364:3 (libtablet.so+0x3d4ca6)
          #11 kudu::MaintenanceManager::LaunchOp(kudu::MaintenanceOp*) src/kudu/util/maintenance_manager.cc:640:9 (libkudu_util.so+0x38c826)
          #12 kudu::MaintenanceManager::RunSchedulerThread()::$_3::operator()() const src/kudu/util/maintenance_manager.cc:422:5 (libkudu_util.so+0x390772)
      ...
        Previous write of size 8 at 0x7b4400100060 by thread T158:
          #0 operator delete(void*) thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_new_delete.cpp:126 (kudu+0x4dd4e9)
          #1 std::__1::_DeallocateCaller::__do_call(void*) thirdparty/installed/tsan/include/c++/v1/new:334:12 (kudu+0x4e9389)
          #2 std::__1::_DeallocateCaller::__do_deallocate_handle_size(void*, unsigned long) thirdparty/installed/tsan/include/c++/v1/new:292:12 (kudu+0x4e9329)
          #3 std::__1::_DeallocateCaller::__do_deallocate_handle_size_align(void*, unsigned long, unsigned long) thirdparty/installed/tsan/include/c++/v1/new:268:14 (libtablet.so+0x35fe42)
          #4 std::__1::__libcpp_deallocate(void*, unsigned long, unsigned long) thirdparty/installed/tsan/include/c++/v1/new:340:3 (libtablet.so+0x35fdd9)
          #5 std::__1::allocator<std::__1::__shared_ptr_emplace<kudu::Schema, std::__1::allocator<kudu::Schema> > >::deallocate(std::__1::__shared_ptr_emplace<kudu::Schema, std::__1::allocator<kudu::Schema> >*, unsigned long) thirdparty/installed/tsan/include/c++/v1/memory:1798:10 (libtablet.so+0x39e68d)
          #6 std::__1::__shared_ptr_emplace<kudu::Schema, std::__1::allocator<kudu::Schema> >::__on_zero_shared_weak() thirdparty/installed/tsan/include/c++/v1/memory:3515:9 (libtablet.so+0x39e210)
          #7 std::__1::__shared_weak_count::__release_weak() thirdparty/src/llvm-11.0.0.src/projects/libcxx/src/memory.cpp (libc++.so.1+0xc1ec6)
          #8 std::__1::__shared_weak_count::__release_shared() thirdparty/installed/tsan/include/c++/v1/memory:3384:9 (kudu+0x4ee635)
          #9 std::__1::shared_ptr<kudu::Schema>::~shared_ptr() thirdparty/installed/tsan/include/c++/v1/memory:4098:19 (kudu+0x549d68)
          #10 kudu::tablet::TabletMetadata::SetSchema(std::__1::shared_ptr<kudu::Schema> const&, unsigned int) src/kudu/tablet/tablet_metadata.cc:957:1 (libtablet.so+0x4eccb2)
          #11 kudu::tablet::Tablet::AlterSchema(kudu::tablet::AlterSchemaOpState*) src/kudu/tablet/tablet.cc:1727:14 (libtablet.so+0x33b66a)
          #12 kudu::tablet::AlterSchemaOp::Apply(kudu::consensus::CommitMsg**) src/kudu/tablet/ops/alter_schema_op.cc:127:3 (libtablet.so+0x415858)
      ...
      SUMMARY: ThreadSanitizer: heap-use-after-free thirdparty/installed/tsan/include/c++/v1/vector:658:46 in std::__1::vector<kudu::ColumnSchema, std::__1::allocator<kudu::ColumnSchema> >::size() const
      

      TSAN warning on a data race:

      WARNING: ThreadSanitizer: data race (pid=205486)
        Write of size 8 at 0x7b440011ad20 by thread T131:
          #0 std::__1::__vector_base<kudu::ColumnSchema, std::__1::allocator<kudu::ColumnSchema> >::__destruct_at_end(kudu::ColumnSchema*) thirdparty/installed/tsan/include/c++/v1/vector:429:12 (kudu+0x4ee4c0)
          #1 std::__1::__vector_base<kudu::ColumnSchema, std::__1::allocator<kudu::ColumnSchema> >::clear() thirdparty/installed/tsan/include/c++/v1/vector:371:29 (kudu+0x4ee3d4)
          #2 std::__1::__vector_base<kudu::ColumnSchema, std::__1::allocator<kudu::ColumnSchema> >::~__vector_base() thirdparty/installed/tsan/include/c++/v1/vector:465:9 (kudu+0x4ee16b)
          #3 std::__1::vector<kudu::ColumnSchema, std::__1::allocator<kudu::ColumnSchema> >::~vector() thirdparty/installed/tsan/include/c++/v1/vector:557:5 (kudu+0x4eac41)
          #4 kudu::Schema::~Schema() src/kudu/common/schema.h:491:7 (kudu+0x4e5da0)
          #5 std::__1::__shared_ptr_emplace<kudu::Schema, std::__1::allocator<kudu::Schema> >::__on_zero_shared() thirdparty/installed/tsan/include/c++/v1/memory:3503:23 (libtablet.so+0x39e1d5)
          #6 std::__1::__shared_count::__release_shared() thirdparty/installed/tsan/include/c++/v1/memory:3341:9 (kudu+0x4ee685)
          #7 std::__1::__shared_weak_count::__release_shared() thirdparty/installed/tsan/include/c++/v1/memory:3383:27 (kudu+0x4ee629)
          #8 std::__1::shared_ptr<kudu::Schema>::~shared_ptr() thirdparty/installed/tsan/include/c++/v1/memory:4098:19 (kudu+0x549d68)
          #9 kudu::tablet::TabletMetadata::SetSchema(std::__1::shared_ptr<kudu::Schema> const&, unsigned int) src/kudu/tablet/tablet_metadata.cc:957:1 (libtablet.so+0x4ecd52)
          #10 kudu::tablet::Tablet::AlterSchema(kudu::tablet::AlterSchemaOpState*) src/kudu/tablet/tablet.cc:1727:14 (libtablet.so+0x33b69a)
          #11 kudu::tablet::AlterSchemaOp::Apply(kudu::consensus::CommitMsg**) src/kudu/tablet/ops/alter_schema_op.cc:127:3 (libtablet.so+0x415888)
      ...
        Previous read of size 8 at 0x7b440011ad20 by thread T19 (mutexes: write M153539342756080912, write M120888296998031872):
          #0 std::__1::vector<kudu::ColumnSchema, std::__1::allocator<kudu::ColumnSchema> >::size() const thirdparty/installed/tsan/include/c++/v1/vector:658:46 (kudu+0x4ee25b)
          #1 kudu::Schema::num_columns() const src/kudu/common/schema.h:584:5 (kudu+0x4eef50)
          #2 kudu::tablet::DeltaPreparer<kudu::tablet::DeltaFilePreparerTraits<(kudu::tablet::DeltaType)0> >::Start(unsigned long, int) src/kudu/tablet/delta_store.cc:204:46 (libtablet.so+0x578528)
          #3 kudu::tablet::DeltaFileIterator<(kudu::tablet::DeltaType)0>::PrepareBatch(unsigned long, int) src/kudu/tablet/deltafile.cc:608:13 (libtablet.so+0x53e94e)    #4 kudu::tablet::DeltaIteratorMerger::PrepareBatch(unsigned long, int) src/kudu/tablet/delta_iterator_merger.cc:66:5 (libtablet.so+0x567d80)
          #5 kudu::tablet::MajorDeltaCompaction::FlushRowSetAndDeltas(kudu::fs::IOContext const*) src/kudu/tablet/delta_compaction.cc:155:5 (libtablet.so+0x5612b0)
          #6 kudu::tablet::MajorDeltaCompaction::Compact(kudu::fs::IOContext const*) src/kudu/tablet/delta_compaction.cc:340:3 (libtablet.so+0x562fa0)
          #7 kudu::tablet::DiskRowSet::MajorCompactDeltaStoresWithColumnIds(std::__1::vector<kudu::ColumnId, std::__1::allocator<kudu::ColumnId> > const&, kudu::fs::IOContext const*, kudu::tablet::HistoryGcOpts) src/kudu/tablet/diskrowset.cc:588:3 (libtablet.so+0x47f85c)
          #8 kudu::tablet::DiskRowSet::MajorCompactDeltaStores(kudu::fs::IOContext const*, kudu::tablet::HistoryGcOpts) src/kudu/tablet/diskrowset.cc:572:10 (libtablet.so+0x47f503)
          #9 kudu::tablet::Tablet::CompactWorstDeltas(kudu::tablet::RowSet::DeltaCompactionType) src/kudu/tablet/tablet.cc:2881:5 (libtablet.so+0x341cc2)
          #10 kudu::tablet::MajorDeltaCompactionOp::Perform() src/kudu/tablet/tablet_mm_ops.cc:364:3 (libtablet.so+0x3d4cd6)
          #11 kudu::MaintenanceManager::LaunchOp(kudu::MaintenanceOp*) src/kudu/util/maintenance_manager.cc:640:9 (libkudu_util.so+0x38c826)
          #12 kudu::MaintenanceManager::RunSchedulerThread()::$_3::operator()() const src/kudu/util/maintenance_manager.cc:422:5 (libkudu_util.so+0x390772)
      

      Attachments

        Activity

          Commit 3912a97cd8998ef04c4e6f9c38bd365c582e8171 in kudu's branch refs/heads/master from Alexey Serbin
          [ https://gitbox.apache.org/repos/asf?p=kudu.git;h=3912a97cd ]

          KUDU-3570 fix use-after-free in MajorDeltaCompactionOp

          This patch addresses heap-use-after-free and data race issues reported
          in KUDU-3570. With this and one prior patch, neither TSAN nor ASAN
          reports any warnings when running alter_table-randomized-test, at least
          that's the stats collected from more than 100 iterations.

          Change-Id: I491c6d98bed8780bcfb62f152db471d7a260d305
          Reviewed-on: http://gerrit.cloudera.org:8080/21362
          Tested-by: Alexey Serbin <alexey@apache.org>
          Reviewed-by: Abhishek Chennaka <achennaka@cloudera.com>

          jira-bot ASF subversion and git services added a comment - Commit 3912a97cd8998ef04c4e6f9c38bd365c582e8171 in kudu's branch refs/heads/master from Alexey Serbin [ https://gitbox.apache.org/repos/asf?p=kudu.git;h=3912a97cd ] KUDU-3570 fix use-after-free in MajorDeltaCompactionOp This patch addresses heap-use-after-free and data race issues reported in KUDU-3570 . With this and one prior patch, neither TSAN nor ASAN reports any warnings when running alter_table-randomized-test, at least that's the stats collected from more than 100 iterations. Change-Id: I491c6d98bed8780bcfb62f152db471d7a260d305 Reviewed-on: http://gerrit.cloudera.org:8080/21362 Tested-by: Alexey Serbin <alexey@apache.org> Reviewed-by: Abhishek Chennaka <achennaka@cloudera.com>

          Commit 75fc23a988fb582e5d8ffcdd0289bdd6b637a1e6 in kudu's branch refs/heads/branch-1.17.x from Alexey Serbin
          [ https://gitbox.apache.org/repos/asf?p=kudu.git;h=75fc23a98 ]

          KUDU-3570 fix use-after-free in MajorDeltaCompactionOp

          This patch addresses heap-use-after-free and data race issues reported
          in KUDU-3570. With this and one prior patch, neither TSAN nor ASAN
          reports any warnings when running alter_table-randomized-test, at least
          that's the stats collected from more than 100 iterations.

          Change-Id: I491c6d98bed8780bcfb62f152db471d7a260d305
          Reviewed-on: http://gerrit.cloudera.org:8080/21362
          Tested-by: Alexey Serbin <alexey@apache.org>
          Reviewed-by: Abhishek Chennaka <achennaka@cloudera.com>
          (cherry picked from commit 3912a97cd8998ef04c4e6f9c38bd365c582e8171)
          Reviewed-on: http://gerrit.cloudera.org:8080/21365
          Reviewed-by: Wang Xixu <1450306854@qq.com>

          jira-bot ASF subversion and git services added a comment - Commit 75fc23a988fb582e5d8ffcdd0289bdd6b637a1e6 in kudu's branch refs/heads/branch-1.17.x from Alexey Serbin [ https://gitbox.apache.org/repos/asf?p=kudu.git;h=75fc23a98 ] KUDU-3570 fix use-after-free in MajorDeltaCompactionOp This patch addresses heap-use-after-free and data race issues reported in KUDU-3570 . With this and one prior patch, neither TSAN nor ASAN reports any warnings when running alter_table-randomized-test, at least that's the stats collected from more than 100 iterations. Change-Id: I491c6d98bed8780bcfb62f152db471d7a260d305 Reviewed-on: http://gerrit.cloudera.org:8080/21362 Tested-by: Alexey Serbin <alexey@apache.org> Reviewed-by: Abhishek Chennaka <achennaka@cloudera.com> (cherry picked from commit 3912a97cd8998ef04c4e6f9c38bd365c582e8171) Reviewed-on: http://gerrit.cloudera.org:8080/21365 Reviewed-by: Wang Xixu <1450306854@qq.com>

          People

            Unassigned Unassigned
            aserbin Alexey Serbin
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: