Kudu / KUDU-453

TSAN: Manual DMS flush and Maintenance Manager DMS size check can race in DeltaTracker

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: M4
    • Fix Version/s: None
    • Component/s: tablet
    • Labels: None

    Description

      I am occasionally getting a TSAN complaint when running a new test that does manual delta flushes.

      The race is triggered by this code in DeltaTracker::Flush():

      // Swap the DeltaMemStore to use the new schema
      old_dms = dms_;
      dms_.reset(new DeltaMemStore(old_dms->id() + 1, schema_, opid_anchor_registry_,
                                   parent_tracker_));
      

      Racing with DeltaTracker::DeltaMemStoreSize() from another thread:

      size_t DeltaTracker::DeltaMemStoreSize() const {
        return dms_->memory_footprint();
      }
      

      It seems we need to better protect access to dms_, either with a lock or by taking an extra reference to it (bumping its refcount) for the duration of the access. TSAN message:

      WARNING: ThreadSanitizer: data race (pid=2354)
        Write of size 8 at 0x7d540005f0e0 by main thread (mutexes: write M16555):
          #0 void std::swap<kudu::tablet::DeltaMemStore*>(kudu::tablet::DeltaMemStore*&, kudu::tablet::DeltaMemStore*&) /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/move.h:83 (libtablet.so+0x0000002af2ba)
          #1 std::tr1::__shared_ptr<kudu::tablet::DeltaMemStore, (__gnu_cxx::_Lock_policy)2>::swap(std::tr1::__shared_ptr<kudu::tablet::DeltaMemStore, (__gnu_cxx::_Lock_policy)2>&) /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/tr1/shared_ptr.h:551 (libtablet.so+0x0000002af260)
          #2 void std::tr1::__shared_ptr<kudu::tablet::DeltaMemStore, (__gnu_cxx::_Lock_policy)2>::reset<kudu::tablet::DeltaMemStore>(kudu::tablet::DeltaMemStore*) /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/tr1/shared_ptr.h:505 (libtablet.so+0x0000002ab865)
          #3 kudu::tablet::DeltaTracker::Flush(kudu::tablet::DeltaTracker::MetadataFlushType) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/delta_tracker.cc:412 (libtablet.so+0x0000002aab70)
          #4 kudu::tablet::DiskRowSet::FlushDeltas() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/diskrowset.cc:409 (libtablet.so+0x000000238bcb)
          #5 kudu::tablet::Tablet::FlushBiggestDMS() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/tablet.cc:1381 (libtablet.so+0x0000001a1b02)
          #6 kudu::RemoteBootstrapTest_TestRemoteBootstrap_Test::TestBody() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/integration-tests/remote_bootstrap-test.cc:226 (remote_bootstrap-test+0x0000000b001b)
          #7 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null>:0 (libgtest.so+0x00000005f8a5)
          #8 __libc_start_main <null>:0 (libc.so.6+0x00000001ecdc)
      
        Previous read of size 8 at 0x7d540005f0e0 by thread T49 (mutexes: write M1271):
          #0 std::tr1::__shared_ptr<kudu::tablet::DeltaMemStore, (__gnu_cxx::_Lock_policy)2>::operator->() const /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/tr1/shared_ptr.h:525 (libtablet.so+0x0000002ac279)
          #1 kudu::tablet::DeltaTracker::DeltaMemStoreSize() const /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/delta_tracker.cc:470 (libtablet.so+0x0000002ab050)
          #2 kudu::tablet::DiskRowSet::DeltaMemStoreSize() const /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/diskrowset.cc:562 (libtablet.so+0x000000239ca7)
          #3 kudu::tablet::Tablet::DeltaMemStoresSize() const /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/tablet.cc:1362 (libtablet.so+0x0000001a18df)
          #4 kudu::tablet::FlushDeltaMemStoresOp::UpdateStats(kudu::MaintenanceOpStats*) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/tablet.cc:847 (libtablet.so+0x0000001c40ea)
          #5 kudu::MaintenanceManager::FindBestOp() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/maintenance_manager.cc:239 (libtablet.so+0x000000243a11)
          #6 kudu::MaintenanceManager::RunSchedulerThread() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/maintenance_manager.cc:187 (libtablet.so+0x000000242fc4)
          #7 boost::_mfi::mf0<void, kudu::MaintenanceManager>::operator()(kudu::MaintenanceManager*) const /usr/include/boost/bind/mem_fn_template.hpp:49 (libtablet.so+0x00000024794d)
          #8 void boost::_bi::list1<boost::_bi::value<kudu::MaintenanceManager*> >::operator()<boost::_mfi::mf0<void, kudu::MaintenanceManager>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::MaintenanceManager>&, boost::_bi::list0&, int) /usr/include/boost/bind/bind.hpp:246 (libtablet.so+0x0000002478ba)
          #9 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::MaintenanceManager>, boost::_bi::list1<boost::_bi::value<kudu::MaintenanceManager*> > >::operator()() /usr/include/boost/bind/bind_template.hpp:20 (libtablet.so+0x000000247863)
          #10 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::MaintenanceManager>, boost::_bi::list1<boost::_bi::value<kudu::MaintenanceManager*> > >, void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:153 (libtablet.so+0x000000247669)
          #11 boost::function0<void>::operator()() const /usr/include/boost/function/function_template.hpp:1012 (libtablet.so+0x0000001fb051)
          #12 kudu::Thread::SuperviseThread(void*) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/util/thread.cc:435 (libkudu_util.so+0x000000138a0b)
      
        Location is heap block of size 520 at 0x7d540005f000 allocated by main thread:
          #0 operator new(unsigned long) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/thirdparty/llvm-3.4.2.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:560 (remote_bootstrap-test+0x00000004590a)
          #1 kudu::tablet::DiskRowSet::Open() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/diskrowset.cc:398 (libtablet.so+0x000000238910)
          #2 kudu::tablet::DiskRowSet::Open(std::tr1::shared_ptr<kudu::metadata::RowSetMetadata> const&, kudu::log::OpIdAnchorRegistry*, std::tr1::shared_ptr<kudu::tablet::DiskRowSet>*, std::tr1::shared_ptr<kudu::MemTracker> const&) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/diskrowset.cc:376 (libtablet.so+0x00000023879a)
          #3 kudu::tablet::Tablet::DoCompactionOrFlush(kudu::Schema const&, kudu::tablet::RowSetsInCompaction const&, long) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/tablet.cc:1058 (libtablet.so+0x00000019cdd4)
          #4 kudu::tablet::Tablet::FlushInternal(kudu::tablet::RowSetsInCompaction const&, std::tr1::shared_ptr<kudu::tablet::MemRowSet> const&, kudu::Schema const&) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/tablet.cc:616 (libtablet.so+0x00000019c603)
          #5 kudu::tablet::Tablet::FlushUnlocked() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/tablet.cc:557 (libtablet.so+0x00000019c042)
          #6 kudu::tablet::Tablet::Flush() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/tablet.cc:538 (libtablet.so+0x00000019bf44)
          #7 kudu::RemoteBootstrapTest_TestRemoteBootstrap_Test::TestBody() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/integration-tests/remote_bootstrap-test.cc:225 (remote_bootstrap-test+0x0000000aff5f)
          #8 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null>:0 (libgtest.so+0x00000005f8a5)
          #9 __libc_start_main <null>:0 (libc.so.6+0x00000001ecdc)
      
        Mutex M16555 created at:
          #0 pthread_mutex_init /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/thirdparty/llvm-3.4.2.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:925 (remote_bootstrap-test+0x000000049a8c)
          #1 boost::mutex::mutex() /usr/include/boost/thread/pthread/mutex.hpp:37 (libtserver.so+0x000000098e1b)
          #2 kudu::tablet::DeltaTracker::DeltaTracker(std::tr1::shared_ptr<kudu::metadata::RowSetMetadata> const&, kudu::Schema const&, unsigned int, kudu::log::OpIdAnchorRegistry*, kudu::MemTracker*) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/delta_tracker.cc:41 (libtablet.so+0x0000002a763f)
          #3 kudu::tablet::DiskRowSet::Open() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/diskrowset.cc:398 (libtablet.so+0x000000238967)
          #4 kudu::tablet::DiskRowSet::Open(std::tr1::shared_ptr<kudu::metadata::RowSetMetadata> const&, kudu::log::OpIdAnchorRegistry*, std::tr1::shared_ptr<kudu::tablet::DiskRowSet>*, std::tr1::shared_ptr<kudu::MemTracker> const&) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/diskrowset.cc:376 (libtablet.so+0x00000023879a)
          #5 kudu::tablet::Tablet::DoCompactionOrFlush(kudu::Schema const&, kudu::tablet::RowSetsInCompaction const&, long) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/tablet.cc:1058 (libtablet.so+0x00000019cdd4)
          #6 kudu::tablet::Tablet::FlushInternal(kudu::tablet::RowSetsInCompaction const&, std::tr1::shared_ptr<kudu::tablet::MemRowSet> const&, kudu::Schema const&) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/tablet.cc:616 (libtablet.so+0x00000019c603)
          #7 kudu::tablet::Tablet::FlushUnlocked() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/tablet.cc:557 (libtablet.so+0x00000019c042)
          #8 kudu::tablet::Tablet::Flush() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/tablet.cc:538 (libtablet.so+0x00000019bf44)
          #9 kudu::RemoteBootstrapTest_TestRemoteBootstrap_Test::TestBody() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/integration-tests/remote_bootstrap-test.cc:225 (remote_bootstrap-test+0x0000000aff5f)
          #10 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null>:0 (libgtest.so+0x00000005f8a5)
          #11 __libc_start_main <null>:0 (libc.so.6+0x00000001ecdc)
      
        Mutex M1271 created at:
          #0 pthread_mutex_init /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/thirdparty/llvm-3.4.2.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:925 (remote_bootstrap-test+0x000000049a8c)
          #1 boost::mutex::mutex() /usr/include/boost/thread/pthread/mutex.hpp:37 (libtserver.so+0x000000098e1b)
          #2 kudu::MaintenanceManager::MaintenanceManager(kudu::MaintenanceManager::Options const&) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/maintenance_manager.cc:93 (libtablet.so+0x000000242583)
          #3 kudu::tserver::TabletServer::TabletServer(kudu::tserver::TabletServerOptions const&) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tserver/tablet_server.cc:41 (libtserver.so+0x0000000a7060)
          #4 kudu::tserver::MiniTabletServer::Start() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tserver/mini_tablet_server.cc:64 (libtserver.so+0x000000093a80)
          #5 kudu::MiniCluster::AddTabletServer() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/integration-tests/mini_cluster.cc:102 (libintegration-tests.so+0x000000020107)
          #6 kudu::MiniCluster::Start() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/integration-tests/mini_cluster.cc:65 (libintegration-tests.so+0x00000001fbba)
          #7 kudu::RemoteBootstrapTest::Start() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/integration-tests/remote_bootstrap-test.cc:100 (remote_bootstrap-test+0x0000000c0187)
          #8 kudu::RemoteBootstrapTest::SetUp() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/integration-tests/remote_bootstrap-test.cc:81 (remote_bootstrap-test+0x0000000b6932)
          #9 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null>:0 (libgtest.so+0x00000005f8a5)
          #10 __libc_start_main <null>:0 (libc.so.6+0x00000001ecdc)
      
        Thread T49 'maintenance_sch' (tid=2734, running) created by main thread at:
          #0 pthread_create /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/thirdparty/llvm-3.4.2.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:877 (remote_bootstrap-test+0x0000000493df)
          #1 kudu::Thread::StartThread(std::string const&, std::string const&, boost::function<void ()()> const&, scoped_refptr<kudu::Thread>*) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/util/thread.cc:365 (libkudu_util.so+0x0000001384b6)
          #2 kudu::Status kudu::Thread::Create<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::MaintenanceManager>, boost::_bi::list1<boost::_bi::value<kudu::MaintenanceManager*> > > >(std::string const&, std::string const&, boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::MaintenanceManager>, boost::_bi::list1<boost::_bi::value<kudu::MaintenanceManager*> > > const&, scoped_refptr<kudu::Thread>*) /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/util/thread.h:116 (libtablet.so+0x000000244baa)
          #3 kudu::MaintenanceManager::Init() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tablet/maintenance_manager.cc:106 (libtablet.so+0x000000242c87)
          #4 kudu::tserver::TabletServer::Start() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tserver/tablet_server.cc:95 (libtserver.so+0x0000000a7a5e)
          #5 kudu::tserver::MiniTabletServer::Start() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/tserver/mini_tablet_server.cc:66 (libtserver.so+0x000000093ad2)
          #6 kudu::MiniCluster::AddTabletServer() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/integration-tests/mini_cluster.cc:102 (libintegration-tests.so+0x000000020107)
          #7 kudu::MiniCluster::Start() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/integration-tests/mini_cluster.cc:65 (libintegration-tests.so+0x00000001fbba)
          #8 kudu::RemoteBootstrapTest::Start() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/integration-tests/remote_bootstrap-test.cc:100 (remote_bootstrap-test+0x0000000c0187)
          #9 kudu::RemoteBootstrapTest::SetUp() /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/TSAN/label/kudu-gerrit-slaves/src/kudu/integration-tests/remote_bootstrap-test.cc:81 (remote_bootstrap-test+0x0000000b6932)
          #10 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null>:0 (libgtest.so+0x00000005f8a5)
          #11 __libc_start_main <null>:0 (libc.so.6+0x00000001ecdc)
      
      SUMMARY: ThreadSanitizer: data race /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/move.h:83 void std::swap<kudu::tablet::DeltaMemStore*>(kudu::tablet::DeltaMemStore*&, kudu::tablet::DeltaMemStore*&)
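
The two options raised in the description (a lock around dms_, or holding a reference while accessing it) can be combined. The following is a minimal sketch of that idea, not the fix that was actually committed. DmsSketch, TrackerSketch, dms_lock_, CurrentId() and RunSketch() are hypothetical names introduced for illustration, and std::shared_ptr / std::mutex stand in for the std::tr1::shared_ptr and boost::mutex that appear in the trace.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for kudu::tablet::DeltaMemStore.
class DmsSketch {
 public:
  explicit DmsSketch(std::int64_t id) : id_(id) {}
  std::int64_t id() const { return id_; }
  std::size_t memory_footprint() const { return sizeof(*this); }

 private:
  const std::int64_t id_;
};

// Hypothetical stand-in for DeltaTracker, reduced to the dms_ member.
class TrackerSketch {
 public:
  TrackerSketch() : dms_(std::make_shared<DmsSketch>(0)) {}

  // Writer side: replace dms_ only while holding the lock.
  std::shared_ptr<DmsSketch> Flush() {
    std::shared_ptr<DmsSketch> old_dms;
    {
      std::lock_guard<std::mutex> l(dms_lock_);
      old_dms = dms_;
      dms_ = std::make_shared<DmsSketch>(old_dms->id() + 1);
    }
    return old_dms;  // a real flush would write this store to disk
  }

  // Reader side: copy the pointer under the lock (this takes a reference),
  // then use the copy after the lock has been released.
  std::size_t DeltaMemStoreSize() const {
    std::shared_ptr<DmsSketch> dms;
    {
      std::lock_guard<std::mutex> l(dms_lock_);
      dms = dms_;
    }
    return dms->memory_footprint();
  }

  std::int64_t CurrentId() const {
    std::lock_guard<std::mutex> l(dms_lock_);
    return dms_->id();
  }

 private:
  mutable std::mutex dms_lock_;     // assumed dedicated lock for dms_
  std::shared_ptr<DmsSketch> dms_;  // guarded by dms_lock_
};

// Runs one flushing thread against one size-polling thread, mirroring the
// two stacks in the report, and returns the id of the final store.
std::int64_t RunSketch(int num_flushes) {
  TrackerSketch tracker;
  std::atomic<bool> stop(false);
  std::thread reader([&tracker, &stop]() {
    while (!stop.load()) {
      std::size_t size = tracker.DeltaMemStoreSize();
      assert(size > 0);
      (void)size;
    }
  });
  for (int i = 0; i < num_flushes; ++i) {
    std::shared_ptr<DmsSketch> old_dms = tracker.Flush();
    assert(old_dms->id() == i);
    (void)old_dms;
  }
  stop.store(true);
  reader.join();
  return tracker.CurrentId();
}
```

Copying the shared pointer under the lock keeps the critical section short: the size computation runs on the reader's own reference, so a concurrent Flush() can swap dms_ without invalidating the store being measured.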
      

          People

            tlipcon Todd Lipcon
            mpercy Mike Percy