KUDU-3254

Crash in Kudu C++ client when working with stale scan tokens containing tablet location info

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13.0, 1.14.0
    • Fix Version/s: 1.15.0
    • Component/s: client
    • Labels: None

    Description

      With KUDU-1802 implemented, the meta-cache in the Kudu C++ client might crash when the client is given a scan token that carries tablet location information, in scenarios like the one below:

      1. Scan tokens were generated for a table with multiple ranges (e.g., two ranges: [-100, 0) and [0, 100)).
      2. The first range was dropped (e.g., range [-100, 0)).
      3. A client was fed the set of tokens generated at step 1 to read from the table (the set now includes one stale token corresponding to the dropped range).
      4. The same client instance was used to write into the table.
      5. The same client instance was fed the original set of tokens once more to read from the table again.

      The client would crash at step 5 of the sequence above.

      The stack trace on crash might look like this (captured on macOS):

            * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10
              frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430
              frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120
              frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail() at logging.cc:1474:3
              frame #4: 0x000000010ca19121 libglog.0.dylib`google::LogMessage::SendToLog() [inlined] google::LogMessage::Fail() at logging.cc:1488:3
              frame #5: 0x000000010ca1911b libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442
              frame #6: 0x000000010ca19815 libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5
              frame #7: 0x000000010ca1d76f libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2023:5
              frame #8: 0x000000010ca1a5f9 libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2022:37
              frame #9: 0x0000000103e365e3 libkudu_client.dylib`std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > >::mapped_type& FindOrDie<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3
              frame #10: 0x0000000103e34cbb libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse() at meta_cache.cc:943:23
              frame #11: 0x0000000103e86166 libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at scan_token-internal.cc:192:35
              frame #12: 0x0000000103e88051 libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner() at scan_token-internal.cc:111:10
              frame #13: 0x0000000103d55d3c libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at client.cc:1879:10
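
Frames #9 and #10 show the abort coming from a `FindOrDie` lookup on a `std::map` of `MetaCacheEntry` values, which suggests that a key expected in the cache was absent. The following is a minimal sketch of that failure mode, assuming a simplified cache keyed by a partition's lower bound. It is not the Kudu code and not the actual fix: `CacheEntry`, `Cache`, `FindEntryOrNull`, and `DropEntry` are hypothetical names introduced only for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a cached tablet-location entry; this is NOT
// the real kudu::client::internal::MetaCacheEntry.
struct CacheEntry {
  std::string tablet_id;
};

// Simplified model (assumption): entries are keyed by the partition's
// lower-bound key.
typedef std::map<std::string, CacheEntry> Cache;

// Tolerant lookup: returns a pointer to the entry, or nullptr when the
// key is absent. A FindOrDie-style helper would abort the process in the
// absent case, which is the behavior seen in the stack trace above.
const CacheEntry* FindEntryOrNull(const Cache& cache, const std::string& key) {
  Cache::const_iterator it = cache.find(key);
  if (it == cache.end()) {
    return nullptr;
  }
  return &it->second;
}

// Models a range being dropped from the cache; returns true if an entry
// was actually removed.
bool DropEntry(Cache* cache, const std::string& key) {
  return cache->erase(key) > 0;
}
```

With this model, a caller that still holds a key for a dropped range gets `nullptr` back from `FindEntryOrNull` and can skip or refresh the stale entry, rather than hitting a fatal check.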
      

      The issue is fixed in Kudu 1.14 with this changelist.

            People

              Assignee: Alexey Serbin (aserbin)
              Reporter: Alexey Serbin (aserbin)