Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4440

dcheck failure: lineage-util.h:91] Check failed: lineage.ended >= lineage.started

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend
    • Labels:

      Description

      The stress test hit this on Saturday night / Sunday morning, and a re-run under the same Git hash Sunday later didn't hit it. The Git hash that corresponds to the crash is e76d541b08a394e93097808dbd42a0dd6742898e along cdh5-trunk (since the stress test is on a CDH cluster, Impala comes from CDH). It looks like IMPALA-2646. I expect the change to standard time is the problem.

      #0  0x0000003433832625 in raise () from /lib64/libc.so.6
      #1  0x0000003433833e05 in abort () from /lib64/libc.so.6
      #2  0x00000000027fe814 in google::DumpStackTraceAndExit() ()
      #3  0x00000000027f7c7d in google::LogMessage::Fail() ()
      #4  0x00000000027fa5a6 in google::LogMessage::SendToLog() ()
      #5  0x00000000027f779d in google::LogMessage::Flush() ()
      #6  0x00000000027fb04e in google::LogMessageFatal::~LogMessageFatal() ()
      #7  0x000000000147a3b1 in impala::LineageUtil::TLineageToJSON (lineage=..., out=0x7fa925ad1e10) at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/be/src/util/lineage-util.h:91
      #8  0x0000000001461f2c in impala::ImpalaServer::LogLineageRecord (this=0xe873200, query_exec_state=...) at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/be/src/service/impala-server.cc:383
      #9  0x0000000001463671 in impala::ImpalaServer::LogQueryEvents (this=0xe873200, exec_state=...) at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/be/src/service/impala-server.cc:541
      #10 0x000000000146844e in impala::ImpalaServer::UnregisterQuery (this=0xe873200, query_id=..., check_inflight=true, cause=0x0) at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/be/src/service/impala-server.cc:939
      #11 0x00000000014c47e8 in impala::ImpalaServer::CloseOperation (this=0xe873200, return_val=..., request=...) at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/be/src/service/impala-hs2-server.cc:686
      #12 0x000000000193dec2 in apache::hive::service::cli::thrift::TCLIServiceProcessor::process_CloseOperation (this=0xfcee260, seqid=0, iprot=0x760b0360, oprot=0x760b0a50, callContext=0xe3bcbc0)
          at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/be/generated-sources/gen-cpp/TCLIService.cpp:5235
      #13 0x0000000001937bc8 in apache::hive::service::cli::thrift::TCLIServiceProcessor::dispatchCall (this=0xfcee260, iprot=0x760b0360, oprot=0x760b0a50, fname=..., seqid=0, callContext=0xe3bcbc0)
          at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/be/generated-sources/gen-cpp/TCLIService.cpp:4506
      #14 0x0000000001912aa1 in impala::ImpalaHiveServer2ServiceProcessor::dispatchCall (this=0xfcee260, iprot=0x760b0360, oprot=0x760b0a50, fname=..., seqid=0, callContext=0xe3bcbc0)
          at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/be/generated-sources/gen-cpp/ImpalaHiveServer2Service.cpp:463
      #15 0x000000000115df6c in apache::thrift::TDispatchProcessor::process (this=0xfcee260, in=..., out=..., connectionContext=0xe3bcbc0) at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/toolchain/thrift-0.9.0-p8/include/thrift/TDispatchProcessor.h:121
      #16 0x00000000027aab2b in apache::thrift::server::TThreadPoolServer::Task::run() ()
      #17 0x0000000002793419 in apache::thrift::concurrency::ThreadManager::Worker::run() ()
      #18 0x00000000013212a5 in impala::ThriftThread::RunRunnable (this=0xfcebf80, runnable=..., promise=0x7ffd3bfbad80) at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/be/src/rpc/thrift-thread.cc:64
      #19 0x00000000013229d1 in boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*>::operator() (this=0xeb842d0, p=0xfcebf80, a1=..., a2=0x7ffd3bfbad80)
          at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:280
      #20 0x0000000001322867 in boost::_bi::list3<boost::_bi::value<impala::ThriftThread*>, boost::_bi::value<boost::shared_ptr<apache::thrift::concurrency::Runnable> >, boost::_bi::value<impala::Promise<unsigned long>*> >::operator()<boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*>, boost::_bi::list0> (this=0xeb842e0, f=..., a=...) at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:392
      #21 0x00000000013225b3 in boost::_bi::bind_t<void, boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*>, boost::_bi::list3<boost::_bi::value<impala::ThriftThread*>, boost::_bi::value<boost::shared_ptr<apache::thrift::concurrency::Runnable> >, boost::_bi::value<impala::Promise<unsigned long>*> > >::operator() (this=0xeb842d0) at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
      #22 0x00000000013224c6 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*>, boost::_bi::list3<boost::_bi::value<impala::ThriftThread*>, boost::_bi::value<boost::shared_ptr<apache::thrift::concurrency::Runnable> >, boost::_bi::value<impala::Promise<unsigned long>*> > >, void>::invoke (function_obj_ptr=...)
          at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/function/function_template.hpp:153
      #23 0x000000000132fe32 in boost::function0<void>::operator() (this=0x7fa925ad2c60) at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/function/function_template.hpp:767
      #24 0x00000000015da1b9 in impala::Thread::SuperviseThread (name=..., category=..., functor=..., thread_started=0x7ffd3bfbab70) at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/be/src/util/thread.cc:317
      #25 0x00000000015e1192 in boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> >::operator()<void (*)(const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, boost::function<void()>, impala::Promise<long> *), boost::_bi::list0 &, int) (this=0xeaf6fc0, f=@0xeaf6fb8, a=...)
          at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:457
      #26 0x00000000015e10d5 in boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > >::operator()(void) (this=0xeaf6fb8) at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
      #27 0x00000000015e1030 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > > >::run(void) (this=0xeaf6e00) at /usr/src/debug/impala-2.7.0-cdh5.10.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/thread/detail/thread.hpp:116
      #28 0x0000000001a2929a in thread_proxy ()
      #29 0x0000003433c079d1 in start_thread () from /lib64/libpthread.so.0
      #30 0x00000034338e88fd in clone () from /lib64/libc.so.6
      

      The core, debug symbols, logs are on impala-desktop in a bug directory with this Jira's name. I collected the symbols based on the instructions at https://cwiki.apache.org/confluence/display/IMPALA/Debugging+Impala+Core+Dumps+on+Another+System . If something isn't right on impala-desktop, let's figure out what else we need so the instructions linked are correct.

      Also, backend seems like a catchall component. Is there a better component that this should have been filed under? Let me know.

        Activity

        Hide
        dhecht Dan Hecht added a comment -

        It looks to me like both the frontend and backend are broken:

        ColumnLineageGraph::init()
            SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            try {
              timestamp_ = df.parse(queryCtx.now_string).getTime() / 1000;
        
        ImpalaServer::LogLineageRecord()
          // Set the query end time in TLineageGraph
          time_t utc_end_time;
          query_exec_state.end_time().ToUnixTimeInUTC(&utc_end_time);
          lineage_graph.__set_ended(utc_end_time);
        

        where 'query_exec_state.end_time()' is a timstampval (no timezone info) in local time. (As an aside, I don't understand the naming of ToUnixTimeInUTC() given that it takes a timestamp, assumes it's in local time and converts to unix time.)

        So, neither of these work across the DST transition since 1am-2am local time are ambiguous times (when no timezone is specified).

        Show
        dhecht Dan Hecht added a comment - It looks to me like both the frontend and backend are broken: ColumnLineageGraph::init() SimpleDateFormat df = new SimpleDateFormat( "yyyy-MM-dd HH:mm:ss" ); try { timestamp_ = df.parse(queryCtx.now_string).getTime() / 1000; ImpalaServer::LogLineageRecord() // Set the query end time in TLineageGraph time_t utc_end_time; query_exec_state.end_time().ToUnixTimeInUTC(&utc_end_time); lineage_graph.__set_ended(utc_end_time); where 'query_exec_state.end_time()' is a timstampval (no timezone info) in local time. (As an aside, I don't understand the naming of ToUnixTimeInUTC() given that it takes a timestamp, assumes it's in local time and converts to unix time.) So, neither of these work across the DST transition since 1am-2am local time are ambiguous times (when no timezone is specified).
        Hide
        mikesbrown Michael Brown added a comment -

        Dan Hecht if you inspected the artifacts on impala-desktop, please report whether I can run the stress test again and blow away logs, cores on the native machine, etc.

        Show
        mikesbrown Michael Brown added a comment - Dan Hecht if you inspected the artifacts on impala-desktop, please report whether I can run the stress test again and blow away logs, cores on the native machine, etc.
        Hide
        dhecht Dan Hecht added a comment -

        Michael Brown, core file appeared to load fine for me and log files are there. Please go ahead (sorry, didn't realize you were waiting for confirmation).

        Show
        dhecht Dan Hecht added a comment - Michael Brown , core file appeared to load fine for me and log files are there. Please go ahead (sorry, didn't realize you were waiting for confirmation).
        Hide
        dhecht Dan Hecht added a comment -

        commit 035b775a6dca0609ffec049f765b311d4eda6ebd
        Author: Dan Hecht <dhecht@cloudera.com>
        Date: Thu Nov 17 10:06:28 2016 -0800

        IMPALA-4440: lineage timestamps can go backwards across daylight savings transitions

        Using TimestampValue (or equivalent string representation) for
        timestamps that require a point in time doesn't work because the same
        time can represent multiple point in times. For example, the timestamp:
        '2016-11-13 01:01 AM' occurred twice last weekend.

        Instead, we should use unix time directly rather than trying to derive
        unix time from a (timezone-less) timestamp.

        Note that there are other questionable uses of TimestampValue for
        internal Impala service stuff, but I want to fix them separately as they
        are not as important and fixing does add some risk.

        While I'm here, remove a template TimestampValue constructor that was
        unused and is confusing.

        We don't have any end-to-end tests that exercise column lineage, so add
        a simple custom cluster test that enables lineage and verifes the start
        and end unix times are within appropriate bounds. The other column
        lineage graph fields are at least tested via planner tests.

        Automated regression testing for the specifc daylight savings issue is
        difficult as we'd have to cross the daylight savings boundary at just
        the right time during query execution in order to reproduce
        reliably. But open to ideas.

        Testing:

        • loop the new test overnight without any failures.
        • exhaustive run.

        Change-Id: I34e435fc3511e65bc62906205cb558f2c116a8a9
        Reviewed-on: http://gerrit.cloudera.org:8080/5129
        Reviewed-by: Dan Hecht <dhecht@cloudera.com>
        Tested-by: Internal Jenkins

        Show
        dhecht Dan Hecht added a comment - commit 035b775a6dca0609ffec049f765b311d4eda6ebd Author: Dan Hecht <dhecht@cloudera.com> Date: Thu Nov 17 10:06:28 2016 -0800 IMPALA-4440 : lineage timestamps can go backwards across daylight savings transitions Using TimestampValue (or equivalent string representation) for timestamps that require a point in time doesn't work because the same time can represent multiple point in times. For example, the timestamp: '2016-11-13 01:01 AM' occurred twice last weekend. Instead, we should use unix time directly rather than trying to derive unix time from a (timezone-less) timestamp. Note that there are other questionable uses of TimestampValue for internal Impala service stuff, but I want to fix them separately as they are not as important and fixing does add some risk. While I'm here, remove a template TimestampValue constructor that was unused and is confusing. We don't have any end-to-end tests that exercise column lineage, so add a simple custom cluster test that enables lineage and verifes the start and end unix times are within appropriate bounds. The other column lineage graph fields are at least tested via planner tests. Automated regression testing for the specifc daylight savings issue is difficult as we'd have to cross the daylight savings boundary at just the right time during query execution in order to reproduce reliably. But open to ideas. Testing: loop the new test overnight without any failures. exhaustive run. Change-Id: I34e435fc3511e65bc62906205cb558f2c116a8a9 Reviewed-on: http://gerrit.cloudera.org:8080/5129 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins

          People

          • Assignee:
            dhecht Dan Hecht
            Reporter:
            mikesbrown Michael Brown
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development