IMPALA-4363

SELECTing invalid timestamp value from Parquet file crashes impalad

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.3.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend
    • Labels:

      Description

      The attached file is a small Parquet file created with Hive on CDH5.5.5. It contains a single row with a BIGINT and a Timestamp; the timestamp is malformed.

      Attaching this file as an Impala table and then running "SELECT * FROM t;" crashes Impala.
      The crash happens in the ASCII result formatter (AsciiQueryResultSet::AddOneRow in the trace below); by contrast, "INSERT INTO t2 SELECT * FROM t;" works as expected and does not crash.

      Running with gdb attached yields the following stack trace:

      0x00007f745673ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
      56	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
      (gdb) bt
      #0  0x00007f745673ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
      #1  0x00007f7456742028 in __GI_abort () at abort.c:89
      #2  0x00007f7457266cbd in __gnu_cxx::__verbose_terminate_handler () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/vterminate.cc:95
      #3  0x00007f7457264d46 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:47
      #4  0x00007f7457264d91 in std::terminate () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:57
      #5  0x00007f7457264fa8 in __cxxabiv1::__cxa_throw (obj=0xa13b880, 
          tinfo=0x3d926e0 <typeinfo for boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::gregorian::bad_year> >>, 
          dest=0x11b95fc <boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::gregorian::bad_year> >::~clone_impl()>)
          at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_throw.cc:87
      #6  0x00000000011b6da4 in boost::throw_exception<boost::gregorian::bad_year> (e=...) at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/throw_exception.hpp:69
      #7  0x00000000011b3b80 in boost::CV::simple_exception_policy<unsigned short, (unsigned short)1400, (unsigned short)10000, boost::gregorian::bad_year>::on_error ()
          at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/date_time/constrained_value.hpp:110
      #8  0x00000000011aff88 in boost::CV::constrained_value<boost::CV::simple_exception_policy<unsigned short, (unsigned short)1400, (unsigned short)10000, boost::gregorian::bad_year> >::assign (
          this=0x7f73ce6875d0, value=1112) at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/date_time/constrained_value.hpp:69
      #9  0x00000000011ae1c6 in boost::CV::constrained_value<boost::CV::simple_exception_policy<unsigned short, (unsigned short)1400, (unsigned short)10000, boost::gregorian::bad_year> >::constrained_value
          (this=0x7f73ce6875d0, value=1112) at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/date_time/constrained_value.hpp:48
      #10 0x00000000011ad638 in boost::gregorian::greg_year::greg_year (this=0x7f73ce6875d0, year=1112) at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/date_time/gregorian/greg_year.hpp:41
      #11 0x00000000011b0b12 in boost::date_time::gregorian_calendar_base<boost::date_time::year_month_day_base<boost::gregorian::greg_year, boost::gregorian::greg_month, boost::gregorian::greg_day>, unsigned int>::from_day_number (dayNumber=2127216) at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/date_time/gregorian_calendar.ipp:122
      #12 0x0000000001360dbb in boost::date_time::date<boost::gregorian::date, boost::gregorian::gregorian_calendar, boost::gregorian::date_duration>::year_month_day (this=0x7f73ce687680)
          at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/date_time/date.hpp:99
      #13 0x000000000136124e in boost::date_time::date_formatter<boost::gregorian::date, boost::date_time::iso_extended_format<char>, char>::date_to_string (d=...)
          at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/date_time/date_formatting.hpp:125
      #14 0x0000000001360ebd in boost::gregorian::to_iso_extended_string_type<char> (d=...) at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/date_time/gregorian/formatters.hpp:79
      #15 0x0000000001360d9c in boost::gregorian::to_iso_extended_string (d=...) at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/date_time/gregorian/formatters.hpp:85
      #16 0x0000000001360933 in impala::TimestampValue::DebugString (this=0xa13b7a0) at /home/laszlog/Impala/be/src/runtime/timestamp-value.cc:129
      #17 0x0000000001360784 in impala::operator<< (os=..., timestamp_value=...) at /home/laszlog/Impala/be/src/runtime/timestamp-value.cc:102
      #18 0x000000000140189c in impala::RawValue::PrintValue (value=0xa13b7a0, type=..., scale=-1, stream=0x7f73ce687a40) at /home/laszlog/Impala/be/src/runtime/raw-value.cc:338
      #19 0x000000000150cc61 in impala::AsciiQueryResultSet::AddOneRow (this=0xbbd7180, col_values=..., scales=...) at /home/laszlog/Impala/be/src/service/query-result-set.cc:172
      #20 0x00000000018228c3 in impala::PlanRootSink::Send (this=0xb14c260, state=0x66a1600, batch=0xb6ebba0) at /home/laszlog/Impala/be/src/exec/plan-root-sink.cc:114
      #21 0x00000000019c93f9 in impala::PlanFragmentExecutor::ExecInternal (this=0xb709088) at /home/laszlog/Impala/be/src/runtime/plan-fragment-executor.cc:357
      #22 0x00000000019c8e33 in impala::PlanFragmentExecutor::Exec (this=0xb709088) at /home/laszlog/Impala/be/src/runtime/plan-fragment-executor.cc:327
      #23 0x00000000015267ba in impala::FragmentMgr::FragmentExecState::Exec (this=0xb708d00) at /home/laszlog/Impala/be/src/service/fragment-exec-state.cc:59
      #24 0x000000000151df08 in impala::FragmentMgr::FragmentThread (this=0xa715ac0, fragment_instance_id=...) at /home/laszlog/Impala/be/src/service/fragment-mgr.cc:86
      #25 0x0000000001521c8a in boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>::operator() (this=0xa3011a0, p=0xa715ac0, a1=...)
          at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
      #26 0x0000000001521a47 in boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> >::operator()<boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list0> (this=0xa3011b0, f=..., a=...) at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:313
      #27 0x0000000001521371 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> > >::operator() (this=0xa3011a0) at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
      #28 0x0000000001520d04 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> > >, void>::invoke (function_obj_ptr=...)
          at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/function/function_template.hpp:153
      #29 0x0000000001323a96 in boost::function0<void>::operator() (this=0x7f73ce688d60) at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/function/function_template.hpp:767
      #30 0x00000000015dd2b9 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) (name=..., category=..., functor=..., 
          thread_started=0x7f73cd67ea00) at /home/laszlog/Impala/be/src/util/thread.cc:317
      #31 0x00000000015e4292 in boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> >::operator()<void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0&, int) (this=0xa30a9c0, 
          f=@0xa30a9b8: 0x15dcff4 <impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*)>, a=...)
          at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:457
      #32 0x00000000015e41d5 in boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > >::operator()() (this=0xa30a9b8)
          at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
      #33 0x00000000015e4130 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > > >::run() (this=0xa30a800)
          at /home/laszlog/Impala/toolchain/boost-1.57.0/include/boost/thread/detail/thread.hpp:116
      #34 0x0000000001a2ecea in thread_proxy ()
      #35 0x00007f7456ad5184 in start_thread (arg=0x7f73ce689700) at pthread_create.c:312
      #36 0x00007f745680237d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      

      It looks like a Boost exception (boost::gregorian::bad_year in the trace above, thrown for year=1112) is allowed to escape from RawValue::PrintValue(...).
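
The year 1112 in frames #8 to #10 follows from dayNumber=2127216 in frame #11. Below is a minimal sketch of that conversion, assuming the standard integer arithmetic for turning a Julian day number into a Gregorian year. The helper names are hypothetical, and this is not Boost's or Impala's code. The bounds 1400 and 10000 are the template arguments of simple_exception_policy shown in frame #7. The sketch needs C++14 or later.

```cpp
#include <cstdint>

// Hypothetical helper: Gregorian year of a Julian day number, computed with
// the standard integer conversion. Written for illustration only.
constexpr int YearFromDayNumber(std::int64_t day_number) {
  const std::int64_t a = day_number + 32044;
  const std::int64_t b = (4 * a + 3) / 146097;   // 400-year cycles
  const std::int64_t c = a - (146097 * b) / 4;   // days into the cycle
  const std::int64_t d = (4 * c + 3) / 1461;     // years within the cycle
  const std::int64_t e = c - (1461 * d) / 4;     // days into the year (March-based)
  const std::int64_t m = (5 * e + 2) / 153;      // March-based month index
  return static_cast<int>(100 * b + d - 4800 + m / 10);
}

// Hypothetical helper: year bounds as they appear in frame #7 of the trace
// (simple_exception_policy<unsigned short, 1400, 10000, bad_year>).
constexpr bool YearInTraceBounds(int year) {
  return year >= 1400 && year <= 10000;
}
```

With these helpers, YearFromDayNumber(2127216) evaluates to 1112, which YearInTraceBounds rejects. This is consistent with the bad_year exception seen while formatting the value.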

      Attachments: 000000_0 (0.5 kB, Laszlo Gaal)


          Activity

          tarasbob Taras Bobrovytsky added a comment -
          commit 858f5c219710f1b72b25e509643f0cf9e1113dee
          Author: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
          Date:   Fri Nov 4 17:12:04 2016 -0700
          
              IMPALA-4363: Add Parquet timestamp validation
          
              Before this patch, we would simply read the INT96 Parquet timestamp
              representation and assume that it's valid. However, not all bit
              permutations represent a valid timestamp. One of the boost functions
              raised an exception (that we didn't catch) when passed an invalid
              boost date object, which resulted in a crash. This patch fixes
              problem by validating that the date falls into 1400..9999 year
              range as we are scanning Parquet.
          
              Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
              Reviewed-on: http://gerrit.cloudera.org:8080/5343
              Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
              Tested-by: Internal Jenkins
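
The kind of scan-time check the commit message describes might look roughly like the sketch below. It is not the code from the patch. It assumes the INT96 value carries 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, and every struct, constant, and function name is hypothetical. The nanoseconds check is an extra assumption beyond the year-range validation that the commit message mentions. The sketch needs C++14 or later.

```cpp
#include <cstdint>

// Assumed INT96 layout: nanoseconds within the day, then a Julian day number.
// Hypothetical type, not Impala's TimestampValue.
struct Int96Timestamp {
  std::int64_t nanos_of_day;
  std::int32_t julian_day;
};

// Julian day number of a Gregorian calendar date (standard integer formula).
constexpr std::int64_t DayNumber(int year, int month, int day) {
  const int a = (14 - month) / 12;
  const std::int64_t y = static_cast<std::int64_t>(year) + 4800 - a;
  const int m = month + 12 * a - 3;
  return day + (153 * m + 2) / 5 + 365 * y + y / 4 - y / 100 + y / 400 - 32045;
}

// Day-number bounds for the 1400..9999 year range named in the commit message.
constexpr std::int64_t kMinDay = DayNumber(1400, 1, 1);
constexpr std::int64_t kMaxDay = DayNumber(9999, 12, 31);
constexpr std::int64_t kNanosPerDay = 24LL * 60 * 60 * 1000000000LL;

// True if the date part lies within the supported year range and the time
// part is a valid offset into a single day (the latter is an assumption).
constexpr bool IsValidTimestamp(const Int96Timestamp& ts) {
  return ts.julian_day >= kMinDay && ts.julian_day <= kMaxDay &&
         ts.nanos_of_day >= 0 && ts.nanos_of_day < kNanosPerDay;
}
```

A scanner built along these lines could call IsValidTimestamp on each decoded value and report a bad value instead of passing it on to formatting code. The day number 2127216 from the stack trace fails this check.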
          

            People

            • Assignee: tarasbob Taras Bobrovytsky
            • Reporter: laszlog Laszlo Gaal
            • Votes: 0
            • Watchers: 3
