IMPALA-4387

Avro scanner crashes if the file schema has invalid decimal precision or scale

    Details

      Description

      Haven't seen this before; may be a flaky test.

      04:26:14.562 ==================================== ERRORS ====================================
      04:26:14.562  ERROR at teardown of TestScannersFuzzing.test_fuzz_decimal_tbl[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: avro/snap/block] 
      04:26:14.562 [gw2] linux2 -- Python 2.6.6 /data/jenkins/workspace/impala-umbrella-build-and-test-s3/repos/Impala/bin/../infra/python/env/bin/python
      04:26:14.562 conftest.py:265: in cleanup
      04:26:14.562     request.instance.execute_query_expect_success(request.instance.client, "use default")
      04:26:14.562 common/impala_test_suite.py:418: in wrapper
      04:26:14.562     return function(*args, **kwargs)
      04:26:14.562 common/impala_test_suite.py:425: in execute_query_expect_success
      04:26:14.562     result = cls.__execute_query(impalad_client, query, query_options)
      04:26:14.562 common/impala_test_suite.py:511: in __execute_query
      04:26:14.562     return impalad_client.execute(query, user=user)
      04:26:14.562 common/impala_connection.py:160: in execute
      04:26:14.562     return self.__beeswax_client.execute(sql_stmt, user=user)
      04:26:14.562 beeswax/impala_beeswax.py:173: in execute
      04:26:14.562     handle = self.__execute_query(query_string.strip(), user=user)
      04:26:14.562 beeswax/impala_beeswax.py:337: in __execute_query
      04:26:14.562     handle = self.execute_query_async(query_string, user=user)
      04:26:14.562 beeswax/impala_beeswax.py:333: in execute_query_async
      04:26:14.562     return self.__do_rpc(lambda: self.imp_service.query(query,))
      04:26:14.562 beeswax/impala_beeswax.py:465: in __do_rpc
      04:26:14.562     raise ImpalaBeeswaxException(self.__build_error_message(u), u)
      04:26:14.562 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
      04:26:14.562 E    INNER EXCEPTION: <class 'socket.error'>
      04:26:14.562 E    MESSAGE: [Errno 32] Broken pipe
      

        Activity

        tarmstrong Tim Armstrong added a comment -

        Looks like a product bug

        DCHECK is:

        F1027 07:11:46.796360 13963 types.h:131] Check failed: scale <= precision (2 vs. 0) 
        

        Backtrace is:

        #0  0x000000359a0328e5 in raise () from /lib64/libc.so.6
        #1  0x000000359a0340c5 in abort () from /lib64/libc.so.6
        #2  0x00000000027faca4 in google::DumpStackTraceAndExit() ()
        #3  0x00000000027f410d in google::LogMessage::Fail() ()
        #4  0x00000000027f6a36 in google::LogMessage::SendToLog() ()
        #5  0x00000000027f3c2d in google::LogMessage::Flush() ()
        #6  0x00000000027f74de in google::LogMessageFatal::~LogMessageFatal() ()
        #7  0x000000000181d015 in impala::ColumnType::CreateDecimalType (precision=0, scale=2) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/repos/Impala/be/src/runtime/types.h:131
        #8  0x00000000019df29d in impala::AvroSchemaToColumnType (schema=@0x17b60c90) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/repos/Impala/be/src/util/avro-util.cc:118
        #9  0x00000000016977b6 in impala::HdfsAvroScanner::VerifyTypesMatch (this=0xe125020, table_schema=..., file_schema=..., field_name=...) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/repos/Impala/be/src/exec/hdfs-avro-scanner.cc:389
        #10 0x000000000169649a in impala::HdfsAvroScanner::ResolveSchemas (this=0xe125020, table_root=..., file_root=0x87f1838) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/repos/Impala/be/src/exec/hdfs-avro-scanner.cc:266
        #11 0x00000000016955fc in impala::HdfsAvroScanner::ParseMetadata (this=0xe125020) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/repos/Impala/be/src/exec/hdfs-avro-scanner.cc:173
        #12 0x0000000001694b80 in impala::HdfsAvroScanner::ReadFileHeader (this=0xe125020) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/repos/Impala/be/src/exec/hdfs-avro-scanner.cc:114
        #13 0x000000000179b652 in impala::BaseSequenceScanner::ProcessSplit (this=0xe125020) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/repos/Impala/be/src/exec/base-sequence-scanner.cc:143
        #14 0x000000000165e16b in impala::HdfsScanNode::ProcessSplit (this=0xffd7c00, filter_ctxs=..., scan_range=0x150bbc80) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:528
        #15 0x000000000165d5b1 in impala::HdfsScanNode::ScannerThread (this=0xffd7c00) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:419
        #16 0x0000000001664897 in boost::_mfi::mf0<void, impala::HdfsScanNode>::operator() (this=0x7ff4c57fdc48, p=0xffd7c00) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/Impala-Toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:49
        #17 0x0000000001664472 in boost::_bi::list1<boost::_bi::value<impala::HdfsScanNode*> >::operator()<boost::_mfi::mf0<void, impala::HdfsScanNode>, boost::_bi::list0> (this=0x7ff4c57fdc58, f=..., a=...) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind.hpp:253
        #18 0x0000000001663fb1 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, impala::HdfsScanNode>, boost::_bi::list1<boost::_bi::value<impala::HdfsScanNode*> > >::operator() (this=0x7ff4c57fdc48) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
        #19 0x0000000001663971 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, impala::HdfsScanNode>, boost::_bi::list1<boost::_bi::value<impala::HdfsScanNode*> > >, void>::invoke (function_obj_ptr=...) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/Impala-Toolchain/boost-1.57.0/include/boost/function/function_template.hpp:153
        #20 0x0000000001321c20 in boost::function0<void>::operator() (this=0x7ff4c57fdc40) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/Impala-Toolchain/boost-1.57.0/include/boost/function/function_template.hpp:767
        #21 0x00000000015c9077 in impala::Thread::SuperviseThread (name=..., category=..., functor=..., thread_started=0x7ff4cd007fd0) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/repos/Impala/be/src/util/thread.cc:317
        #22 0x00000000015d0050 in boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> >::operator()<void (*)(const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, boost::function<void()>, impala::Promise<long> *), boost::_bi::list0 &, int) (this=0x160b03c0, f=@0x160b03b8, a=...) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind.hpp:457
        #23 0x00000000015cff93 in boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > >::operator()(void) (this=0x160b03b8) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
        #24 0x00000000015cfeee in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > > >::run(void) (this=0x160b0200) at /data/jenkins/workspace/impala-umbrella-build-and-test-s3/Impala-Toolchain/boost-1.57.0/include/boost/thread/detail/thread.hpp:116
        #25 0x0000000001a2572a in thread_proxy ()
        #26 0x000000359a407851 in start_thread () from /lib64/libpthread.so.0
        #27 0x000000359a0e894d in clone () from /lib64/libc.so.6
        
        tarmstrong Tim Armstrong added a comment -

        I was able to reproduce this by editing an Avro file with a decimal column so that it has a bogus schema (here, a scale of 7 that exceeds the precision of 5). The schema is just JSON, so it's possible to edit the binary file in Vim.

        {"type":"record","name":"some_schema","namespace":"com.howdy","fields":[{"name":"name","type":"string"},{"name":"value","type":{"type":"bytes","logicalType":"decimal","precision":5,"scale":7}}]}
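A schema like the one above can be flagged before any decimal type is constructed. The following is a minimal Python sketch, not Impala's backend code (which is C++). The function name `find_invalid_decimals` and the precision cap of 38 are assumptions made for illustration. It walks the record's fields and reports any decimal whose precision or scale would violate the `scale <= precision` check seen in the DCHECK.

```python
import json

# Assumed upper bound on decimal precision, used here for illustration only.
MAX_PRECISION = 38


def find_invalid_decimals(schema_json):
    """Return a list of (field_name, reason) for invalid decimal fields.

    Only top-level record fields with a complex decimal type are inspected;
    unions and nested records are ignored in this sketch.
    """
    schema = json.loads(schema_json)
    problems = []
    for field in schema.get("fields", []):
        ftype = field.get("type")
        # Skip primitive types such as "string" and non-decimal complex types.
        if not isinstance(ftype, dict) or ftype.get("logicalType") != "decimal":
            continue
        precision = ftype.get("precision")
        scale = ftype.get("scale", 0)  # a missing scale is treated as 0
        if not isinstance(precision, int) or not 1 <= precision <= MAX_PRECISION:
            problems.append((field["name"], "invalid precision: %r" % (precision,)))
        elif not isinstance(scale, int) or scale < 0:
            problems.append((field["name"], "invalid scale: %r" % (scale,)))
        elif scale > precision:
            problems.append(
                (field["name"], "scale %d exceeds precision %d" % (scale, precision)))
    return problems


# The bogus schema from the comment above: scale 7 with precision 5.
BOGUS_SCHEMA = (
    '{"type":"record","name":"some_schema","namespace":"com.howdy",'
    '"fields":[{"name":"name","type":"string"},'
    '{"name":"value","type":{"type":"bytes","logicalType":"decimal",'
    '"precision":5,"scale":7}}]}'
)
```

Calling `find_invalid_decimals(BOGUS_SCHEMA)` reports the `value` field. The same function would also flag the `precision=0, scale=2` combination seen in the backtrace.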
        
        tarmstrong Tim Armstrong added a comment -

        IMPALA-4387: validate decimal type in Avro file schema

        This patch prevents an invalid decimal type in an Avro file schema from
        crashing Impala. Most invalid Avro schemas are caught by the frontend,
        but file schemas still need to be validated by the backend.

        After this patch files with bad schemas are skipped.

        Testing:
        This was hit very rarely by the scanner fuzzing. Added a regression test that
        scans a file with a bad schema.

        Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
        Reviewed-on: http://gerrit.cloudera.org:8080/4876
        Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
        Tested-by: Internal Jenkins
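
The behavior the commit message describes, reporting an error for a bad file schema instead of hitting a fatal check, can be sketched roughly as follows. This is a hypothetical Python illustration, not the patch itself: `create_decimal_type`, `validate_decimal`, and `scan_files` are made-up names, the `assert` statements merely stand in for the DCHECK in `types.h`, and the precision cap of 38 is an assumption.

```python
def create_decimal_type(precision, scale):
    """Stand-in for a constructor that treats bad arguments as a programming error."""
    assert 1 <= precision <= 38, "invalid precision"
    assert 0 <= scale <= precision, "scale <= precision"
    return ("DECIMAL", precision, scale)


def validate_decimal(precision, scale):
    """Return None if the pair is valid, otherwise an error message."""
    if not 1 <= precision <= 38:
        return "invalid decimal precision: %d" % precision
    if not 0 <= scale <= precision:
        return "invalid decimal scale: %d (precision %d)" % (scale, precision)
    return None


def scan_files(files):
    """Scan files whose schemas declare one decimal column.

    `files` maps a file name to the (precision, scale) read from its schema.
    Returns (scanned, skipped): column types for the good files and error
    messages for the files that were skipped.
    """
    scanned, skipped = {}, {}
    for name, (precision, scale) in sorted(files.items()):
        # File contents are untrusted input, so validate before constructing
        # the type rather than relying on the assertion to catch bad values.
        error = validate_decimal(precision, scale)
        if error is not None:
            skipped[name] = error
            continue
        scanned[name] = create_decimal_type(precision, scale)
    return scanned, skipped
```

The design point is that assertions guard against bugs in the caller, while values read from a file need an explicit check that produces a recoverable error.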


          People

          • Assignee:
            tarmstrong Tim Armstrong
            Reporter:
            jbapple Jim Apple