AddressSanitizer: heap-buffer-overflow in ParquetPlainEncoder

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:

      Description

      Some E2E tests started failing in the ASAN build due to a heap buffer overflow:

      17:57:56 FAIL query_test/test_exprs.py::TestExprs::()::test_exprs[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none | enable_expr_rewrites: 1]
      17:57:56 FAIL query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::()::test_low_mem_limit_q14[mem_limit: 700 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none]
      17:57:56 FAIL query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::()::test_low_mem_limit_q22[mem_limit: 450 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none]
      17:57:56 FAIL query_test/test_join_queries.py::TestJoinQueries::()::test_basic_joins[batch_size: 0 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none]
      17:57:56 FAIL query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::()::test_low_mem_limit_q22[mem_limit: 700 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none]
      17:57:56 FAIL query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::()::test_low_mem_limit_q14[mem_limit: 980 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none]
      17:57:56 FAIL query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::()::test_low_mem_limit_q15[mem_limit: 20 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none]
      17:57:56 ERROR query_test/test_exprs.py::TestExprLimits::()::test_expr_child_limit[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none]
      17:57:56 ERROR query_test/test_exprs.py::TestExprLimits::()::test_expr_child_limit[exec_option: {'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none]
      17:57:56 ERROR query_test/test_exprs.py::TestExprLimits::()::test_expr_depth_limit[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none]
      17:57:56 ==================================== ERRORS ====================================
      17:57:56  ERROR at setup of TestExprLimits.test_expr_child_limit[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
      17:57:56 [gw2] linux2 -- Python 2.6.6 /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/bin/../infra/python/env/bin/python
      17:57:56 common/impala_test_suite.py:122: in setup_class
      17:57:56     cls.client = cls.create_impala_client(IMPALAD)
      17:57:56 common/impala_test_suite.py:146: in create_impala_client
      17:57:56     client.connect()
      17:57:56 common/impala_connection.py:147: in connect
      17:57:56     self.__beeswax_client.connect()
      17:57:56 beeswax/impala_beeswax.py:148: in connect
      17:57:56     raise ImpalaBeeswaxException(self.__build_error_message(e), e)
      17:57:56 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
      17:57:56 E    INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
      17:57:56 E    MESSAGE: Could not connect to localhost:21000
      17:57:56 ---------------------------- Captured stderr setup -----------------------------
      

      The output from AddressSanitizer:

      ==25973==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200259fe93 at pc 0x00000181c427 bp 0x7fb4bf554e50 sp 0x7fb4bf554e48
      WRITE of size 4 at 0x60200259fe93 thread T37370
      ==25973==AddressSanitizer: while reporting a bug found another one. Ignoring.
          #0 0x181c426 in int impala::ParquetPlainEncoder::Decode<int>(unsigned char*, unsigned char const*, int, int*) /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/parquet-common.h:185:5
          #1 0x184bd37 in impala::ColumnStats<int>::DecodeValueFromThrift(std::string const&, int*) /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/parquet-column-stats.inline.h:97:7
          #2 0x184b680 in impala::ColumnStats<int>::ReadFromThrift(parquet::Statistics const&, impala::ColumnStatsBase::StatsField const&, void*) /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/parquet-column-stats.inline.h:31:14
          #3 0x184b241 in impala::ColumnStatsBase::ReadFromThrift(parquet::Statistics const&, impala::ColumnType const&, impala::ColumnStatsBase::StatsField const&, void*) /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/parquet-column-stats.cc:28:14
          #4 0x17bdeea in impala::HdfsParquetScanner::EvaluateStatsConjuncts(parquet::RowGroup const&, bool*) /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/hdfs-parquet-scanner.cc:512:20
          #5 0x17bc40b in impala::HdfsParquetScanner::NextRowGroup() /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/hdfs-parquet-scanner.cc:589:31
          #6 0x17b94bb in impala::HdfsParquetScanner::GetNextInternal(impala::RowBatch*) /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/hdfs-parquet-scanner.cc:446:31
          #7 0x17b8a64 in impala::HdfsParquetScanner::ProcessSplit() /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/hdfs-parquet-scanner.cc:392:31
          #8 0x174a2d5 in impala::HdfsScanNode::ProcessSplit(std::vector<impala::FilterContext, std::allocator<impala::FilterContext> > const&, impala::DiskIoMgr::ScanRange*) /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/hdfs-scan-node.cc:539:12
          #9 0x17494a1 in impala::HdfsScanNode::ScannerThread() /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/hdfs-scan-node.cc:430:16
          #10 0x17533c7 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, impala::HdfsScanNode>, boost::_bi::list1<boost::_bi::value<impala::HdfsScanNode*> > >::operator()() /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind_template.hpp:20:16
          #11 0x12c6442 in boost::function0<void>::operator()() const /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/function/function_template.hpp:766:14
          #12 0x1685e95 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/util/thread.cc:317:3
          #13 0x168ec6a in void boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> >::operator()<void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0&, int) /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:457:9
          #14 0x168eaf7 in boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > >::operator()() /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind_template.hpp:20:16
          #15 0x1d08999 in thread_proxy (/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/build/debug/service/impalad+0x1d08999)
          #16 0x3989407850 in start_thread (/lib64/libpthread.so.0+0x3989407850)
          #17 0x39890e894c in clone (/lib64/libc.so.6+0x39890e894c)
      
      0x60200259fe93 is located 1 bytes to the right of 2-byte region [0x60200259fe90,0x60200259fe92)
      allocated by thread T37370 here:
          #0 0xfcb0e8 in __interceptor_malloc /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/llvm/llvm-3.8.0.src-p1/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:52
          #1 0x17c2deb in impala::ScopedBuffer::TryAllocate(long) /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/scoped-buffer.h:40:42
          #2 0x17b41b7 in impala::HdfsParquetScanner::Open(impala::ScannerContext*) /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/hdfs-parquet-scanner.cc:198:10
          #3 0x175ed62 in impala::HdfsScanNodeBase::CreateAndOpenScanner(impala::HdfsPartitionDescriptor*, impala::ScannerContext*, boost::scoped_ptr<impala::HdfsScanner>*) /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:660:14
          #4 0x174a011 in impala::HdfsScanNode::ProcessSplit(std::vector<impala::FilterContext, std::allocator<impala::FilterContext> > const&, impala::DiskIoMgr::ScanRange*) /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/hdfs-scan-node.cc:524:19
          #5 0x17494a1 in impala::HdfsScanNode::ScannerThread() /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/hdfs-scan-node.cc:430:16
          #6 0x17533c7 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, impala::HdfsScanNode>, boost::_bi::list1<boost::_bi::value<impala::HdfsScanNode*> > >::operator()() /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind_template.hpp:20:16
          #7 0x12c6442 in boost::function0<void>::operator()() const /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/function/function_template.hpp:766:14
          #8 0x1685e95 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/util/thread.cc:317:3
          #9 0x168ec6a in void boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> >::operator()<void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0&, int) /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:457:9
          #10 0x168eaf7 in boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > >::operator()() /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind_template.hpp:20:16
          #11 0x1d08999 in thread_proxy (/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/build/debug/service/impalad+0x1d08999)
      
      Thread T37370 created by T37356 here:
          #0 0xf36399 in __interceptor_pthread_create /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/llvm/llvm-3.8.0.src-p1/projects/compiler-rt/lib/asan/asan_interceptors.cc:238
          #1 0x1d07d79 in boost::thread::start_thread_noexcept() (/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/build/debug/service/impalad+0x1d07d79)
      
      Thread T37356 created by T263 here:
          #0 0xf36399 in __interceptor_pthread_create /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/llvm/llvm-3.8.0.src-p1/projects/compiler-rt/lib/asan/asan_interceptors.cc:238
          #1 0x1d07d79 in boost::thread::start_thread_noexcept() (/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/build/debug/service/impalad+0x1d07d79)
      
      Thread T263 created by T74 here:
          #0 0xf36399 in __interceptor_pthread_create /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/llvm/llvm-3.8.0.src-p1/projects/compiler-rt/lib/asan/asan_interceptors.cc:238
          #1 0x1d07d79 in boost::thread::start_thread_noexcept() (/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/build/debug/service/impalad+0x1d07d79)
      
      Thread T74 created by T73 here:
          #0 0xf36399 in __interceptor_pthread_create /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/llvm/llvm-3.8.0.src-p1/projects/compiler-rt/lib/asan/asan_interceptors.cc:238
          #1 0x1d07d79 in boost::thread::start_thread_noexcept() (/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/build/debug/service/impalad+0x1d07d79)
      
      Thread T73 created by T0 here:
          #0 0xf36399 in __interceptor_pthread_create /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/llvm/llvm-3.8.0.src-p1/projects/compiler-rt/lib/asan/asan_interceptors.cc:238
          #1 0x1d07d79 in boost::thread::start_thread_noexcept() (/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/build/debug/service/impalad+0x1d07d79)
      
      SUMMARY: AddressSanitizer: heap-buffer-overflow /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/parquet-common.h:185:5 in int impala::ParquetPlainEncoder::Decode<int>(unsigned char*, unsigned char const*, int, int*)
      Shadow bytes around the buggy address:
        0x0c04804abf80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        0x0c04804abf90: fa fa fa fa fa fa fd fd fa fa fa fa fa fa fa fa
        0x0c04804abfa0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        0x0c04804abfb0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fd fd
        0x0c04804abfc0: fa fa fd fd fa fa fa fa fa fa fd fa fa fa fa fa
      =>0x0c04804abfd0: fa fa[02]fa fa fa fd fd fa fa fa fa fa fa fd fd
        0x0c04804abfe0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fd fd
        0x0c04804abff0: fa fa fa fa fa fa fa fa fa fa fd fd fa fa fa fa
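      The bracketed `[02]` in the shadow dump can be tied back to the 2-byte region named in the report. The sketch below assumes the default AddressSanitizer shadow mapping for x86-64 Linux (scale 3, offset 0x7fff8000), which is consistent with the addresses printed above; the function names are illustrative and are not part of any ASan API.

      ```cpp
      #include <cstdint>

      // Assumed default ASan mapping on x86-64 Linux:
      //   shadow = (addr >> 3) + 0x7fff8000
      constexpr uint64_t kShadowScale = 3;
      constexpr uint64_t kShadowOffset = 0x7fff8000ULL;

      // Address of the shadow byte that describes the 8-byte granule
      // containing `addr`.
      constexpr uint64_t ShadowAddress(uint64_t addr) {
        return (addr >> kShadowScale) + kShadowOffset;
      }

      // Interpret a shadow byte: 0 means all 8 bytes of the granule are
      // addressable, k in 1..7 means only the first k bytes are, and any
      // other value (such as 0xfa or 0xfd) marks the granule as poisoned.
      constexpr bool IsByteAddressable(uint8_t shadow, uint64_t addr) {
        return shadow == 0 || (shadow < 8 && (addr & 7) < shadow);
      }
      ```

      With these definitions, `ShadowAddress(0x60200259fe93)` evaluates to `0x0c04804abfd2`, the third byte of the row marked `=>`. Its value `02` says that only the first two bytes of that granule are valid, so offsets 0 and 1 are addressable while offset 3, the faulting address, is not.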
      

        Activity

        kwho Michael Ho added a comment -

        Lars Volker, could this be related to some recent changes in Parquet?

        lv Lars Volker added a comment -

        Michael Ho - Yes, that looks suspicious. I'll have a look.

        lv Lars Volker added a comment -

        IMPALA-5008: Fix reading stats for TINYINT and SMALLINT

        TINYINT and SMALLINT types use 1 and 2 byte slots respectively. However,
        statistics for the corresponding INT_8 and INT_16 Parquet types are
        encoded using 4 bytes. When reading back we were missing a conversion
        to the smaller types, thus overwriting the memory behind them. This was
        caught by the address sanitizer. The fix is to perform the necessary
        conversion.

        Change-Id: I9b10508db53747e7b08c8bd9a69c763b82135a78
        Reviewed-on: http://gerrit.cloudera.org:8080/6226
        Reviewed-by: Lars Volker <lv@cloudera.com>
        Tested-by: Impala Public Jenkins
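        The conversion described in the commit message can be sketched as follows. This is a minimal illustration, not the actual Impala change: `DecodeSmallStat` is a hypothetical helper, and it assumes the statistic arrives as a 4-byte value in host byte order (Parquet plain encoding is little-endian, so this holds on little-endian hosts). The value is first copied into a real 4-byte temporary and only then narrowed, so the write to the slot never exceeds the slot's own size.

        ```cpp
        #include <cstdint>
        #include <cstring>
        #include <limits>

        // Hypothetical helper, not Impala's code: decode a 4-byte statistic
        // into a slot of type T, where T may be narrower than 4 bytes
        // (int8_t for TINYINT, int16_t for SMALLINT).
        template <typename T>
        bool DecodeSmallStat(const uint8_t* buf, int buf_len, T* slot) {
          int32_t wide;
          // Reject inputs that do not hold a full 4-byte value.
          if (buf_len < static_cast<int>(sizeof(wide))) return false;
          // Copy into a temporary that really is 4 bytes wide. Copying
          // straight into *slot would write past a 1- or 2-byte slot.
          std::memcpy(&wide, buf, sizeof(wide));
          // Reject values that do not fit the smaller type.
          if (wide < std::numeric_limits<T>::min() ||
              wide > std::numeric_limits<T>::max()) {
            return false;
          }
          // Narrow: only sizeof(T) bytes are written to the slot.
          *slot = static_cast<T>(wide);
          return true;
        }
        ```

        Calling `DecodeSmallStat<int16_t>` on the encoding of -5 stores -5 in the 2-byte slot and leaves the bytes after it untouched, which is the property the overflowing read path lacked.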


          People

          • Assignee:
            lv Lars Volker
            Reporter:
            kwho Michael Ho