IMPALA-11751: Crash in processing partition columns of Avro table with MT_DOP>1


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1
    • Fix Version/s: Impala 4.1.2, Impala 4.3.0
    • Component/s: Backend
    • Labels: None

    Description

      We saw a crash in a query that aggregates the string partition column of an Avro table with MT_DOP set to 4. The query is quite simple:

      create external table date_str_avro (v int)
        partitioned by (date_str string)
        stored as avro;
      -- Import the files attached to this JIRA, then repeat the following query.
      -- It will crash within 10 runs.
      set MT_DOP=2;
      select count(*), date_str from date_str_avro group by date_str;
      

      Reproducing the crash requires a specific data set; the files and steps are given below.
      Disabling codegen (via "set disable_codegen=1") also reproduces the crash. The stack trace is:

      Crash reason:  SIGSEGV /SEGV_MAPERR
      Crash address: 0x0
      Process uptime: not available
      
      Thread 512 (crashed)
       0  impalad!impala::HashTableCtx::Hash(void const*, int, unsigned int) const [sse-util.h : 227 + 0x2]
       1  impalad!impala::HashTableCtx::HashVariableLenRow(unsigned char const*, unsigned char const*) const [hash-table.cc : 306 + 0x8]
       2  impalad!impala::HashTableCtx::HashRow(unsigned char const*, unsigned char const*) const [hash-table.cc : 255 + 0x5]
       3  impalad!void impala::GroupingAggregator::EvalAndHashPrefetchGroup<false>(impala::RowBatch*, int, impala::TPrefetchMode::type, impala::HashTableCtx*) [hash-table.inline.h : 39 + 0xe]
       4  impalad!impala::GroupingAggregator::AddBatchStreamingImpl(int, bool, impala::TPrefetchMode::type, impala::RowBatch*, impala::RowBatch*, impala::HashTableCtx*, int*) [grouping-aggregator-ir.cc : 185 + 0x1c]
       5  impalad!impala::GroupingAggregator::AddBatchStreaming(impala::RuntimeState*, impala::RowBatch*, impala::RowBatch*, bool*) [grouping-aggregator.cc : 520 + 0x2d]
       6  impalad!impala::StreamingAggregationNode::GetRowsStreaming(impala::RuntimeState*, impala::RowBatch*) [streaming-aggregation-node.cc : 120 + 0x3]
       7  impalad!impala::StreamingAggregationNode::GetNext(impala::RuntimeState*, impala::RowBatch*, bool*) [streaming-aggregation-node.cc : 77 + 0x19]
       8  impalad!impala::FragmentInstanceState::ExecInternal() [fragment-instance-state.cc : 446 + 0x3]
       9  impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc : 104 + 0xb]
      10  impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) [query-state.cc : 950 + 0x19]
      11  impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) [function_template.hpp : 763 + 0x3]
      12  impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run() [bind.hpp : 531 + 0x3]
      13  impalad!thread_proxy + 0x67
      14  libpthread.so.0 + 0x76ba
      15  libc.so.6 + 0x1074dd
      

      The crash was reproduced on commit 2733d039a of the master branch.

      Reproducing the bug requires the following conditions:

      • Partitioned Avro table
      • MT_DOP is set to be larger than 1
      • The query needs follow-up processing (e.g. GROUP BY, JOIN, etc.) on the partition values, or on default values of fields missing from the files.
      • The number of files (blocks) exceeds the number of impalads, so multiple scan fragment instances run on one impalad.
      • Some scan node instances finish earlier than others, e.g. when there are both small and large files.

      Steps to import the attached Avro data files

      $ tar zxf date_str_avro.tar.gz
      $ hdfs dfs -put date_str_avro/* hdfs_location_of_table_dir
      impala-shell> alter table date_str_avro recover partitions;
      

      RCA
      This is a bug introduced by IMPALA-9655.

      Each Avro file requires at least two scan ranges. The initial range reads the file header and initializes the template tuple; the initial scanner then issues follow-up scan ranges to read the file content. Memory of the template tuple is transferred to the ScanNode. Note that partition values are materialized into the template tuple.

      After IMPALA-9655, the ranges of one file can be scheduled to different ScanNode instances when MT_DOP > 1. In the following sequence, there is an illegal memory access (heap-use-after-free), which can cause a crash.

      t0:
      Scanner of ScanNode-1 reads the header of a large Avro file.
      Scanner of ScanNode-2 reads the header of a small Avro file.
      The varlen memory of each template_tuple is transferred to the corresponding ScanNode.
      t1:
      Scanner of ScanNode-1 reads the content of the small Avro file.
      Scanner of ScanNode-2 reads the content of the large Avro file.
      Scanners reuse the template_tuple created by the header scanners [1], so RowBatches produced by ScanNode-2 actually reference memory owned by ScanNode-1.
      t2:
      ScanNode-1 finishes first and closes (assuming it has no more files to read).
      The downstream consumer of ScanNode-2 crashes when it accesses the partition string values.

      [1] https://github.com/apache/impala/blob/2733d039ad4a830a1ea34c1a75d2b666788e39a9/be/src/exec/avro/hdfs-avro-scanner.cc#L478

      Attachments

        1. heap-use-after-free-report2.txt (39 kB, Quanlong Huang)
        2. heap-use-after-free-report1.txt (39 kB, Quanlong Huang)
        3. date_str_avro.tar.gz (11 kB, Quanlong Huang)

    People

      Assignee: Quanlong Huang (stigahuang)
      Reporter: Quanlong Huang (stigahuang)
      Votes: 0
      Watchers: 5