IMPALA-10233

Hit DCHECK in DmlExecState::AddPartition when inserting to a partitioned table with zorder


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 4.0.0
    • Fix Version/s: Impala 4.0.0
    • Component/s: Backend

    Description

      Hit the DCHECK when inserting into a partitioned Parquet table that is sorted by ZORDER. I'm on the master branch (commit b8a2b75).

      F1012 15:04:27.726274  3868 dml-exec-state.cc:432] a6479cc4725101fd:b86db2a100000003] Check failed: per_partition_status_.find(name) == per_partition_status_.end() 
      *** Check failure stack trace: *** 
          @          0x51ff3cc  google::LogMessage::Fail()
          @          0x5200cbc  google::LogMessage::SendToLog()
          @          0x51fed2a  google::LogMessage::Flush()
          @          0x5202928  google::LogMessageFatal::~LogMessageFatal()
          @          0x234ba18  impala::DmlExecState::AddPartition()
          @          0x2817786  impala::HdfsTableSink::GetOutputPartition()
          @          0x2813151  impala::HdfsTableSink::WriteClusteredRowBatch()
          @          0x28156c4  impala::HdfsTableSink::Send()
          @          0x23139dd  impala::FragmentInstanceState::ExecInternal()
          @          0x230fe10  impala::FragmentInstanceState::Exec()
          @          0x227bb79  impala::QueryState::ExecFInstance()
          @          0x2279f7b  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
          @          0x227e2c2  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
          @          0x2137699  boost::function0<>::operator()()
          @          0x2715d7d  impala::Thread::SuperviseThread()
          @          0x271dd1a  boost::_bi::list5<>::operator()<>()
          @          0x271dc3e  boost::_bi::bind_t<>::operator()()
          @          0x271dbff  boost::detail::thread_data<>::run()
          @          0x3f05f01  thread_proxy
          @     0x7fb18bebb6b9  start_thread
          @     0x7fb188a474dc  clone 

      It seems the Z-order sort node does not keep the rows sorted by the partition keys. This violates the assumption of HdfsTableSink::WriteClusteredRowBatch() that its input is ordered by the partition key expressions. As a result, a partition key was removed from the partition_keys_to_output_partitions_ map and then inserted into it again.

        /// Maps all rows in 'batch' to partitions and appends them to their temporary Hdfs
        /// files. The input must be ordered by the partition key expressions.
        Status WriteClusteredRowBatch(RuntimeState* state, RowBatch* batch) WARN_UNUSED_RESULT;
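      The following minimal sketch shows how a Z-order sort can break this assumption. It rests on a hypothesis that the report does not state: the partition column is treated as one of the interleaved Z-order dimensions. MortonCode, Row, ZOrderedPartitionKeys and IsClusteredByKey are illustrative names and are not Impala code. IsClusteredByKey checks the property that a writer keeping one partition open at a time actually depends on: each key must form a single contiguous run.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Interleaves the bits of x and y into a Morton (Z-order) code:
// bit i of x goes to position 2*i, bit i of y goes to position 2*i + 1.
uint32_t MortonCode(uint16_t x, uint16_t y) {
  uint32_t code = 0;
  for (int i = 0; i < 16; ++i) {
    code |= static_cast<uint32_t>((x >> i) & 1u) << (2 * i);
    code |= static_cast<uint32_t>((y >> i) & 1u) << (2 * i + 1);
  }
  return code;
}

// Hypothetical row: one partition column and one other ZORDER column.
struct Row {
  uint16_t part_key;
  uint16_t sort_col;
};

// Sorts rows by the Z-order of (part_key, sort_col), i.e. ASSUMING the
// partition key is one of the interleaved dimensions, and returns the
// partition keys in output order.
std::vector<uint16_t> ZOrderedPartitionKeys(std::vector<Row> rows) {
  std::sort(rows.begin(), rows.end(), [](const Row& a, const Row& b) {
    return MortonCode(a.part_key, a.sort_col) < MortonCode(b.part_key, b.sort_col);
  });
  std::vector<uint16_t> keys;
  for (const Row& r : rows) keys.push_back(r.part_key);
  return keys;
}

// True if each distinct key occupies exactly one contiguous run. Input sorted
// by key always satisfies this.
bool IsClusteredByKey(const std::vector<uint16_t>& keys) {
  std::unordered_set<uint16_t> finished;  // keys whose run has already ended
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (finished.count(keys[i]) > 0) return false;  // key reappeared
    if (i + 1 < keys.size() && keys[i + 1] != keys[i]) finished.insert(keys[i]);
  }
  return true;
}
```

      For a 2 x 4 grid of (part_key, sort_col) values, the Z-ordered output has partition keys 0 1 0 1 0 1 0 1, so IsClusteredByKey returns false. Sorting by part_key first would produce 0 0 0 0 1 1 1 1, which satisfies the check.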
      

      The key was removed here, while a new partition key was being processed: https://github.com/apache/impala/blob/b8a2b754669eb7f8d164e8112e594ac413e436ef/be/src/exec/hdfs-table-sink.cc#L334
      It was then reinserted here, which triggered the DCHECK: https://github.com/apache/impala/blob/b8a2b754669eb7f8d164e8112e594ac413e436ef/be/src/exec/hdfs-table-sink.cc#L590
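      The following minimal sketch models the remove-then-reinsert sequence. It uses a heavily simplified sink. ToyClusteredSink and ToyDmlExecState are hypothetical classes that only imitate the behavior described above. The sink keeps only the current partition in its open-partition map and erases the previous entry when the key changes. The exec state registers each partition exactly once. Where the real code would fail the DCHECK, the toy returns the name of the partition that would be registered twice.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for DmlExecState: each partition may be registered
// only once. The real code DCHECKs; this toy returns false instead.
class ToyDmlExecState {
 public:
  bool AddPartition(const std::string& name) {
    // emplace() does not overwrite; .second is false if 'name' already exists.
    return per_partition_status_.emplace(name, 0).second;
  }

 private:
  std::map<std::string, int> per_partition_status_;
};

// Hypothetical clustered writer: only the current partition is kept in the
// open-partition map. When the key changes, the previous entry is erased and
// the new partition is registered with the exec state.
class ToyClusteredSink {
 public:
  // Returns the first partition that would be registered twice, if any.
  std::optional<std::string> Write(const std::vector<std::string>& keys) {
    for (const std::string& key : keys) {
      if (open_partitions_.find(key) == open_partitions_.end()) {
        open_partitions_.clear();  // close and forget the previous partition
        if (!state_.AddPartition(key)) return key;  // DCHECK would fire here
      }
      ++open_partitions_[key];  // count rows written to the open partition
    }
    return std::nullopt;
  }

 private:
  std::map<std::string, int> open_partitions_;
  ToyDmlExecState state_;
};
```

      When the input is sorted by key, every partition is registered exactly once. With interleaved input such as p=0, p=1, p=0, the entry for p=0 is erased when p=1 starts and is then registered a second time. This mirrors the removal and reinsertion pointed to by the two links above.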

            People

              Assignee: Quanlong Huang (stigahuang)
              Reporter: Quanlong Huang (stigahuang)
              Votes: 0
              Watchers: 4
