IMPALA / IMPALA-6291

Various crashes and incorrect results on CPUs with AVX512

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
    • Fix Version/s: Impala 2.11.0
    • Component/s: Backend
    • Environment:
      Ubuntu 16.04, M5.4xlarge

      Description

      EC2 M5 and C5 instances use a different hypervisor than M4 and C4 instances. On M5 and C5, data loading fails. An interesting snippet from the end of an impalad log:

      I1207 04:12:07.922456 19933 coordinator.cc:99] Exec() query_id=944ead2f178cf67e:1755131f00000000 stmt=CREATE TABLE tmp_orders_string AS 
                SELECT STRAIGHT_JOIN
                  o_orderkey, o_custkey, o_orderstatus, o_totalprice, o_orderdate,
                  o_orderpriority, o_clerk, o_shippriority, o_comment,
                  GROUP_CONCAT(
                    CONCAT(
                      CAST(l_partkey AS STRING), '\005',
                      CAST(l_suppkey AS STRING), '\005',
                      CAST(l_linenumber AS STRING), '\005',
                      CAST(l_quantity AS STRING), '\005',
                      CAST(l_extendedprice AS STRING), '\005',
                      CAST(l_discount AS STRING), '\005',
                      CAST(l_tax AS STRING), '\005',
                      CAST(l_returnflag AS STRING), '\005',
                      CAST(l_linestatus AS STRING), '\005',
                      CAST(l_shipdate AS STRING), '\005',
                      CAST(l_commitdate AS STRING), '\005',
                      CAST(l_receiptdate AS STRING), '\005',
                      CAST(l_shipinstruct AS STRING), '\005',
                      CAST(l_shipmode AS STRING), '\005',
                      CAST(l_comment AS STRING)
                    ), '\004'
                  ) AS lineitems_string
                FROM tpch_parquet.lineitem
                INNER JOIN [SHUFFLE] tpch_parquet.orders ON o_orderkey = l_orderkey
                WHERE o_orderkey % 1 = 0
                GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9
      ...
      F1207 04:12:08.972215 19953 partitioned-hash-join-node.cc:291] Check failed: probe_batch_pos_ == probe_batch_->num_rows() || probe_batch_pos_ == -1 
      

      The error log shows:

      F1207 04:12:08.972215 19953 partitioned-hash-join-node.cc:291] Check failed: probe_batch_pos_ == probe_batch_->num_rows() || probe_batch_pos_ == -1 
      *** Check failure stack trace: ***
          @          0x3bdcefd  google::LogMessage::Fail()
          @          0x3bde7a2  google::LogMessage::SendToLog()
          @          0x3bdc8d7  google::LogMessage::Flush()
          @          0x3bdfe9e  google::LogMessageFatal::~LogMessageFatal()
          @          0x28bd4db  impala::PartitionedHashJoinNode::NextProbeRowBatch()
          @          0x28c1741  impala::PartitionedHashJoinNode::GetNext()
          @          0x289f71f  impala::PartitionedAggregationNode::GetRowsStreaming()
          @          0x289d8d5  impala::PartitionedAggregationNode::GetNext()
          @          0x1891d1c  impala::FragmentInstanceState::ExecInternal()
          @          0x188f629  impala::FragmentInstanceState::Exec()
          @          0x1878c0a  impala::QueryState::ExecFInstance()
          @          0x18774cc  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
          @          0x1879849  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
          @          0x17c64ba  boost::function0<>::operator()()
          @          0x1abb5a1  impala::Thread::SuperviseThread()
          @          0x1ac412c  boost::_bi::list4<>::operator()<>()
          @          0x1ac406f  boost::_bi::bind_t<>::operator()()
          @          0x1ac4032  boost::detail::thread_data<>::run()
          @          0x2d668ca  thread_proxy
          @     0x7fe9287146ba  start_thread
          @     0x7fe92844a3dd  clone
      Picked up JAVA_TOOL_OPTIONS: -agentlib:jdwp=transport=dt_socket,address=30002,server=y,suspend=n 
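
Fatal lines like the check failure above can be picked out of a log by their leading severity letter. The following is a minimal sketch, assuming the log uses the standard glog line prefix (severity letter, then `mmdd hh:mm:ss.uuuuuu`); `fatal_lines` is a hypothetical helper name, not an Impala tool, and the log path is whatever file you pass in.

```shell
# Print only the FATAL lines from a glog-format log file.
# glog prefixes each line with a severity letter (I, W, E, F)
# followed by the date as mmdd and a timestamp.
# fatal_lines is an illustrative helper, not part of Impala.
fatal_lines() {
  grep -E '^[[:space:]]*F[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ ' "$1"
}
```

Called with the path to an impalad log, this would print lines such as the `partitioned-hash-join-node.cc:291] Check failed` entry and skip the `I`-severity query text.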
      

      To reproduce this, start an M5.4xlarge instance with 250GB of disk space and run:

      sudo apt-get update
      sudo apt-get install --yes git
      git init ~/Impala
      pushd ~/Impala
      git fetch https://github.com/apache/impala master
      git checkout FETCH_HEAD
      ./bin/bootstrap_development.sh | tee -a $(mktemp -p ~)
      

      You might need to adjust the default security group; I'm not sure. For comparison, you can run the same script on an M4.4xlarge, where it should work.
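
Because the issue title ties the failures to AVX512, it may help to confirm whether a given host advertises those instructions before comparing instance types. The following is a minimal sketch, assuming a Linux x86 host where `/proc/cpuinfo` lists CPU features on a `flags` line; `avx512_flags` and its optional path argument are hypothetical and exist only for illustration.

```shell
# List the distinct AVX-512 feature flags found in a cpuinfo-style file.
# avx512_flags is an illustrative helper, not part of Impala.
# With no argument it reads /proc/cpuinfo (an assumption: Linux on x86).
avx512_flags() {
  # Take the first "flags" line, split it into one word per line,
  # and keep each avx512* feature name once.
  grep -m1 '^flags' "${1:-/proc/cpuinfo}" 2>/dev/null \
    | tr -s '[:space:]' '\n' \
    | grep '^avx512' \
    | sort -u
}

if [ -n "$(avx512_flags)" ]; then
  echo "AVX-512 flags present: $(avx512_flags | tr '\n' ' ')"
else
  echo "no AVX-512 flags found"
fi
```

Running this on both the failing and the working instance type gives a quick way to check whether AVX512 support is what differs between them.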

              People

              • Assignee: Tim Armstrong (tarmstrong)
              • Reporter: Jim Apple (jbapple)