IMPALA-5055

Jenkins test run hit DCHECK in parquet-column-readers.cc

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:

      Description

      http://jenkins.impala.io:8080/job/ubuntu-14.04-from-scratch/992/

      9 01:52:17.112439 34987 parquet-column-readers.cc:807] Check failed: status.ok() 
      *** Check failure stack trace: ***
          @     0x7f451d3d93ad  google::LogMessage::Fail()
          @     0x7f451d3dbcd6  google::LogMessage::SendToLog()
          @     0x7f451d3d8ecd  google::LogMessage::Flush()
          @     0x7f451d3dc77e  google::LogMessageFatal::~LogMessageFatal()
          @     0x7f451fae6061  impala::BaseScalarColumnReader::ReadPageHeader()
          @     0x7f451fae7b30  impala::BaseScalarColumnReader::ReadDataPage()
          @     0x7f451fae8e1b  impala::BaseScalarColumnReader::NextPage()
          @     0x7f451faec75e  impala::BaseScalarColumnReader::NextLevels<>()
          @     0x7f451faeb5ec  impala::BaseScalarColumnReader::NextLevels()
          @     0x7f451fa7f76f  impala::HdfsParquetScanner::NextRowGroup()
          @     0x7f451fa7ded1  impala::HdfsParquetScanner::GetNextInternal()
          @     0x7f451fa7d283  impala::HdfsParquetScanner::ProcessSplit()
          @     0x7f451fa08fb1  impala::HdfsScanNode::ProcessSplit()
          @     0x7f451fa083f8  impala::HdfsScanNode::ScannerThread()
          @     0x7f451fa10661  boost::_mfi::mf0<>::operator()()
          @     0x7f451fa10218  boost::_bi::list1<>::operator()<>()
          @     0x7f451fa0fd11  boost::_bi::bind_t<>::operator()()
          @     0x7f451fa0f611  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @     0x7f45207018f0  boost::function0<>::operator()()
          @     0x7f45206fef83  impala::Thread::SuperviseThread()
          @     0x7f452070847c  boost::_bi::list4<>::operator()<>()
          @     0x7f45207083bf  boost::_bi::bind_t<>::operator()()
          @     0x7f4520708382  boost::detail::thread_data<>::run()
          @           0x87ea3a  thread_proxy
          @     0x7f451a917184  start_thread
          @     0x7f451a64437d  (unknown)
      :230)
      	at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:256)
      	at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
      	at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
      	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1279)
      	... 10 more
      readDirect: FSDataInputStream#read error:
      java.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream
      	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:150)
      

      Looks like the recent Parquet dictionary filtering commit added that DCHECK, so I'm assigning this to Joe.

      The first set of tests that failed was:

      01:54:17 [gw0] FAILED query_test/test_scanners.py::TestScannersAllTableFormatsWithLimit::test_limit[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: kudu/none] 
      01:54:17 query_test/test_scanners.py::TestUnmatchedSchema::test_unmatched_schema[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
      01:54:17 [gw4] FAILED query_test/test_tpch_nested_queries.py::TestTpchNestedQuery::test_tpch_q9[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 [gw2] FAILED query_test/test_scanners.py::TestScannersAllTableFormatsWithLimit::test_limit[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: seq/snap/block] 
      01:54:17 query_test/test_tpch_nested_queries.py::TestTpchNestedQuery::test_tpch_q10[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 [gw7] FAILED query_test/test_tpch_nested_queries.py::TestTpchNestedQuery::test_tpch_q18[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 query_test/test_tpch_nested_queries.py::TestTpchNestedQuery::test_tpch_q19[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 [gw6] FAILED query_test/test_tpch_nested_queries.py::TestTpchNestedQuery::test_tpch_q6[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 query_test/test_tpch_nested_queries.py::TestTpchNestedQuery::test_tpch_q7[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 [gw3] FAILED query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filters[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filters_phj_only[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 [gw5] FAILED query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 query_test/test_tpcds_queries.py::TestTpcdsQuery::test_tpcds_q88[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 [gw1] ERROR query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes[exec_option: {'mem_limit': '512m', 'abort_on_error': False, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes[exec_option: {'mem_limit': '512m', 'abort_on_error': False, 'num_nodes': 0} | table_format: avro/snap/block] 
      01:54:17 [gw0] ERROR query_test/test_scanners.py::TestUnmatchedSchema::test_unmatched_schema[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
      01:54:17 query_test/test_scanners.py::TestUnmatchedSchema::test_unmatched_schema[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: seq/snap/block] 
      01:54:17 [gw5] ERROR query_test/test_tpcds_queries.py::TestTpcdsQuery::test_tpcds_q88[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 query_test/test_tpcds_queries.py::TestTpcdsQuery::test_tpcds_q89[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 [gw4] FAILED query_test/test_tpch_nested_queries.py::TestTpchNestedQuery::test_tpch_q10[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      01:54:17 query_test/test_tpch_nested_queries.py::TestTpchNestedQuery::test_tpch_q11[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      

      My guess is that TestScannersFuzzing, which runs queries against deliberately corrupted data files, triggered this.

        Activity

        joemcdonnell Joe McDonnell added a comment:

        commit 244ec22d8e8b2f108f723d49dc95a016a74d6d6f
        Author: Joe McDonnell <joemcdonnell@cloudera.com>
        Date: Thu Mar 9 09:12:12 2017 -0800

        IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()

        GetBytes only sets status in the case of an error. This means that
        ReadPageHeader needs to initialize the status variable so that the
        status.ok() check is accurate after the GetBytes call.

        I verified that the other uses of status are ok. Most do not check
        status.ok() directly, but rely on the return value of the function
        setting status.

        Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e
        Reviewed-on: http://gerrit.cloudera.org:8080/6328
        Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
        Tested-by: Impala Public Jenkins
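A minimal C++ sketch of the contract mismatch described in the commit message follows. It is a simplified model, not Impala's actual code: `Status`, `FakeStream`, `ReadMoreBytes`, and `kBufferSize` are hypothetical stand-ins, and the sketch assumes the check fired because `status` still held an error from earlier in the function when a successful `GetBytes()` call left it untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for impala::Status: an empty message means OK.
class Status {
 public:
  Status() = default;
  static Status OK() { return Status(); }
  static Status Error(std::string msg) { return Status(std::move(msg)); }
  bool ok() const { return msg_.empty(); }
  const std::string& msg() const { return msg_; }

 private:
  explicit Status(std::string msg) : msg_(std::move(msg)) {}
  std::string msg_;
};

constexpr std::int64_t kBufferSize = 64;

// Hypothetical stream. Following the contract in the commit message, GetBytes()
// writes *status only when it fails (and returns false); on success it returns
// true and leaves *status untouched.
class FakeStream {
 public:
  explicit FakeStream(bool fail) : fail_(fail) {}

  bool GetBytes(std::int64_t requested, const std::uint8_t** buffer,
                std::int64_t* out_len, Status* status) {
    if (fail_ || requested > kBufferSize) {
      *status = Status::Error("GetBytes failed");
      return false;
    }
    *buffer = data_;
    *out_len = requested;
    return true;  // *status is deliberately not written on success
  }

 private:
  std::uint8_t data_[kBufferSize] = {};
  bool fail_;
};

// Simplified model of the failing step: `status` already holds an error from
// an earlier step (assumed here to be a failed header-parse attempt) when
// GetBytes() is called. Returns what a status.ok() check after the call sees.
Status ReadMoreBytes(FakeStream* stream, bool reset_status_first) {
  Status status = Status::Error("earlier step failed");
  // The fix: put status into a known OK state before relying on status.ok().
  if (reset_status_first) status = Status::OK();

  const std::uint8_t* buffer = nullptr;
  std::int64_t len = 0;
  if (!stream->GetBytes(32, &buffer, &len, &status)) {
    assert(!status.ok());  // on failure, GetBytes() always sets an error
    return status;
  }
  // Success path. Without the reset, status is still the stale error here,
  // so a DCHECK(status.ok()) at this point would fire.
  return status;
}
```

With a stream whose `GetBytes()` succeeds, `ReadMoreBytes(&stream, false)` returns the stale "earlier step failed" error, while `ReadMoreBytes(&stream, true)` returns OK. Branching only on the `bool` returned by `GetBytes()`, as the commit says most other call sites do, sidesteps the problem, because `status` is then read only right after the call has written it.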


          People

          • Assignee:
            joemcdonnell Joe McDonnell
            Reporter:
            tarmstrong Tim Armstrong
          • Votes:
            0
            Watchers:
            2
