IMPALA-5021

COMPUTE STATS hang while RowsRead of one SCAN fragment winds down

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

      Description

      I created a table as a sorted copy of another table, written partially using Impala and partially using Hive. I invalidated the table metadata. When I run COMPUTE STATS on the table, the SELECT COUNT query runs for a while and then stops making progress. After that point, comparing snapshots of the profile reveals that only one SCAN fragment seems to be active, but its RowsRead counter is moving backwards, and even goes negative. When I kill the query and try again, the problem reproduces, hanging at the same point in the scan progress.

        Activity

        mmulder Matthew Mulder added a comment -

        I copied the table.

        create table partsupp_sorted_2 stored as parquet as select * from partsupp_sorted;

        COMPUTE STATS succeeded on the new table.

        mmokhtar Mostafa Mokhtar added a comment -

        https://github.com/apache/incubator-impala/blob/master/be/src/exec/hdfs-parquet-scanner.cc#L409

        It seems the condition is never met in HdfsParquetScanner::GetNextInternal, and the DCHECK below should be firing (DCHECKs are compiled out of release builds, which would explain why it doesn't trip):

            if (row_group_rows_read_ == file_metadata_.num_rows) {
              eos_ = true;
              return Status::OK();
            }
            assemble_rows_timer_.Start();
            DCHECK_LE(row_group_rows_read_, file_metadata_.num_rows);
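
        For context on why a missed exit condition becomes a hang: the eos test above uses strict equality, so if row_group_rows_read_ ever steps past num_rows, or moves backwards as the RowsRead counter suggests, the test can never become true. A toy sketch of that failure mode (names borrowed from the snippet, values hypothetical):

            #include <cstdint>
            #include <iostream>

            int main() {
              const int64_t num_rows = 100;  // hypothetical footer row count
              int64_t rows_read = 0;         // stands in for row_group_rows_read_

              rows_read += 64;  // first batch
              rows_read += 64;  // second batch overshoots: rows_read is now 128

              // A strict-equality exit test never fires after an overshoot.
              std::cout << (rows_read == num_rows ? "eos" : "loops forever") << "\n";
              return 0;
            }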
        
        alex.behm Alexander Behm added a comment -

        The problem is in Impala. Here's the problematic code from HdfsParquetScanner::GetNextInternal():

          if (scan_node_->IsZeroSlotTableScan()) {
            // There are no materialized slots, e.g. count(*) over the table.  We can serve
            // this query from just the file metadata. We don't need to read the column data.
            if (row_group_rows_read_ == file_metadata_.num_rows) {
              eos_ = true;
              return Status::OK();
            }
            assemble_rows_timer_.Start();
            DCHECK_LE(row_group_rows_read_, file_metadata_.num_rows);
            int rows_remaining = file_metadata_.num_rows - row_group_rows_read_;  // <--- file_metadata_.num_rows is far greater than MAX_INT, so this int assignment overflows
            int max_tuples = min<int64_t>(row_batch->capacity(), rows_remaining);
            TupleRow* current_row = row_batch->GetRow(row_batch->AddRow());
            int num_to_commit = WriteTemplateTuples(current_row, max_tuples);
            Status status = CommitRows(row_batch, num_to_commit);
            assemble_rows_timer_.Stop();
            RETURN_IF_ERROR(status);
            row_group_rows_read_ += num_to_commit;
            COUNTER_ADD(scan_node_->rows_read_counter(), row_group_rows_read_);
            return Status::OK();
          }
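
        To make the overflow concrete: file_metadata_.num_rows is a 64-bit row count from the Parquet footer, and assigning the 64-bit difference to a 32-bit int wraps on common platforms. A self-contained sketch using the hypothetical num_rows = 2 * MAX_INT32 from the regression test below:

          #include <algorithm>
          #include <cstdint>
          #include <iostream>

          int main() {
            const int64_t num_rows = 2LL * 2147483647LL;  // 2 * MAX_INT32 = 4294967294
            int64_t rows_read = 0;                        // row_group_rows_read_

            // The buggy line: the 64-bit difference is truncated to 32 bits.
            int rows_remaining = num_rows - rows_read;    // wraps to -2
            int max_tuples = std::min<int64_t>(1024, rows_remaining);

            std::cout << "rows_remaining = " << rows_remaining    // -2
                      << ", max_tuples = " << max_tuples << "\n"; // -2

            // With a non-positive max_tuples, no rows are committed (and a
            // negative count can drag row_group_rows_read_ backwards), so the
            // equality test against num_rows never becomes true.
            return 0;
          }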
        
        alex.behm Alexander Behm added a comment -

        commit d3cc23e569eb44a3b9b823bd99da783b8356c6d4
        Author: Alex Behm <alex.behm@cloudera.com>
        Date: Mon Mar 6 23:50:59 2017 -0800

        IMPALA-5021: Fix count remaining rows overflow in Parquet.

        Zero-slot scans of Parquet files that have num_rows > MAX_INT32
        in the footer metadata used to run forever due to an overflow when
        calculating the remaining number of rows to process.

        Testing:

        • Added a regression test using a file with num_rows = 2*MAX_INT32.
        • Locally ran test_scanners.py which succeeded.
        • Private core/hdfs run succeeded

        Change-Id: Ib9f8a6b83f8f621451d5977423ef81a6e4b124bd
        Reviewed-on: http://gerrit.cloudera.org:8080/6286
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Tested-by: Impala Public Jenkins
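
        The fix direction the commit message describes is keeping the remaining-row count in 64 bits. A minimal sketch of that arithmetic (names reused from the snippet above; an illustration, not the exact patch):

            #include <algorithm>
            #include <cstdint>
            #include <iostream>

            int main() {
              const int64_t num_rows = 2LL * 2147483647LL;  // footer num_rows = 2 * MAX_INT32
              int64_t rows_read = 0;                        // row_group_rows_read_
              const int64_t batch_capacity = 1024;          // row_batch->capacity()

              // Stay in 64 bits until after clamping to the batch capacity,
              // which is known to fit in an int.
              int64_t rows_remaining = num_rows - rows_read;
              int max_tuples = static_cast<int>(std::min(batch_capacity, rows_remaining));

              std::cout << "max_tuples = " << max_tuples << "\n";  // 1024
              return 0;
            }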


          People

          • Assignee: alex.behm Alexander Behm
          • Reporter: mmulder Matthew Mulder
          • Votes: 0
          • Watchers: 4
