IMPALA-5861

HdfsParquetScanner::GetNextInternal() IsZeroSlotTableScan() case double counts

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.10.0
    • Fix Version/s: Impala 3.2.0
    • Component/s: Backend
    • Labels:
      None

      Description

      It appears that this code double counts into rows_read_counter(): row_group_rows_read_ is already a running total, but the full total is added to the counter on every call rather than only num_to_commit:

      HdfsParquetScanner::GetNextInternal()
        } else if (scan_node_->IsZeroSlotTableScan()) {
          // There are no materialized slots and we are not optimizing count(*), e.g.
          // "select 1 from alltypes". We can serve this query from just the file metadata.
          // We don't need to read the column data.
          if (row_group_rows_read_ == file_metadata_.num_rows) {
            eos_ = true;
            return Status::OK();
          }
          assemble_rows_timer_.Start();
          DCHECK_LE(row_group_rows_read_, file_metadata_.num_rows);
          int64_t rows_remaining = file_metadata_.num_rows - row_group_rows_read_;
          int max_tuples = min<int64_t>(row_batch->capacity(), rows_remaining);
          TupleRow* current_row = row_batch->GetRow(row_batch->AddRow());
          int num_to_commit = WriteTemplateTuples(current_row, max_tuples);
          Status status = CommitRows(row_batch, num_to_commit);
          assemble_rows_timer_.Stop();
          RETURN_IF_ERROR(status);
          row_group_rows_read_ += num_to_commit;
          COUNTER_ADD(scan_node_->rows_read_counter(), row_group_rows_read_);  <======
          return Status::OK();
        }
      

      Repro in impala-shell:

      set batch_size=16; set num_nodes=1; select count(*) from functional.alltypesmixedformat; profile;
      ....
                 - RowsRead: 3.94K (3936)
                 - RowsReturned: 1.20K (1200)
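
The effect of adding the running total on every call can be checked with a small stand-alone sketch. This is not Impala code: SimulateZeroSlotScan and CounterTotals are hypothetical names, and the loop only mimics the bookkeeping in the snippet above (row_group_rows_read_, num_to_commit, and the counter update), assuming one file and a fixed batch capacity.

```cpp
#include <algorithm>
#include <cstdint>

// Totals accumulated by two ways of updating a "rows read" counter.
struct CounterTotals {
  int64_t running_total_added;  // adds the running total each call (reported behaviour)
  int64_t delta_added;          // adds only the rows committed in that call
};

// Hypothetical stand-in for repeated GetNextInternal() calls on one file
// in the zero-slot case. Assumes batch_capacity > 0.
CounterTotals SimulateZeroSlotScan(int64_t num_rows, int64_t batch_capacity) {
  CounterTotals totals{0, 0};
  int64_t rows_read = 0;  // plays the role of row_group_rows_read_
  while (rows_read < num_rows) {
    // Same shape as min(row_batch->capacity(), rows_remaining) in the snippet.
    int64_t num_to_commit = std::min(batch_capacity, num_rows - rows_read);
    rows_read += num_to_commit;
    totals.running_total_added += rows_read;  // COUNTER_ADD(..., row_group_rows_read_)
    totals.delta_added += num_to_commit;      // COUNTER_ADD(..., num_to_commit)
  }
  return totals;
}
```

Calling SimulateZeroSlotScan(300, 16) gives 3036 for the running-total variant and 300 for the delta variant. If one assumes (this is not stated in the report) that 300 of the table's 1200 rows sit in a single Parquet file and the other 900 rows are counted once by other scanners, then 900 + 3036 = 3936, which is consistent with the RowsRead value above. Under that reading, adding num_to_commit instead of row_group_rows_read_ would bring RowsRead back in line with the number of rows scanned.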
      

            People

            • Assignee: Tim Armstrong (tarmstrong)
            • Reporter: Dan Hecht (dhecht)