IMPALA-1022

Handle Parquet files whose metadata row count does not match the actual number of rows in the file

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 1.3.1
    • Fix Version/s: Impala 1.4
    • None

    Description

      be/src/exec/hdfs-parquet-scanner.cc:736: rows_read < rows_in_file) {
      We should detect the case where file_metadata_.num_rows does not equal the actual number of rows in the file. If abort_on_error is true, this should fail the query; otherwise we should log an error via scan_node_->runtime_state()->LogError().
      Such handling did not exist before.
      We need to decide whether to read at most as many rows as the metadata reports, or to keep reading until there are no more rows in the file.
      We will also need to add tests with Parquet files whose metadata is incorrect.
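
The check described above could look roughly like the following. This is only a minimal sketch, not the actual patch: `Status`, `RuntimeState`, and `ValidateRowCount` are simplified, hypothetical stand-ins for the real Impala classes, and the sketch assumes the comparison runs once the scanner has finished reading the file. It does not settle the open question of whether to stop at the metadata count or read to the end of the file.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for Impala's Status class; only ok/error is modeled.
struct Status {
  bool ok;
  std::string msg;
};

// Hypothetical stand-in for RuntimeState: holds the abort_on_error query
// option and collects the errors that were logged instead of raised.
struct RuntimeState {
  bool abort_on_error = false;
  std::vector<std::string> error_log;
  void LogError(const std::string& msg) { error_log.push_back(msg); }
};

// Compares the row count stated in the file metadata with the number of rows
// actually read. A mismatch fails the query when abort_on_error is set;
// otherwise it is logged and the scan is allowed to succeed.
Status ValidateRowCount(const std::string& filename, int64_t rows_in_metadata,
                        int64_t rows_read, RuntimeState* state) {
  if (rows_read == rows_in_metadata) return Status{true, ""};

  std::ostringstream ss;
  ss << "File " << filename << ": metadata states " << rows_in_metadata
     << " rows but " << rows_read << " rows were read.";

  if (state->abort_on_error) return Status{false, ss.str()};
  state->LogError(ss.str());
  return Status{true, ""};
}
```

With `abort_on_error` unset, a file that claims 10 rows but yields 7 returns an OK status and leaves one entry in `error_log`; with it set, the same file returns a failed status and nothing is logged.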

          People

            Assignee: Ippokratis Pandis
            Reporter: Ippokratis Pandis