Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Impala 1.3.1
None
Description
be/src/exec/hdfs-parquet-scanner.cc:736: rows_read < rows_in_file) {
We should detect the case where file_metadata_.num_rows does not equal the actual number of rows in the file. If abort_on_error is true, this should fail the query; otherwise we should log an error via scan_node_->runtime_state()->LogError().
Such handling did not exist before.
We need to decide whether to read at most as many rows as the metadata states, or to continue reading until there are no more rows in the file.
We will also need to add tests that use Parquet files whose metadata is incorrect.
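A minimal sketch of the check described above, written as standalone C++ rather than against the actual scanner code. The names ErrorLog, RowCountCheck, and ValidateRowCount are hypothetical and only stand in for the runtime state's LogError() and the scanner's own bookkeeping. The sketch assumes the comparison runs once the scanner has finished reading, so it does not settle whether reading should stop at the metadata count or continue to the end of the file.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the runtime state's error log (not Impala's class).
struct ErrorLog {
  std::vector<std::string> messages;
  void LogError(const std::string& msg) { messages.push_back(msg); }
};

// Outcome of comparing the metadata row count with the rows actually read.
enum class RowCountCheck { kOk, kLoggedMismatch, kAbort };

// Compares the row count stated in the file metadata with the number of rows
// the scanner actually read. On a mismatch the message is always recorded;
// the caller is expected to fail the query when kAbort is returned.
RowCountCheck ValidateRowCount(const std::string& filename,
                               int64_t metadata_num_rows,
                               int64_t rows_read,
                               bool abort_on_error,
                               ErrorLog* log) {
  if (rows_read == metadata_num_rows) return RowCountCheck::kOk;

  std::ostringstream msg;
  msg << "File '" << filename << "': metadata states " << metadata_num_rows
      << " rows, but " << rows_read << " rows were read";
  log->LogError(msg.str());

  return abort_on_error ? RowCountCheck::kAbort : RowCountCheck::kLoggedMismatch;
}
```

The check covers both directions of the mismatch (fewer or more rows than the metadata states), which is what tests with deliberately incorrect metadata would need to exercise.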