IMPALA-5412

Scan returns wrong partition-column values when scanning multiple partitions pointing to the same filesystem location.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.3.0, Impala 2.4.0, Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
    • Fix Version/s: Impala 2.10.0
    • Component/s: Backend

      Description

      A scan against Avro, RCFile or SequenceFile may return wrong partition-column values when scanning multiple partitions pointing to the same filesystem location.

      For example, a query against the following setup may return fewer rows than expected, or return incorrect counts.

      // Table contents
      partition_col=1 points to /user/hive/warehouse/shared_dir/000000_0
      partition_col=2 points to /user/hive/warehouse/shared_dir/000000_0
      // Query may return wrong results
      SELECT COUNT(*) FROM t GROUP BY partition_col
      

      In particular, COMPUTE STATS uses the query above to populate the per-partition row counts, so those stored row counts may be incorrect.

      This bug affects only the Avro, RCFile and SequenceFile formats. It does not affect Text, Parquet or non-filesystem tables like Kudu.

      The problematic code can be found in hdfs-scan-node-base.h:

        /// Scanner specific per file metadata (e.g. header information) and associated lock.
        boost::mutex metadata_lock_;
        std::map<std::string, void*> per_file_metadata_;
      

      The same file path can belong to multiple partitions, so a scanner may pick up the wrong per-file metadata, which includes the partition values.
      Note that the key in this map is the full file path, not just the file name, so this bug is specific to partitions pointing to the same location.
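
      To make the collision concrete, here is a minimal standalone sketch. It is not Impala code: FileHeader, PathKeyedCache and PartitionKeyedCache are hypothetical names, and keying by a (partition id, path) pair is only one possible way to avoid the clash, not necessarily the fix that was committed. The first class mirrors the path-only key from the snippet above, the second adds the partition id to the key.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the scanner-specific header data
// (the real map stores void*).
struct FileHeader {
  // Partition-column value captured when the header was first parsed.
  std::string partition_value;
};

// Keyed by file path only, like per_file_metadata_ above. A second
// partition that points at the same file reuses the first entry.
class PathKeyedCache {
 public:
  const FileHeader& GetOrAdd(const std::string& path,
                             const std::string& partition_value) {
    // emplace() does not overwrite: if the path is already present,
    // the existing (possibly wrong-partition) entry is returned.
    auto it = cache_.emplace(path, FileHeader{partition_value}).first;
    return it->second;
  }

 private:
  std::map<std::string, FileHeader> cache_;
};

// Keyed by (partition id, file path), so each partition gets its own
// entry even when several partitions share one file.
class PartitionKeyedCache {
 public:
  const FileHeader& GetOrAdd(int64_t partition_id, const std::string& path,
                             const std::string& partition_value) {
    auto key = std::make_pair(partition_id, path);
    auto it = cache_.emplace(std::move(key), FileHeader{partition_value}).first;
    return it->second;
  }

 private:
  std::map<std::pair<int64_t, std::string>, FileHeader> cache_;
};
```

      With the path-only cache, asking for /user/hive/warehouse/shared_dir/000000_0 on behalf of partition_col=2 after partition_col=1 has registered it yields the entry carrying partition value 1. With the pair key, each partition sees its own value.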


            People

            • Assignee: Gabor Kaszab (gaborkaszab)
            • Reporter: Alexander Behm (alex.behm)