A scan against Avro, RCFile or SequenceFile tables may return wrong partition-column values when it scans multiple partitions that point to the same filesystem location.
For example, the following setup may return fewer rows than expected, or have incorrect counts.
In particular, COMPUTE STATS uses the query above to populate the per-partition row counts, so those stored row counts may be incorrect.
This bug affects only the Avro, RCFile and SequenceFile formats. It does not affect Text, Parquet, or non-filesystem tables such as Kudu.
The problematic code can be found in hdfs-scan-node-base.h:
The same file path can belong to multiple partitions, so a scanner may pick up the wrong per-file metadata, which includes the partition values.
Note that the key in this map is the full file path, not just the file name, so this bug is specific to partitions pointing to the same location.
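
To illustrate the collision described above, here is a minimal, self-contained sketch. It is not the Impala code: FileDesc, PathKeyedRegistry and PartitionAndPathRegistry are hypothetical names invented for this example, and the composite key in the second class is only one assumed way to avoid the collision, not necessarily the fix that was adopted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for per-file metadata (not the real Impala type).
struct FileDesc {
  std::string path;      // full file path
  int64_t partition_id;  // partition this descriptor was created for
};

// Hypothetical registry keyed by the full file path only, mirroring the
// situation described in the report.
class PathKeyedRegistry {
 public:
  // A second descriptor with the same path silently replaces the first.
  void Add(const FileDesc& desc) { by_path_[desc.path] = desc; }
  const FileDesc& Get(const std::string& path) const { return by_path_.at(path); }
  std::size_t size() const { return by_path_.size(); }

 private:
  std::unordered_map<std::string, FileDesc> by_path_;
};

// Assumed alternative: key on (partition id, path) so that two partitions
// sharing a location each keep their own descriptor.
class PartitionAndPathRegistry {
 public:
  void Add(const FileDesc& desc) {
    by_key_[std::make_pair(desc.partition_id, desc.path)] = desc;
  }
  const FileDesc& Get(int64_t partition_id, const std::string& path) const {
    return by_key_.at(std::make_pair(partition_id, path));
  }
  std::size_t size() const { return by_key_.size(); }

 private:
  std::map<std::pair<int64_t, std::string>, FileDesc> by_key_;
};
```

If the same path is added to PathKeyedRegistry for partition 1 and then for partition 2, the registry holds a single entry and a lookup by path returns the descriptor for partition 2, whichever partition the scanner is actually processing. With the composite key, both descriptors survive and each lookup returns the matching partition.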