The Avro changes in IMPALA-3905 introduced a correctness bug. You can hit it organically if you have a large Avro file where the 16-byte sync marker straddles a block boundary. In that case the Avro block that follows the sync marker may not be scanned, so a few records are missing from the result.
It's possible to reproduce this on our test data by tweaking max_scan_range_length until you find a value where a count query returns fewer rows than expected.
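The search for a failing value can be automated. The following is a minimal sketch, not an existing Impala test utility: `run_count` is a hypothetical callable that sets max_scan_range_length to the given value, runs the count query against the Avro table, and returns the row count. How it talks to Impala is left to the caller.

```python
def find_bad_scan_range_lengths(run_count, candidate_lengths, expected_count):
    """Return (length, actual_count) pairs where the count is wrong.

    run_count is a hypothetical callable supplied by the caller: it should
    run the count query with max_scan_range_length set to `length` and
    return the number of rows reported.
    """
    mismatches = []
    for length in candidate_lengths:
        actual = run_count(length)
        if actual != expected_count:
            mismatches.append((length, actual))
    return mismatches
```

Passing a range of lengths with a small step, for example `range(1024, 65536, 16)`, covers many boundary positions; an empty result means no tested value reproduced the problem, not that the bug is absent.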
We do have test coverage in TestScanRangeLengths that exercises the code with Avro blocks straddling scan ranges. However, the necessary condition for this bug is that the scan range includes a full Avro block, followed by a sync marker on the boundary with the next scan range. We need to add test coverage for a larger range of values here: larger files and larger scan ranges.
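To pick scan range lengths that are likely to hit this condition, the check can be sketched offline from the sync marker offsets of a file. This is an illustration under simplifying assumptions, not Impala's actual split logic: it assumes scan ranges are fixed-size slices starting at offset 0, and that `sync_offsets` (a hypothetical input, obtained by whatever means) lists the byte offset of every sync marker, including the one that ends the file header.

```python
SYNC_MARKER_SIZE = 16  # Avro sync markers are 16 bytes long


def triggers_condition(sync_offsets, scan_range_length):
    """Return True if some scan range fully contains an Avro block whose
    trailing sync marker straddles the boundary with the next scan range.

    Assumes fixed-size scan ranges starting at offset 0 (a simplification).
    """
    offsets = sorted(sync_offsets)
    for prev, cur in zip(offsets, offsets[1:]):
        # The block's data starts right after the previous sync marker.
        block_start = prev + SYNC_MARKER_SIZE
        range_idx = cur // scan_range_length
        marker_last_byte = cur + SYNC_MARKER_SIZE - 1
        straddles = marker_last_byte // scan_range_length != range_idx
        block_in_range = block_start // scan_range_length == range_idx
        if straddles and block_in_range:
            return True
    return False
```

Lengths for which this returns True are candidates worth adding to the test matrix; whether they actually reproduce the missing records still has to be confirmed by running the query.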