Description
We created an .orc file using the Apache ORC library. I cannot provide a reproducible way to create such a file.
Statistics are present for 100% of the row groups (checked with orc dump).
However, when we search this file, we see very strange behavior:
TRACE org.apache.orc.impl.RecordReaderImpl: Stats = numberOfValues: 0
stringStatistics {
}
hasNull: false
TRACE org.apache.orc.impl.RecordReaderImpl: Setting (EQUALS value 71231231212) to YES_NO_NULL
DEBUG org.apache.orc.impl.RecordReaderImpl: Row group 340000 to 349999 is included.
If the existing statistics report 0 values, there is obviously no need to read that row group.
Yet the predicate evaluates to YES_NO_NULL, which forces the row group to be included in subsequent reads. This is pointless and bad for performance.
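To make the expected behavior concrete, here is a minimal, hypothetical sketch in Java. It is not ORC's RecordReaderImpl code and not the actual fix. The class RowGroupPruning and its methods evaluateEquals and canSkip are illustrative names; the enum only reuses the TruthValue names seen in the trace. The sketch checks numberOfValues before falling back to "min/max missing, cannot decide". With the statistics from the trace (numberOfValues: 0, empty stringStatistics, hasNull: false), it returns NO, so the row group would be skipped.

```java
// Hypothetical, simplified model of row-group pruning for an EQUALS predicate on a
// string column. The enum mirrors the names of ORC's TruthValue, but this class is
// NOT the ORC implementation; it only illustrates the expected decision.
public final class RowGroupPruning {

    enum TruthValue { YES, NO, NULL, YES_NULL, NO_NULL, YES_NO, YES_NO_NULL }

    /**
     * Evaluates "column EQUALS literal" against one row group's statistics.
     *
     * @param numberOfValues count of non-null values in the row group
     * @param hasNull        whether the row group contains any null
     * @param min            minimum string value, or null if not recorded
     * @param max            maximum string value, or null if not recorded
     */
    static TruthValue evaluateEquals(long numberOfValues, boolean hasNull,
                                     String min, String max, String literal) {
        // Check first: statistics exist but report no non-null values.
        if (numberOfValues == 0) {
            // Only nulls (or nothing at all): EQUALS can never be true.
            return hasNull ? TruthValue.NULL : TruthValue.NO;
        }
        // min/max missing even though values exist: genuinely undecidable.
        if (min == null || max == null) {
            return TruthValue.YES_NO_NULL;
        }
        // Literal outside [min, max]: no row can match.
        if (literal.compareTo(min) < 0 || literal.compareTo(max) > 0) {
            return hasNull ? TruthValue.NO_NULL : TruthValue.NO;
        }
        // Every non-null value equals the literal.
        if (min.equals(max) && min.equals(literal)) {
            return hasNull ? TruthValue.YES_NULL : TruthValue.YES;
        }
        return hasNull ? TruthValue.YES_NO_NULL : TruthValue.YES_NO;
    }

    /** A row group can be skipped only if the predicate can never be YES. */
    static boolean canSkip(TruthValue v) {
        return v == TruthValue.NO || v == TruthValue.NULL || v == TruthValue.NO_NULL;
    }

    public static void main(String[] args) {
        // Statistics from the trace: numberOfValues: 0, no min/max, hasNull: false.
        TruthValue v = evaluateEquals(0, false, null, null, "71231231212");
        System.out.println(v + " skip=" + canSkip(v)); // prints "NO skip=true"
    }
}
```

The key choice is ordering: the empty-row-group check runs before the missing-min/max fallback. Presumably an empty stringStatistics block is currently treated like missing statistics, which would explain the YES_NO_NULL decision in the trace.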
Attachments
Issue Links
- is related to
  - ORC-1075 Support reading ORC files with no column statistics (Closed)
- links to