Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
To predict the amount of memory required to read an ORC file, we need to know the size of the dictionaries for the columns being read. I propose adding the size in bytes of each column's dictionary to the stripe's column statistics. The file's column statistics would then record the maximum dictionary size for each column across all stripes.
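
As a rough illustration of how a reader might use such statistics, the sketch below derives a file-level maximum from per-stripe dictionary sizes and sums those maxima over the selected columns. All names here (DictionarySizeSketch, fileLevelMax, estimateDictionaryMemory) are hypothetical and not part of the ORC API, and the estimate assumes the reader holds the dictionaries of only one stripe in memory at a time.

```java
// Hypothetical sketch; none of these names come from the ORC codebase.
class DictionarySizeSketch {

    // File-level statistic for one column: the largest dictionary size
    // (in bytes) recorded for that column in any stripe.
    static long fileLevelMax(long[] stripeDictionaryBytes) {
        long max = 0;
        for (long bytes : stripeDictionaryBytes) {
            max = Math.max(max, bytes);
        }
        return max;
    }

    // Upper bound on dictionary memory when reading the selected columns,
    // assuming only one stripe's dictionaries are resident at a time.
    // dictionaryBytesByColumn[c][s] is the dictionary size of column c in stripe s.
    static long estimateDictionaryMemory(long[][] dictionaryBytesByColumn,
                                         int[] selectedColumns) {
        long total = 0;
        for (int column : selectedColumns) {
            total += fileLevelMax(dictionaryBytesByColumn[column]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sizes for three columns across three stripes.
        long[][] dictionaryBytesByColumn = {
            {0, 0, 0},          // column 0: no dictionary
            {1024, 4096, 2048}, // column 1
            {512, 256, 8192},   // column 2
        };
        int[] selected = {1, 2};
        // 4096 + 8192 = 12288
        System.out.println(estimateDictionaryMemory(dictionaryBytesByColumn, selected));
    }
}
```

Storing only the per-column maximum at the file level keeps the footer small, while still letting a reader bound its dictionary memory before opening any stripe.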