Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: Impala 2.0
Fix Version/s: None
Component/s: None
Description
The patch for IMPALA-1157 added the following check in HdfsParquetTableWriter::AppendRow():
if (UNLIKELY(bytes_needed > DATA_PAGE_SIZE)) {
  stringstream ss;
  ss << "Cannot write value that needs " << bytes_needed << " bytes to a Parquet "
     << "data page of size " << DATA_PAGE_SIZE << ".";
  return Status(ss.str());
}
DATA_PAGE_SIZE is hardcoded to 64KB, which means that a row containing a single value larger than 64KB cannot be written. Instead of returning an error, we can allocate a larger data page, large enough to fit that value.
This only happens when the encoding type is PLAIN, i.e. when there were already too many unique values for PLAIN_DICTIONARY encoding.