IMPALA-1705: Cannot write Parquet files when values are larger than 64KB

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.0
    • Fix Version/s: Impala 2.2
    • Component/s: None
    • Labels: None

      Description

      The patch for IMPALA-1157 added the following check in HdfsParquetTableWriter::AppendRow():

          if (UNLIKELY(bytes_needed > DATA_PAGE_SIZE)) {
            stringstream ss;
            ss << "Cannot write value that needs " << bytes_needed << " bytes to a Parquet "
               << "data page of size " << DATA_PAGE_SIZE << ".";
            return Status(ss.str());
          }
      

      DATA_PAGE_SIZE is hardcoded to 64KB, which means that if a row contains a value larger than 64KB we cannot write it. Instead of returning an error, we can allocate a larger data page, large enough to fit that value.
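
      A minimal sketch of what that change could look like, assuming a hypothetical GrowDataPage() helper and a per-writer page_size_ member (neither name is taken from the Impala code base; the actual fix may differ):

          // Sketch only: grow the data page instead of failing when a single
          // value does not fit in the default page size.
          if (UNLIKELY(bytes_needed > page_size_)) {
            // Double the page size until the value fits, so an oversized value
            // gets a page large enough for it instead of triggering an error.
            int64_t new_page_size = page_size_;
            while (new_page_size < bytes_needed) new_page_size *= 2;
            RETURN_IF_ERROR(GrowDataPage(new_page_size));
            page_size_ = new_page_size;
          }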

      This only happens when the encoding type is PLAIN, i.e. when there are already too many unique values for PLAIN_DICTIONARY encoding to be used.



            People

            • Assignee: Matthew Jacobs (mjacobs)
            • Reporter: Ippokratis Pandis (ippokratis)
            • Votes: 0
            • Watchers: 3
