IMPALA-1705

Cannot write Parquet files when values are larger than 64KB


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.0
    • Fix Version/s: Impala 2.2
    • Component/s: None
    • Labels: None

    Description

      The patch for IMPALA-1157 added the following check in HdfsParquetTableWriter::AppendRow():

          if (UNLIKELY(bytes_needed > DATA_PAGE_SIZE)) {
            stringstream ss;
            ss << "Cannot write value that needs " << bytes_needed << " bytes to a Parquet "
               << "data page of size " << DATA_PAGE_SIZE << ".";
            return Status(ss.str());
          }
      

      DATA_PAGE_SIZE is hardcoded to 64KB, which means that a row containing a value larger than 64KB cannot be written. Instead of returning an error, we can allocate a larger data page, large enough to fit that value.

      This only happens when the encoding type is PLAIN, i.e. when there are already too many unique values to use PLAIN_DICTIONARY encoding.
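      A minimal sketch of the proposed fix: rather than rejecting a value that exceeds the default page size, grow the page buffer to accommodate it. The names below (kDefaultDataPageSize, PageBuffer, Reserve) are illustrative, not Impala's actual implementation:

          #include <algorithm>
          #include <cstdint>
          #include <vector>

          // Default Parquet data page size (64KB), mirroring DATA_PAGE_SIZE.
          constexpr int64_t kDefaultDataPageSize = 64 * 1024;

          // Hypothetical page buffer that grows past the default size when a
          // single value is too large, instead of returning an error status.
          struct PageBuffer {
            std::vector<uint8_t> data;
            int64_t used = 0;

            // Ensure 'bytes_needed' fits in the page; allocate a larger page
            // than the default if a single value requires it.
            bool Reserve(int64_t bytes_needed) {
              int64_t page_size =
                  std::max<int64_t>(kDefaultDataPageSize, bytes_needed);
              if (static_cast<int64_t>(data.size()) < page_size) {
                data.resize(page_size);
              }
              return used + bytes_needed <= static_cast<int64_t>(data.size());
            }
          };

      With this approach, a 100KB PLAIN-encoded value simply triggers a one-off 100KB page allocation rather than a write failure.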

      Attachments

        Activity


          People

            Assignee:
            mjacobs Matthew Jacobs
            Reporter:
            ippokratis Ippokratis Pandis
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:
