PARQUET-1702

[C++] Make BufferedRowGroupWriter compatible with parquet encryption


Details

    Description

      The newly added parquet encryption feature currently works only with SerializedRowGroupWriter.
      There are several issues preventing the use of BufferedRowGroupWriter with encryption enabled:

      1. The meta encryptor is not passed on to ColumnChunkMetaDataBuilder::Finish. This can trigger a null-pointer dereference (reported as a segmentation fault).
      2. UpdateEncryption is not called on Close, resulting in an incorrect AAD string when encrypting the column chunk metadata.
      3. The column ordinal passed on to PageWriter::Open is always zero, resulting in an incorrect AAD string when encrypting the data of every column except the first.
      4. When decrypting a column chunk with no dictionary pages, PARQUET-1706 confuses the decryptor into thinking it is decrypting a dictionary page, which again causes an incorrect AAD string to be used when decrypting.
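
To illustrate why issues 2 to 4 matter, here is a minimal sketch of how a per-module AAD string is assembled, loosely following the Parquet modular encryption specification (file AAD, then module type, then row group, column and page ordinals as 2-byte little-endian values). `MakeModuleAad`, `AppendOrdinal` and `ModuleType` are hypothetical names made up for this example, not the functions used in the C++ library, and the exact layout should be checked against the specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Module type codes, assumed to follow the Parquet modular encryption spec.
enum class ModuleType : std::int8_t {
  kFooter = 0,
  kColumnMetaData = 1,
  kDataPage = 2,
  kDictionaryPage = 3,
  kDataPageHeader = 4,
  kDictionaryPageHeader = 5,
};

// Appends a non-negative ordinal as a 2-byte little-endian value.
inline void AppendOrdinal(std::string& aad, std::int16_t ordinal) {
  assert(ordinal >= 0);
  aad.push_back(static_cast<char>(ordinal & 0xFF));
  aad.push_back(static_cast<char>((ordinal >> 8) & 0xFF));
}

// Hypothetical helper (not the library's API): builds the AAD for one module.
inline std::string MakeModuleAad(const std::string& file_aad, ModuleType type,
                                 std::int16_t row_group, std::int16_t column,
                                 std::int16_t page) {
  std::string aad = file_aad;
  aad.push_back(static_cast<char>(type));
  // The footer AAD carries only the module type.
  if (type == ModuleType::kFooter) return aad;
  AppendOrdinal(aad, row_group);
  AppendOrdinal(aad, column);
  // Only data pages and their headers carry a page ordinal.
  const bool has_page_ordinal = type == ModuleType::kDataPage ||
                                type == ModuleType::kDataPageHeader;
  if (has_page_ordinal) AppendOrdinal(aad, page);
  return aad;
}
```

Under these assumptions, a writer that always passes column ordinal 0 produces an AAD that differs from the one the reader computes for any column other than the first, and a reader that assumes a dictionary page where there is a data page uses a different module type and omits the page ordinal. In both cases the AADs do not match, so authenticated decryption fails.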

      We propose a patch (a few dozen lines) to fix the above issues.
      We also extend the existing parquet-encryption-test unit test, which currently tests only SerializedRowGroupWriter, to also cover BufferedRowGroupWriter.

            People

              oro Or Ozeri


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 50m