PARQUET-548: Add Java metadata for PageEncodingStats

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9.0, 1.8.2
    • Component/s: parquet-mr
    • Labels: None

      Description

      PARQUET-384 needs to determine whether an entire column chunk is dictionary-encoded, but that is difficult to detect from the set of encodings recorded for a column. For 1.0, it can be done by checking for a PLAIN page, because both dictionary pages and dictionary-encoded data pages use PLAIN_DICTIONARY, and RLE/BIT_PACKING are used only for repetition and definition levels. For 2.0, however, dictionary pages might use PLAIN, so finding PLAIN in the encoding set no longer tells us whether a column has fallen back from dictionary encoding.

      PageEncodingStats were added to the format to solve this problem, so we just need to implement them.
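
      To illustrate the difference, here is a minimal, self-contained sketch of both checks. It does not use the parquet-mr API: the class `DictionaryCheckSketch`, the `PageStat` holder, and both method names are hypothetical stand-ins, and the per-entry fields (page type, encoding, count) are an assumption about what the stats carry. `RLE_DICTIONARY` is used here as the 2.0 dictionary encoding for data pages.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the parquet-mr implementation.
final class DictionaryCheckSketch {
    enum PageType { DATA_PAGE, DICTIONARY_PAGE, DATA_PAGE_V2 }
    enum Encoding { PLAIN, PLAIN_DICTIONARY, RLE, BIT_PACKING, RLE_DICTIONARY }

    // Assumed shape of one stats entry: how many pages of a type used an encoding.
    static final class PageStat {
        final PageType pageType;
        final Encoding encoding;
        final int count;

        PageStat(PageType pageType, Encoding encoding, int count) {
            this.pageType = pageType;
            this.encoding = encoding;
            this.count = count;
        }
    }

    static boolean isDictionary(Encoding e) {
        return e == Encoding.PLAIN_DICTIONARY || e == Encoding.RLE_DICTIONARY;
    }

    // 1.0 heuristic from the description: a dictionary is in use and no page is PLAIN.
    static boolean allDictionaryEncodedV1(Set<Encoding> encodings) {
        return encodings.contains(Encoding.PLAIN_DICTIONARY)
            && !encodings.contains(Encoding.PLAIN);
    }

    // Stats-based check: every counted data page must use a dictionary encoding.
    // Dictionary pages are skipped, so a PLAIN dictionary page does not matter.
    static boolean allDictionaryEncoded(List<PageStat> stats) {
        boolean sawDictionaryData = false;
        for (PageStat s : stats) {
            if (s.pageType == PageType.DICTIONARY_PAGE || s.count <= 0) {
                continue;
            }
            if (!isDictionary(s.encoding)) {
                return false; // at least one data page fell back
            }
            sawDictionaryData = true;
        }
        return sawDictionaryData;
    }

    public static void main(String[] args) {
        // Two 2.0-style chunks whose encoding sets are identical.
        Set<Encoding> sameSet = EnumSet.of(Encoding.PLAIN, Encoding.RLE_DICTIONARY, Encoding.RLE);

        List<PageStat> fullyDictionary = Arrays.asList(
            new PageStat(PageType.DICTIONARY_PAGE, Encoding.PLAIN, 1),
            new PageStat(PageType.DATA_PAGE_V2, Encoding.RLE_DICTIONARY, 4));
        List<PageStat> fellBack = Arrays.asList(
            new PageStat(PageType.DICTIONARY_PAGE, Encoding.PLAIN, 1),
            new PageStat(PageType.DATA_PAGE_V2, Encoding.RLE_DICTIONARY, 2),
            new PageStat(PageType.DATA_PAGE_V2, Encoding.PLAIN, 2));

        System.out.println("encoding-set check: " + allDictionaryEncodedV1(sameSet));
        System.out.println("stats, fully dictionary: " + allDictionaryEncoded(fullyDictionary));
        System.out.println("stats, fell back: " + allDictionaryEncoded(fellBack));
    }
}
```

      In this sketch both chunks produce the same encoding set, so the set-based check cannot tell them apart, while the per-page counts can.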

              People

              • Assignee: Ryan Blue (rdblue)
              • Reporter: Ryan Blue (rdblue)
