  Parquet / PARQUET-411

Format: Add a flag when min/max are truncated

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: format-2.3.1
    • Fix Version/s: None
    • Component/s: parquet-format
    • Labels: None

    Description

      PARQUET-372 drops page and column chunk stats when values are larger than 4k, to avoid storing very large values in page headers and the file footer. An alternative is to truncate the values, which would still allow filtering on page stats. The problem with truncation is that the stored value may no longer be the true min or max, so engines that use these values to answer aggregations like min(col) would return incorrect results. We should consider adding metadata that records when a min or max has been truncated, so that truncated values can still be used for filtering but are not treated as exact.
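
A minimal Python sketch of the idea, not a proposal for the actual format change. It assumes values are compared as unsigned bytes in lexicographic order. The names `TruncatedStats`, `min_is_exact`, `max_is_exact`, `truncate_min`, `truncate_max`, and `build_stats` are hypothetical and do not come from parquet-format. A truncated min is just a prefix, which always sorts at or below the original. A truncated max has to be rounded up, so the last byte of the prefix that is not 0xFF is incremented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class TruncatedStats:
    """Hypothetical stats record; field names are illustrative only."""
    min_value: bytes
    max_value: Optional[bytes]  # None when no shorter upper bound exists
    min_is_exact: bool          # False when min_value was truncated
    max_is_exact: bool          # False when max_value was truncated or dropped


def truncate_min(value: bytes, limit: int) -> Tuple[bytes, bool]:
    """Return (lower_bound, is_exact). A prefix never sorts above the original."""
    if len(value) <= limit:
        return value, True
    return value[:limit], False


def truncate_max(value: bytes, limit: int) -> Tuple[Optional[bytes], bool]:
    """Return (upper_bound, is_exact), rounding the truncated prefix up."""
    if len(value) <= limit:
        return value, True
    prefix = bytearray(value[:limit])
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        # The prefix was all 0xFF (or limit was 0): no shorter upper bound.
        return None, False
    prefix[-1] += 1
    return bytes(prefix), False


def build_stats(values: Iterable[bytes], limit: int) -> TruncatedStats:
    """Compute min/max bounds for a non-empty set of values, truncated to limit bytes."""
    values = list(values)
    lo, lo_exact = truncate_min(min(values), limit)
    hi, hi_exact = truncate_max(max(values), limit)
    return TruncatedStats(lo, hi, lo_exact, hi_exact)
```

With flags like these, a reader could still skip pages using the bounds, but would only answer min(col) or max(col) directly from stats when the corresponding flag says the value is exact.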

            People

              Assignee: Unassigned
              Reporter: Ryan Blue (rdblue)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated: