Apache Arrow / ARROW-17465

[Parquet] DELTA_BINARY_PACKED constraint on num_bits is too restrictive?


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: C++, Parquet

    Description

      Consider the sequence of (int32) values

      [863490391,-816295192,1613070492,-1166045478,1856530847]

      This sequence can be encoded as a single block, single miniblock with a bit_width of 33.

      However, we currently require [1] the bit_width of each miniblock to be no larger than the bit width of the type it encodes.

      We could consider lifting this constraint because, as the example above shows, the `bit_width` needed to represent the deltas can be larger than the `bit_width` needed to represent the values themselves.
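
The arithmetic behind the example can be checked with a minimal sketch. It assumes the deltas are computed without wrap-around (Python integers are arbitrary precision) and that a miniblock stores each delta minus the block's minimum delta. `delta_bit_width` is a hypothetical helper written for illustration, not an Arrow function.

```python
def delta_bit_width(values):
    """Bits needed to pack the min-adjusted deltas of `values`.

    Assumption: deltas are computed exactly (no int32 wrap-around).
    """
    # Differences between consecutive values.
    deltas = [b - a for a, b in zip(values, values[1:])]
    # Frame of reference: the smallest delta in the (single) block.
    min_delta = min(deltas)
    # Non-negative offsets that would be bit-packed.
    adjusted = [d - min_delta for d in deltas]
    # Width of the largest offset determines the miniblock bit width.
    return max(adjusted).bit_length()


values = [863490391, -816295192, 1613070492, -1166045478, 1856530847]
width = delta_bit_width(values)
print(width)  # 33
```

Under these assumptions the largest adjusted delta is 5801692295, which lies between 2^32 and 2^33, so it needs 33 bits even though every input value fits in an int32.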

      [1] https://github.com/apache/arrow/blob/a376968089d7310f4a88d054822fa1eaf96c46f5/cpp/src/parquet/encoding.cc#L2173

      Attachments

        1. test.parquet
          1.0 kB
          Jorge Leitão

          People

            Assignee: Unassigned
            Reporter: Jorge Leitão (jorgecarleitao)
            Votes: 0
            Watchers: 3