PARQUET-2474

[Format] Specify FIXED_SIZE_LIST Logical type

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-format
    • Labels: None

    Description

      Replicated from mailing list

      Arrow recently introduced the FixedShapeTensor and VariableShapeTensor canonical extension types, which use FixedSizeList and StructArray(List, FixedSizeList) as storage, respectively. These types target machine learning and scientific applications that deal with large datasets and would benefit from using Parquet as on-disk storage.

      However, FixedSizeList is currently stored as List in Parquet, which adds significant conversion overhead when reading and writing, as discussed here. It would therefore be beneficial to introduce a FIXED_SIZE_LIST logical type to Parquet.
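
      To illustrate where the overhead comes from, here is a minimal pure-Python sketch. It is a simplified model, not the actual Parquet writer: it assumes non-null, non-empty rows and tracks only repetition levels (real LIST columns also carry definition levels). All function names are hypothetical, and the fixed-size layout shown is only one possible reading of what a FIXED_SIZE_LIST type could allow, since the issue does not specify an encoding.

```python
from typing import List, Tuple


def encode_as_list(rows: List[List[float]]) -> Tuple[List[float], List[int]]:
    """Flatten rows roughly the way a Parquet LIST column does.

    Assumes every row is non-null and non-empty. Returns the flat values
    plus one repetition level per value: 0 starts a new row, 1 continues
    the current row.
    """
    values: List[float] = []
    rep_levels: List[int] = []
    for row in rows:
        for i, value in enumerate(row):
            values.append(value)
            rep_levels.append(0 if i == 0 else 1)
    return values, rep_levels


def decode_list(values: List[float], rep_levels: List[int]) -> List[List[float]]:
    """Rebuild rows by walking the repetition levels value by value."""
    rows: List[List[float]] = []
    for value, level in zip(values, rep_levels):
        if level == 0:
            rows.append([value])
        else:
            rows[-1].append(value)
    return rows


def encode_fixed(rows: List[List[float]], list_size: int) -> List[float]:
    """Hypothetical fixed-size layout: flat values only, no per-value levels."""
    values: List[float] = []
    for row in rows:
        if len(row) != list_size:
            raise ValueError(f"expected {list_size} items, got {len(row)}")
        values.extend(row)
    return values


def decode_fixed(values: List[float], list_size: int) -> List[List[float]]:
    """Rebuild rows by slicing; row boundaries follow from list_size alone."""
    return [values[i:i + list_size] for i in range(0, len(values), list_size)]


rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
list_values, list_levels = encode_as_list(rows)
fixed_values = encode_fixed(rows, list_size=3)
```

      In the List path, every value carries a level that has to be written and then re-scanned on read, even though the row length never changes. In the fixed-size path, the length is stored once and row boundaries are implied.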

            People

              Assignee: Unassigned
              Reporter: Rok Mihevc (rok)