Apache Arrow / ARROW-11410

[Rust][Parquet] Implement returning dictionary arrays from parquet reader

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Rust
    • Labels: None

    Description

      Currently the Rust Parquet reader returns a regular array (e.g. a string array) even when the column is dictionary-encoded in the Parquet file.

      If the Parquet reader could return dictionary arrays for dictionary-encoded columns, this would bring several benefits:

      • faster reading of dictionary-encoded columns from Parquet, as no conversion/expansion into a regular array would be necessary
      • lower memory use, as a dictionary array stores each distinct value once plus one integer key per row, instead of a full copy of the value for every row
      • faster filtering, as SIMD can be used to filter over the numeric keys of a dictionary string array instead of comparing string values in a string array
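
To make the three points above concrete, here is a minimal std-only sketch of a dictionary-encoded string column. `DictColumn`, `encode`, `decode`, and `eq_mask` are hypothetical names invented for illustration; this is not the arrow or parquet crate API, only a model of the idea under the assumption of `u32` keys.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary-encoded string column:
/// each distinct value is stored once, and every row holds an integer key.
#[derive(Debug, Clone, PartialEq)]
pub struct DictColumn {
    /// Distinct values, in order of first appearance.
    pub values: Vec<String>,
    /// One key per row, indexing into `values`.
    pub keys: Vec<u32>,
}

impl DictColumn {
    /// Build a dictionary column from plain string rows.
    pub fn encode(rows: &[&str]) -> Self {
        let mut lookup: HashMap<&str, u32> = HashMap::new();
        let mut values: Vec<String> = Vec::new();
        let mut keys: Vec<u32> = Vec::with_capacity(rows.len());
        for &row in rows {
            // Reuse the existing key, or append a new distinct value.
            let key = *lookup.entry(row).or_insert_with(|| {
                values.push(row.to_string());
                (values.len() - 1) as u32
            });
            keys.push(key);
        }
        DictColumn { values, keys }
    }

    /// Expand into a regular string column. This per-row copy is the
    /// conversion step the issue proposes to avoid.
    pub fn decode(&self) -> Vec<String> {
        self.keys
            .iter()
            .map(|&k| self.values[k as usize].clone())
            .collect()
    }

    /// Equality filter: look the needle up once in the dictionary, then
    /// compare integer keys per row instead of comparing strings.
    pub fn eq_mask(&self, needle: &str) -> Vec<bool> {
        match self.values.iter().position(|v| v.as_str() == needle) {
            Some(pos) => {
                let wanted = pos as u32;
                self.keys.iter().map(|&k| k == wanted).collect()
            }
            // Value absent from the dictionary: no row can match.
            None => vec![false; self.keys.len()],
        }
    }
}
```

For example, `DictColumn::encode(&["a", "b", "a", "c", "a"])` stores three distinct values and five keys, and `eq_mask("a")` does one dictionary lookup followed by a loop over integers. A loop of that shape is the kind a SIMD kernel or an auto-vectorizing compiler can work on, although this sketch makes no claim about actual speedups.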

      nevime, alamb: let me know what you think

          People

            Assignee: Unassigned
            Reporter: Yordan Pavlov (yordan-pavlov)