ARROW-8464 (Apache Arrow)

[Rust] [DataFusion] Add better and faster support for dictionary types

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Rust - DataFusion
    • Labels: None

    Description

      Use case: efficiently process large columns of low-cardinality strings.

      • BatchIterator should accept both DictionaryBatch and RecordBatch.
      • The type coercion optimizer rule should inject an expression that converts dictionary value types to index types (for equality expressions and IN (values, ...) lists).
      • The physical expression would look up the index for each dictionary value referenced in the query so that, at runtime, only indices are compared per batch.
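
      The idea in the points above can be sketched in plain Rust. This is only an illustration under stated assumptions: DictColumn, encode, eq_literal and in_list are hypothetical names made up for this sketch, not DataFusion or Arrow APIs, and the column is modelled with std collections rather than Arrow arrays. The literal from the query is resolved to a dictionary index once per batch, and the per-row work is then an integer comparison.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary-encoded string column
/// (not an Arrow or DataFusion type).
struct DictColumn {
    /// Distinct string values, in order of first appearance.
    values: Vec<String>,
    /// One dictionary index per row.
    keys: Vec<u32>,
}

impl DictColumn {
    /// Dictionary-encode a slice of strings.
    fn encode(rows: &[&str]) -> Self {
        let mut values: Vec<String> = Vec::new();
        let mut index: HashMap<String, u32> = HashMap::new();
        let mut keys = Vec::with_capacity(rows.len());
        for row in rows {
            let key = match index.get(*row) {
                Some(k) => *k,
                None => {
                    // First time this value is seen: assign the next index.
                    let k = values.len() as u32;
                    values.push(row.to_string());
                    index.insert(row.to_string(), k);
                    k
                }
            };
            keys.push(key);
        }
        DictColumn { values, keys }
    }

    /// `column = literal`: look up the literal's index once,
    /// then compare only integer keys per row.
    fn eq_literal(&self, literal: &str) -> Vec<bool> {
        match self.values.iter().position(|v| v == literal) {
            Some(target) => {
                let target = target as u32;
                self.keys.iter().map(|k| *k == target).collect()
            }
            // Literal is absent from the dictionary: no row can match.
            None => vec![false; self.keys.len()],
        }
    }

    /// `column IN (literals...)`: mark the matching dictionary entries once,
    /// then answer each row with a single indexed lookup.
    fn in_list(&self, literals: &[&str]) -> Vec<bool> {
        let wanted: Vec<bool> = self
            .values
            .iter()
            .map(|v| literals.contains(&v.as_str()))
            .collect();
        self.keys.iter().map(|k| wanted[*k as usize]).collect()
    }
}
```

      In this sketch the string comparisons scale with the number of distinct values rather than the number of rows, which is where the gain for low-cardinality columns would come from. A real implementation would also need to handle nulls and dictionaries that differ between batches, which this sketch ignores.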

            People

              Assignee: Unassigned
              Reporter: Andy Grove (andygrove)
              Votes: 1
              Watchers: 4
