Apache Arrow / ARROW-255

Finalize Dictionary representation

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0
    • Component/s: Format
    • Labels: None

    Description

      format/Message.fbs mentions DictionaryBatches with an id but does not specify where that id is referenced.

      We should add a "dictionary: long" member to Field that references the dictionary id:

      Field: https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L86

      Dictionary id: https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L165
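
A minimal sketch of the proposed reference, assuming string dictionaries for brevity. The class and function names below are hypothetical and are not the Arrow API; using None to mark a field that is not dictionary encoded is also an assumption, since the issue does not say how an absent reference is represented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DictionaryBatch:
    """Hypothetical stand-in for a DictionaryBatch message."""
    id: int             # dictionary id carried by the batch
    values: List[str]   # the dictionary vector


@dataclass
class Field:
    """Hypothetical stand-in for a schema Field."""
    name: str
    # Proposed member: the id of the DictionaryBatch this field refers to.
    # None (an assumption of this sketch) means "not dictionary encoded".
    dictionary: Optional[int] = None


def resolve_dictionary(field: Field,
                       batches: Dict[int, DictionaryBatch]) -> Optional[List[str]]:
    """Look up the dictionary vector a field refers to, if any."""
    if field.dictionary is None:
        return None
    return batches[field.dictionary].values


batches = {7: DictionaryBatch(id=7, values=["red", "green", "blue"])}
color = Field(name="color", dictionary=7)
plain = Field(name="price")
```

With the id stored on the field, a reader can match each dictionary-encoded column to its batch without relying on message order.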

      We need a spec in format/Layout.md that describes the dictionary layout.
      When a field is dictionary encoded, the value vector is an array of signed int32 indices (for consistency with variable-length collection offsets).
      The dictionary vector is a vector of the value type, in which each value is indexed by its id in the dictionary.
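
The layout described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Arrow implementation: the function names are hypothetical, null handling is left out, and array("i") is a C int, which is a signed 32-bit integer on common platforms but is not guaranteed to be.

```python
from array import array

INT32_MAX = 2**31 - 1


def dictionary_encode(values):
    """Split values into a dictionary vector and an index vector."""
    dictionary = []        # distinct values; a value's position is its id
    ids = {}               # value -> id, used only while encoding
    indices = array("i")   # value vector of signed integer indices
    for value in values:
        if value not in ids:
            if len(dictionary) > INT32_MAX:
                raise OverflowError("dictionary too large for int32 indices")
            ids[value] = len(dictionary)
            dictionary.append(value)
        indices.append(ids[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original values by looking up each index."""
    return [dictionary[i] for i in indices]


data = ["a", "b", "a", "c", "b"]
dictionary, indices = dictionary_encode(data)
```

Here each distinct value is stored once in the dictionary vector, and the value vector holds one index per row.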

            People

              Assignee: Julien Le Dem (julienledem)
              Reporter: Julien Le Dem (julienledem)
