  Apache Arrow / ARROW-12101

[Format] Consider adding int0 and other small integer types for more efficient Dictionary encoding


Details

    • Type: Wish
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Format
    • Labels: None

    Description

      I often need to store single-valued columns, where every row holds the same value. As far as I can tell, the current Arrow format has no efficient way to represent them. One possible improvement would be an int0 type (in which every value is 0) that, like the Null type, has no data buffer allocated. It could then be used as the index type of a Dictionary that holds a single value.
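A minimal sketch of the idea in plain Python, not using any Arrow library: `Int0Indices` and `decode` are hypothetical names, not part of pyarrow or the Arrow format. It compares an 8-bit index buffer (8 bits being the narrowest integer width Arrow defines today) with an int0 index that stores only its length.

```python
# Hypothetical sketch: an "int0" dictionary-index array that stores only its
# length. Every logical value is 0, so no data buffer is needed, similar to
# how Arrow's Null type carries no data buffer. Int0Indices and decode() are
# illustrative names, not part of any Arrow implementation.


class Int0Indices:
    """Index array whose values are all implicitly 0."""

    def __init__(self, length: int) -> None:
        self.length = length

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, i: int) -> int:
        if not 0 <= i < self.length:
            raise IndexError(i)
        return 0

    @property
    def nbytes(self) -> int:
        return 0  # no buffer is allocated, whatever the length


def decode(indices, dictionary):
    """Materialize a dictionary-encoded column as a plain list."""
    return [dictionary[indices[i]] for i in range(len(indices))]


n = 1_000_000
dictionary = ["US"]            # the single distinct value in the column
int8_indices = bytes(n)        # 8-bit indices: one zero byte per row
int0_indices = Int0Indices(n)  # hypothetical int0: zero bytes per row

print(len(int8_indices), int0_indices.nbytes)  # 1000000 0
print(decode(Int0Indices(3), dictionary))      # ['US', 'US', 'US']
```

With int0 indices, the storage for a single-valued column no longer grows with the row count; only the length and the one-entry dictionary remain.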

      For low-cardinality columns, I also often wish for int1, int2, and int4 types (1-, 2-, and 4-bit integers) to use as dictionary indices.
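The sketch below illustrates the space saving from sub-byte indices by bit-packing them at 1, 2, or 4 bits per value. It is a sketch under stated assumptions, not a proposed layout: the function names are hypothetical, the set of allowed widths is assumed, and packing least-significant-bit first within each byte is an assumption chosen to match the bit order Arrow uses for validity bitmaps.

```python
# Hypothetical sketch: choosing and bit-packing sub-byte dictionary indices.
# Widths 0/1/2/4 correspond to the proposed int0/int1/int2/int4 types.
# Values are packed least-significant-bit first within each byte (an
# assumption). index_width, pack and unpack are illustrative names only.

SUPPORTED_WIDTHS = (0, 1, 2, 4, 8, 16, 32, 64)


def index_width(cardinality: int) -> int:
    """Smallest supported bit width that can address `cardinality` values."""
    if cardinality < 1:
        raise ValueError("dictionary must hold at least one value")
    needed = (cardinality - 1).bit_length()  # 1 -> 0, 2 -> 1, 3..4 -> 2, ...
    return next(w for w in SUPPORTED_WIDTHS if w >= needed)


def pack(indices, width: int) -> bytes:
    """Pack integer indices into a byte buffer at `width` bits per value."""
    if width not in (1, 2, 4):
        raise ValueError("this sketch only packs 1-, 2-, or 4-bit widths")
    per_byte = 8 // width
    out = bytearray((len(indices) + per_byte - 1) // per_byte)
    for i, v in enumerate(indices):
        if not 0 <= v < (1 << width):
            raise ValueError(f"index {v} does not fit in {width} bits")
        out[i // per_byte] |= v << ((i % per_byte) * width)
    return bytes(out)


def unpack(buf: bytes, width: int, length: int) -> list:
    """Inverse of pack(): recover `length` indices from the buffer."""
    per_byte = 8 // width
    mask = (1 << width) - 1
    return [(buf[i // per_byte] >> ((i % per_byte) * width)) & mask
            for i in range(length)]


dictionary = ["low", "mid", "high"]
indices = [0, 2, 1, 1, 0, 2, 2, 0, 1]
w = index_width(len(dictionary))
packed = pack(indices, w)
print(w, len(packed))  # 2 3  (9 values x 2 bits = 18 bits -> 3 bytes, vs 9 with 8-bit indices)
assert unpack(packed, w, len(indices)) == indices
```

In this sketch a three-value dictionary needs only 2 bits per row, a quarter of what an 8-bit index uses, and a single-value dictionary maps to width 0, the int0 case above.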

       

       


          People

            Assignee: Unassigned
            Reporter: Justin Talbot (justin.talbot)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated: