Apache Arrow / ARROW-6184

[Java] Provide hash table based dictionary encoder


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: Java

    Description

      This is the second part of ARROW-5917. We provide a sort-based encoder, as well as a hash-table-based encoder, to solve the problems with the current dictionary encoder.
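
      To make the sort-based approach concrete, here is a minimal sketch, assuming a dictionary of distinct int values held in a plain Java array rather than an Arrow vector. The class SortBasedEncoderSketch and its methods are hypothetical illustrations, not the Arrow API. The dictionary is sorted once, and each input value is then mapped to its dictionary index by binary search, so encoding n values against a dictionary of d entries costs O(d log d) once plus O(log d) per value.

```java
import java.util.Arrays;

/**
 * Hypothetical sort-based dictionary encoder (illustration only, not the Arrow API).
 * Assumes the dictionary holds distinct int values.
 */
class SortBasedEncoderSketch {
  private final int[] sortedValues;  // dictionary values in ascending order
  private final int[] originalIndex; // originalIndex[k] = dictionary index of sortedValues[k]

  SortBasedEncoderSketch(int[] dictionary) {
    Integer[] order = new Integer[dictionary.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = i;
    }
    // Sort dictionary positions by their values; done once and reused for every encode call.
    Arrays.sort(order, (a, b) -> Integer.compare(dictionary[a], dictionary[b]));
    sortedValues = new int[dictionary.length];
    originalIndex = new int[dictionary.length];
    for (int k = 0; k < order.length; k++) {
      sortedValues[k] = dictionary[order[k]];
      originalIndex[k] = order[k];
    }
  }

  /** Writes the dictionary index of each input value into the caller-supplied output array. */
  void encode(int[] input, int[] output) {
    for (int i = 0; i < input.length; i++) {
      int pos = Arrays.binarySearch(sortedValues, input[i]);
      if (pos < 0) {
        throw new IllegalArgumentException("value not in dictionary: " + input[i]);
      }
      output[i] = originalIndex[pos];
    }
  }

  public static void main(String[] args) {
    SortBasedEncoderSketch encoder = new SortBasedEncoderSketch(new int[] {30, 10, 20});
    int[] codes = new int[4];
    encoder.encode(new int[] {10, 30, 30, 20}, codes);
    System.out.println(Arrays.toString(codes)); // [1, 0, 0, 2]
  }
}
```

      Because the sorted dictionary is built once, the same encoder can serve many input arrays; a hash-table-based encoder trades the O(log d) search for an expected O(1) probe per value.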

      In particular, we solve the following problems with the current encoder:

      1. There are repeated conversions between Java objects and bytes (e.g., vector.getObject).
      2. There is an unnecessary memory copy: the vector data must be copied into the hash table.
      3. The hash table cannot be reused for encoding multiple vectors (other data structures and results cannot be reused either).
      4. The output vector is created and managed by the encoder; it should instead be supplied by the caller, as in the out-of-place sorter.
      5. The hash table requires the hashCode and equals methods to be implemented appropriately, but this is not guaranteed.
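
      A minimal sketch of how a hash-table-based encoder can address these points, assuming variable-width values stored as one byte buffer plus an offsets array (value i occupies data[offsets[i]] up to, but not including, data[offsets[i + 1]], loosely modelled on variable-width vectors). HashTableEncoderSketch, its FNV-1a hasher, and the concat/offsets helpers are hypothetical and not the Arrow API. Values are hashed and compared directly on the bytes, so no Java object is created per value (problem 1); the table stores dictionary indices rather than copies of the data (problem 2); one instance encodes any number of inputs (problem 3); the caller supplies the output array (problem 4); and hashing and equality are explicit rather than delegated to hashCode/equals (problem 5).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical hash-table-based dictionary encoder (illustration only, not the Arrow API).
 * Value i of a buffer occupies data[offsets[i]] .. data[offsets[i + 1] - 1].
 * Requires Java 9+ for the range overload of Arrays.equals.
 */
class HashTableEncoderSketch {
  private static final int EMPTY = -1;

  private final byte[] dictData;
  private final int[] dictOffsets;
  private final int[] slots; // open-addressing table of dictionary indices, not copies of values
  private final int mask;

  HashTableEncoderSketch(byte[] dictData, int[] dictOffsets, int dictSize) {
    this.dictData = dictData;
    this.dictOffsets = dictOffsets;
    int capacity = 2;
    while (capacity < 2 * dictSize) {
      capacity <<= 1; // power of two, at most half full, so probing always reaches an empty slot
    }
    slots = new int[capacity];
    Arrays.fill(slots, EMPTY);
    mask = capacity - 1;
    for (int d = 0; d < dictSize; d++) {
      int s = hash(dictData, dictOffsets[d], dictOffsets[d + 1]) & mask;
      while (slots[s] != EMPTY) {
        s = (s + 1) & mask; // linear probing
      }
      slots[s] = d;
    }
  }

  // FNV-1a over the raw bytes: no per-value Java objects and no reliance on hashCode().
  private static int hash(byte[] data, int start, int end) {
    int h = 0x811C9DC5;
    for (int i = start; i < end; i++) {
      h ^= data[i] & 0xFF;
      h *= 0x01000193;
    }
    return h;
  }

  // Returns the dictionary index of the bytes data[start, end), or EMPTY if absent.
  private int lookup(byte[] data, int start, int end) {
    int s = hash(data, start, end) & mask;
    while (slots[s] != EMPTY) {
      int d = slots[s];
      if (Arrays.equals(dictData, dictOffsets[d], dictOffsets[d + 1], data, start, end)) {
        return d;
      }
      s = (s + 1) & mask;
    }
    return EMPTY;
  }

  /** Encodes valueCount values into the caller-supplied out array; the table is reused across calls. */
  void encode(byte[] data, int[] offsets, int valueCount, int[] out) {
    for (int i = 0; i < valueCount; i++) {
      int d = lookup(data, offsets[i], offsets[i + 1]);
      if (d == EMPTY) {
        throw new IllegalArgumentException("value " + i + " is not in the dictionary");
      }
      out[i] = d;
    }
  }

  // Demo helpers: pack strings into one UTF-8 buffer plus a (count + 1)-entry offsets array.
  static byte[] concat(String... values) {
    return String.join("", values).getBytes(StandardCharsets.UTF_8);
  }

  static int[] offsets(String... values) {
    int[] off = new int[values.length + 1];
    for (int i = 0; i < values.length; i++) {
      off[i + 1] = off[i] + values[i].getBytes(StandardCharsets.UTF_8).length;
    }
    return off;
  }

  public static void main(String[] args) {
    String[] dict = {"apple", "kiwi", "banana"};
    HashTableEncoderSketch encoder = new HashTableEncoderSketch(concat(dict), offsets(dict), dict.length);
    int[] out = new int[4]; // output owned by the caller, not the encoder
    String[] batch1 = {"kiwi", "apple", "banana", "kiwi"};
    encoder.encode(concat(batch1), offsets(batch1), batch1.length, out);
    System.out.println(Arrays.toString(out)); // [1, 0, 2, 1]
    String[] batch2 = {"banana", "banana"}; // same encoder reused for a second input
    encoder.encode(concat(batch2), offsets(batch2), batch2.length, out);
    System.out.println(Arrays.toString(Arrays.copyOf(out, batch2.length))); // [2, 2]
  }
}
```

      Keeping the power-of-two table at most half full guarantees that every linear-probing lookup reaches an empty slot and terminates. A production encoder would also need to handle null values, which this sketch ignores.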


            People

              Assignee: Liya Fan (fan_li_ya)
              Reporter: Liya Fan (fan_li_ya)
              Votes: 0
              Watchers: 3


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 5h 10m