  1. Apache Arrow
  2. ARROW-6184

[Java] Provide hash table based dictionary encoder

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: Java

      Description

      This is the second part of ARROW-5917. We provide a sort-based encoder as well as a hash-table-based encoder to solve the problems with the current dictionary encoder.

      In particular, we solve the following problems with the current encoder:

      1. Values are repeatedly converted between Java objects and bytes (e.g. vector.getObject).
      2. There is an unnecessary memory copy (the vector data must be copied to the hash table).
      3. The hash table cannot be reused for encoding multiple vectors (other data structures and results cannot be reused either).
      4. The encoder creates and manages the output vector, although it should not (the out-of-place sorter, for comparison, writes into a vector supplied by the caller).
      5. The hash table requires that the hashCode and equals methods be implemented appropriately, but this is not guaranteed.
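
As a rough illustration of how a hash-table-based encoder could avoid the five problems above, here is a minimal sketch. It is not the Arrow API: it assumes fixed-width values stored back to back in plain byte arrays rather than Arrow vectors, and the class name `HashTableEncoderSketch` and its methods `encode` and `toBytes` are hypothetical. It hashes and compares raw bytes in place (problems 1, 2 and 5), builds the table once so it can encode many inputs (problem 3), and writes indices into an array owned by the caller (problem 4).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch, not the Arrow API: values are fixed-width byte slices.
final class HashTableEncoderSketch {
  private final byte[] dictionary; // dictionary values stored back to back
  private final int width;         // bytes per value
  private final int[] slots;       // open-addressing table of dictionary indices, -1 = empty
  private final int mask;

  HashTableEncoderSketch(byte[] dictionary, int width) {
    this.dictionary = dictionary;
    this.width = width;
    int count = dictionary.length / width;
    int capacity = 2;
    while (capacity < 2 * count) {
      capacity <<= 1; // power of two, load factor at most 0.5
    }
    this.slots = new int[capacity];
    this.mask = capacity - 1;
    Arrays.fill(slots, -1);
    for (int i = 0; i < count; i++) {
      int slot = probe(dictionary, i * width);
      if (slots[slot] == -1) {
        slots[slot] = i; // keep the first index of a repeated value
      }
    }
  }

  // Returns the slot holding a value equal to the bytes at offset, or the first empty slot.
  private int probe(byte[] data, int offset) {
    int slot = hash(data, offset) & mask;
    while (slots[slot] != -1 && !sameBytes(data, offset, slots[slot] * width)) {
      slot = (slot + 1) & mask; // linear probing
    }
    return slot;
  }

  // FNV-1a over the raw bytes, so no hashCode/equals implementation is needed.
  private int hash(byte[] data, int offset) {
    int h = 0x811c9dc5;
    for (int i = 0; i < width; i++) {
      h = (h ^ (data[offset + i] & 0xff)) * 0x01000193;
    }
    return h;
  }

  private boolean sameBytes(byte[] data, int offset, int dictOffset) {
    for (int i = 0; i < width; i++) {
      if (data[offset + i] != dictionary[dictOffset + i]) {
        return false;
      }
    }
    return true;
  }

  // Writes one dictionary index per input value into the caller-owned output array.
  void encode(byte[] input, int[] output) {
    int count = input.length / width;
    for (int i = 0; i < count; i++) {
      int index = slots[probe(input, i * width)];
      if (index == -1) {
        throw new IllegalArgumentException("value " + i + " is not in the dictionary");
      }
      output[i] = index;
    }
  }

  // Helper for the example only: packs ints as 4-byte values.
  static byte[] toBytes(int... values) {
    ByteBuffer buffer = ByteBuffer.allocate(4 * values.length);
    for (int v : values) {
      buffer.putInt(v);
    }
    return buffer.array();
  }
}
```

In this sketch, an encoder built over the dictionary values 10, 20, 30 (packed with `toBytes`, width 4) encodes the input 20, 10, 30, 20 as the indices 1, 0, 2, 1, and the same encoder instance can then be applied to further inputs without rebuilding the table.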

              People

              • Assignee: Liya Fan (fan_li_ya)
              • Reporter: Liya Fan (fan_li_ya)

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 5h 10m