Uploaded image for project: 'Apache Arrow'
  1. Apache Arrow
  2. ARROW-6933

[Java] Suppor linear dictionary encoder

    XMLWordPrintableJSON

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: Java

      Description

      For many scenarios, the distribution of dictionary entries is highly skewed. In other words, a few dictionary entries occurs much more frequently than others. If we can sort the dictionary by the non-increasing order of entry frequencies, and compare each value to encode from the beginning of the dictionary, we get the following benefits:

      1) We need no extra memory space or data structure.
      2) The search is extremely efficient, as we are likely to find a match in the first few entries of the dictionary.

      This is the basic idea behind the linear dictionary encoder. When the scenario is right (highly skewed dictionary distribution), it outperforms both search based encoder and hash table based encoders.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                fan_li_ya Liya Fan
                Reporter:
                fan_li_ya Liya Fan
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 2h
                  2h