Hadoop Common
  1. Hadoop Common
  2. HADOOP-5705

Improved tries in TotalOrderPartitioner to eliminate large leaf nodes.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.1
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      With the old technology, if a particular node in the trie has many children that contain no split points, TotalOrderPartitioner creates a separate empty leaf node for each one. This takes a lot of space, which in turn limits the depth to which we can grow these trees to avoid making too many sparse nodes, so there is a parameter that defaults to 2 to control the depth. With this patch, I can guarantee that each split point will create only three trie nodes in the trie: an empty one, a leaf that contains only one split point, and the internal nodes as needed, for a total space of perhaps 50-200 bytes per split point, even if the entire trie is elaborated. There are pathological cases that can cause a recursion overflow during creation, so i have set the default tree max depth to 200, but I expect almost all tries to be fully elaborated, which means in turn that each byte of the sought key gets touched at most twice.

      1. hadoop-5705.patch
        8 kB
        Dick King
      2. hadoop-5705-no-names.patch
        8 kB
        Dick King

        Activity

        No work has yet been logged on this issue.

          People

          • Assignee:
            Dick King
            Reporter:
            Dick King
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development