Hadoop Common
  1. Hadoop Common
  2. HADOOP-5705

Improved tries in TotalOrderPartitioner to eliminate large leaf nodes.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.1
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      With the old technology, if a particular node in the trie has many children that contain no split points, TotalOrderPartitioner creates a separate empty leaf node for each one. This takes a lot of space, which in turn limits the depth to which we can grow these trees to avoid making too many sparse nodes, so there is a parameter that defaults to 2 to control the depth. With this patch, I can guarantee that each split point will create only three trie nodes in the trie: an empty one, a leaf that contains only one split point, and the internal nodes as needed, for a total space of perhaps 50-200 bytes per split point, even if the entire trie is elaborated. There are pathological cases that can cause a recursion overflow during creation, so i have set the default tree max depth to 200, but I expect almost all tries to be fully elaborated, which means in turn that each byte of the sought key gets touched at most twice.

      1. hadoop-5705.patch
        8 kB
        Dick King
      2. hadoop-5705-no-names.patch
        8 kB
        Dick King

        Activity

        Dick King created issue -
        Dick King made changes -
        Field Original Value New Value
        Attachment hadoop-5705.patch [ 12405833 ]
        Chris Douglas made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Assignee Dick King [ dking ]
        Dick King made changes -
        Attachment hadoop-5705-no-names.patch [ 12406276 ]
        Chris Douglas made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Fix Version/s 0.21.0 [ 12313563 ]
        Resolution Fixed [ 1 ]
        Owen O'Malley made changes -
        Component/s mapred [ 12310690 ]
        Tom White made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Dick King
            Reporter:
            Dick King
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development