Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-5705

Improved tries in TotalOrderPartitioner to eliminate large leaf nodes.

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.19.1
    • 0.21.0
    • None
    • None
    • Reviewed

    Description

      With the old technology, if a particular node in the trie has many children that contain no split points, TotalOrderPartitioner creates a separate empty leaf node for each one. This takes a lot of space, which in turn limits the depth to which we can grow these trees to avoid making too many sparse nodes, so there is a parameter that defaults to 2 to control the depth. With this patch, I can guarantee that each split point will create only three trie nodes in the trie: an empty one, a leaf that contains only one split point, and the internal nodes as needed, for a total space of perhaps 50-200 bytes per split point, even if the entire trie is elaborated. There are pathological cases that can cause a recursion overflow during creation, so i have set the default tree max depth to 200, but I expect almost all tries to be fully elaborated, which means in turn that each byte of the sought key gets touched at most twice.

      Attachments

        1. hadoop-5705.patch
          8 kB
          Dick King
        2. hadoop-5705-no-names.patch
          8 kB
          Dick King

        Activity

          People

            dking Dick King
            dking Dick King
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: