Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1892

RaidNode can allow layered policies more efficiently

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: contrib/raid
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The RaidNode policy file can have layered policies that can cover a file more than once. To avoid processing a file multiple times (for RAIDing), RaidNode maintains a list of processed files that is used to avoid duplicate processing attempts.

      This is problematic in that a large number of processed files could cause the RaidNode to run out of memory.

      This task proposes a better method of detecting processed files. The method is based on the observation that a more selective policy will have a better match with a file name than a less selective one. Specifically, the more selective policy will have a longer common prefix with the file name.

      So to detect if a file has already been processed, the RaidNode only needs to maintain a list of processed policies and compare the lengths of the common prefixes. If the file has a longer common prefix with one of the processed policies than with the current policy, it can be assumed to be processed already.

        Attachments

        1. MAPREDUCE-1892.patch
          32 kB
          Ramkumar Vadali
        2. MAPREDUCE-1892.patch
          32 kB
          Ramkumar Vadali

          Activity

            People

            • Assignee:
              rvadali Ramkumar Vadali
              Reporter:
              rvadali Ramkumar Vadali
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: