Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1892

RaidNode can allow layered policies more efficiently

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments


    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.22.0
    • contrib/raid
    • None
    • Reviewed


      The RaidNode policy file can have layered policies that can cover a file more than once. To avoid processing a file multiple times (for RAIDing), RaidNode maintains a list of processed files that is used to avoid duplicate processing attempts.

      This is problematic in that a large number of processed files could cause the RaidNode to run out of memory.

      This task proposes a better method of detecting processed files. The method is based on the observation that a more selective policy will have a better match with a file name than a less selective one. Specifically, the more selective policy will have a longer common prefix with the file name.

      So to detect if a file has already been processed, the RaidNode only needs to maintain a list of processed policies and compare the lengths of the common prefixes. If the file has a longer common prefix with one of the processed policies than with the current policy, it can be assumed to be processed already.



          This comment will be Viewable by All Users Viewable by All Users


            rvadali Ramkumar Vadali Assign to me
            rvadali Ramkumar Vadali
            0 Vote for this issue
            4 Start watching this issue



              Issue deployment