Hadoop Map/Reduce / MAPREDUCE-1819

RaidNode should be smarter in submitting Raid jobs


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.22.0
    • Component/s: contrib/raid
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The RaidNode currently computes parity files as follows:
      1. Using RaidNode.selectFiles() to figure out what files to raid for a policy
      2. Using #1 repeatedly for each configured policy to accumulate a list of files.
      3. Submitting a mapreduce job with the list of files from #2 using DistRaid.doDistRaid()

      This task addresses the fact that #2 and #3 happen sequentially: no job is submitted until the file lists for all policies have been accumulated. The proposal is to submit a separate MapReduce job for each policy's list of files and use another thread to track the progress of the submitted jobs. This will help reduce the time taken for files to be raided.
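      The proposed scheme can be sketched with plain java.util.concurrent primitives. This is an illustrative sketch only: Policy, RaidJob, and selectFiles are hypothetical stand-ins, not the actual contrib/raid classes, and job submission is modeled with an ExecutorService rather than the real JobClient.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch: one job per policy, submitted as soon as its file list is ready,
// with a tracker that waits on all submitted jobs afterwards.
public class PerPolicySubmitter {
    static class Policy {
        final String name;
        Policy(String n) { name = n; }
    }

    // Stand-in for one DistRaid job over a single policy's files.
    static class RaidJob implements Callable<String> {
        final Policy policy;
        final List<String> files;
        RaidJob(Policy p, List<String> f) { policy = p; files = f; }
        public String call() { return policy.name + ":" + files.size(); }
    }

    // Stand-in for RaidNode.selectFiles(policy).
    static List<String> selectFiles(Policy p) {
        return Arrays.asList(p.name + "/a", p.name + "/b");
    }

    public static List<String> run(List<Policy> policies) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        List<Future<String>> jobs = new ArrayList<>();
        for (Policy p : policies) {
            // Submit a separate job per policy instead of accumulating
            // one global file list across all policies first.
            jobs.add(pool.submit(new RaidJob(p, selectFiles(p))));
        }
        // Tracker: collect results of all in-flight jobs.
        List<String> results = new ArrayList<>();
        for (Future<String> f : jobs) {
            results.add(f.get());
        }
        pool.shutdown();
        return results;
    }
}
```

      The key difference from the current flow is that selectFiles for policy N+1 no longer blocks on the job for policy N being built; each policy's work enters the cluster independently.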

      Attachments

        1. MAPREDUCE-1819.4.patch
          155 kB
          Ramkumar Vadali
        2. MAPREDUCE-1819.5.patch
          155 kB
          Ramkumar Vadali
        3. MAPREDUCE-1819.patch
          154 kB
          Ramkumar Vadali
        4. MAPREDUCE-1819.patch.2
          153 kB
          Ramkumar Vadali
        5. MAPREDUCE-1819.patch.3
          155 kB
          Ramkumar Vadali

        Activity


          People

            Assignee: rvadali (Ramkumar Vadali)
            Reporter: rvadali (Ramkumar Vadali)
            Votes: 1
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
