Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: contrib/raid
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      RAID introduces a new dependency between blocks within a file: the blocks help decode each other.
      We should therefore avoid placing them on the same machine.

      The proposed BlockPlacementPolicy does the following:
      1. When writing parity blocks, it avoids placing parity blocks on the same machines as source blocks.
      2. When reducing the replication factor, it deletes the replicas that sit on the same machine as other dependent blocks.
      3. It does not change the way normal files are written. It behaves differently only when processing RAID files.
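
The first two rules above can be sketched in a few lines of Java. This is a minimal illustration only, not the code from the attached patch and not the Hadoop BlockPlacementPolicy API: the class name `RaidPlacementSketch`, both method names, and the use of plain strings as datanode identifiers are assumptions made for the example. The third rule is not shown; a real policy would presumably fall back to the default placement logic for files that are not raided.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: hypothetical names, not the Hadoop BlockPlacementPolicy API. */
class RaidPlacementSketch {

    /**
     * Rule 1: pick a node for a new parity block, skipping every node that
     * already holds a block the parity block depends on.
     * Returns null when all candidates are already taken.
     */
    static String chooseParityTarget(List<String> candidates, Set<String> nodesWithDependentBlocks) {
        for (String node : candidates) {
            if (!nodesWithDependentBlocks.contains(node)) {
                return node;
            }
        }
        return null;
    }

    /**
     * Rule 2: when a block has more replicas than needed, pick the replica to
     * delete: the one whose node holds the most other dependent blocks.
     * Ties go to the earliest node in the list; an empty list yields null.
     */
    static String chooseReplicaToDelete(List<String> replicaNodes,
                                        Map<String, Integer> dependentBlocksOnNode) {
        String worst = null;
        int worstCount = -1;
        for (String node : replicaNodes) {
            int count = dependentBlocksOnNode.getOrDefault(node, 0);
            if (count > worstCount) {
                worst = node;
                worstCount = count;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // dn1 and dn2 already hold source blocks, so the parity block goes to dn3.
        Set<String> taken = new HashSet<>(List.of("dn1", "dn2"));
        System.out.println(chooseParityTarget(List.of("dn1", "dn3"), taken));

        // dn1 also holds two dependent blocks, so its replica is the one to drop.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("dn1", 2);
        counts.put("dn4", 0);
        System.out.println(chooseReplicaToDelete(List.of("dn4", "dn1"), counts));
    }
}
```

Both helpers only rank nodes that are handed to them. How the dependent blocks of a given block are discovered is left out, since the description does not specify it.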

      1. test.result
        52 kB
        Scott Chen
      2. MAPREDUCE-1831-v2.txt
        45 kB
        Scott Chen
      3. MAPREDUCE-1831.v1.1.txt
        8 kB
        Scott Chen
      4. MAPREDUCE-1831.txt
        7 kB
        Scott Chen
      5. MAPREDUCE-1831.20100610.txt
        10 kB
        Scott Chen

        Issue Links

          Activity

          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Scott Chen made changes -
          Attachment test.result [ 12466008 ]
          Scott Chen made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Affects Version/s 0.23.0 [ 12315570 ]
          Affects Version/s 0.22.0 [ 12314184 ]
          Fix Version/s 0.23.0 [ 12315570 ]
          Fix Version/s 0.22.0 [ 12314184 ]
          Scott Chen made changes -
          Link This issue depends on MAPREDUCE-1861 [ MAPREDUCE-1861 ]
          Scott Chen made changes -
          Link This issue is part of MAPREDUCE-1969 [ MAPREDUCE-1969 ]
          Scott Chen made changes -
          Attachment MAPREDUCE-1831-v2.txt [ 12465867 ]
          Scott Chen made changes -
          Summary
            Original Value: Delete the co-located replicas when raiding file
            New Value: BlockPlacement policy for RAID
          Description
            Original Value:
            In raid, it is good to have the blocks on the same stripe located on different machines.
            This way, when one machine is down, it does not break two blocks on the stripe.
            By doing this, we can decrease the block error probability in raid from O(p^3) to O(p^4), which can be a huge improvement (where p is the probability that a replica is missing).

            One way to do this is that we can add a new BlockPlacementPolicy which deletes the replicas that are co-located.
            So when raiding the file, we can make the remaining replicas live on different machines.
            New Value:
            RAID introduces a new dependency between blocks within a file: the blocks help decode each other.
            We should therefore avoid placing them on the same machine.

            The proposed BlockPlacementPolicy does the following:
            1. When writing parity blocks, it avoids placing parity blocks on the same machines as source blocks.
            2. When reducing the replication factor, it deletes the replicas that sit on the same machine as other dependent blocks.
            3. It does not change the way normal files are written. It behaves differently only when processing RAID files.
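
The O(p^3) versus O(p^4) figures in the original description above can be reproduced under simple assumptions that the issue itself does not state: each block keeps two replicas, a stripe can recover from one lost block but not two, and machines fail independently with probability p. The following worked case is a sketch under those assumptions, with A and B standing for two blocks of the same stripe and m_1 to m_4 for machines.

```latex
% Assumed setup (not from the issue): 2 replicas per block,
% independent machine failures with probability p.

% Co-located: A on {m_1, m_2}, B on {m_1, m_3}; m_1 holds a replica of both.
\Pr[\text{A and B both lost}] = \Pr[m_1, m_2, m_3 \text{ all fail}] = p^3

% Separated: A on {m_1, m_2}, B on {m_3, m_4}; no machine is shared.
\Pr[\text{A and B both lost}] = \Pr[m_1, m_2, m_3, m_4 \text{ all fail}] = p^4
```

Removing the shared machine therefore raises the number of simultaneous failures needed to lose both blocks from three to four.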
          Scott Chen made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          dhruba borthakur made changes -
          Link This issue depends on MAPREDUCE-1861 [ MAPREDUCE-1861 ]
          Scott Chen made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Scott Chen made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Scott Chen made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Scott Chen made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Scott Chen made changes -
          Link This issue is related to MAPREDUCE-1861 [ MAPREDUCE-1861 ]
          Scott Chen made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Scott Chen made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Scott Chen made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Scott Chen made changes -
          Attachment MAPREDUCE-1831.20100610.txt [ 12446833 ]
          Scott Chen made changes -
          Description
            Original Value:
            In raid, it is good to have the blocks on the same stripe located on different machines.
            This way, when one machine is down, it does not break two blocks on the stripe.
            By doing this, we can decrease the block error probability in raid from O(p^3) to O(p^4), which can be a huge improvement.

            One way to do this is that we can add a new BlockPlacementPolicy which deletes the replicas that are co-located.
            So when raiding the file, we can make the remaining replicas live on different machines.
            New Value:
            In raid, it is good to have the blocks on the same stripe located on different machines.
            This way, when one machine is down, it does not break two blocks on the stripe.
            By doing this, we can decrease the block error probability in raid from O(p^3) to O(p^4), which can be a huge improvement (where p is the probability that a replica is missing).

            One way to do this is that we can add a new BlockPlacementPolicy which deletes the replicas that are co-located.
            So when raiding the file, we can make the remaining replicas live on different machines.
          Scott Chen made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Scott Chen made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Scott Chen made changes -
          Attachment MAPREDUCE-1831.v1.1.txt [ 12446654 ]
          Scott Chen made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Scott Chen made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Scott Chen made changes -
          Attachment MAPREDUCE-1831.txt [ 12446653 ]
          Scott Chen made changes -
          Field Original Value New Value
          Summary
            Original Value: Delete the replica on the most concentrated node when raiding file
            New Value: Delete the co-located replicas when raiding file
          Description
            Original Value:
            In raid, it is good to have the blocks on the same stripe located on different machines.
            This way, when one machine is down, it does not break two blocks on the stripe.
            By doing this, we can decrease the block error probability in raid from O(p^3) to O(p^4), which can be a huge improvement.

            One way to do this is that we can add a new BlockPlacementPolicy which delete the replicas that are co-located.
            So when raiding the file, we can make the remaining replicas live on different machines.
            New Value:
            In raid, it is good to have the blocks on the same stripe located on different machines.
            This way, when one machine is down, it does not break two blocks on the stripe.
            By doing this, we can decrease the block error probability in raid from O(p^3) to O(p^4), which can be a huge improvement.

            One way to do this is that we can add a new BlockPlacementPolicy which deletes the replicas that are co-located.
            So when raiding the file, we can make the remaining replicas live on different machines.
          Scott Chen created issue -

            People

            • Assignee:
              Scott Chen
              Reporter:
              Scott Chen
            • Votes:
              0
              Watchers:
              7
