Hadoop HDFS / HDFS-3544

Ability to use SimpleRegeneratingCode to fix missing blocks

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: contrib/raid
    • Labels: None

      Description

      Reed-Solomon encoding (n, k) uses n storage nodes and can tolerate n-k failures, but regenerating a single block requires reading k blocks, which is a problem when n and k are large. Instead, we can use simple regenerating codes (n, k, f), which first apply Reed-Solomon (n, k) and then add XOR parities over groups of size f. A single disk failure then needs to read from only f nodes, and f can be very small.
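
      For illustration, here is a minimal sketch of the local-repair idea behind simple
      regenerating codes (a hypothetical helper class, not the HDFS-RAID implementation):
      after Reed-Solomon coding, blocks are grouped into groups of size f and one XOR
      parity is kept per group, so a single lost block is rebuilt from f reads instead
      of a full k-block decode.

      {code:java}
      // Minimal sketch of local XOR repair; hypothetical class, not HDFS-RAID code.
      public class LocalXorSketch {

        // XOR equal-length blocks together.
        static byte[] xor(byte[]... blocks) {
          byte[] out = new byte[blocks[0].length];
          for (byte[] b : blocks) {
            for (int i = 0; i < out.length; i++) {
              out[i] ^= b[i];
            }
          }
          return out;
        }

        // Encoding: keep one XOR parity per group of f blocks of the RS-coded stripe.
        static byte[] groupParity(byte[][] group) {
          return xor(group);
        }

        // Repair of a single lost block: XOR the surviving group members with the
        // group parity; no k-block Reed-Solomon decode is needed.
        static byte[] repairSingleLoss(byte[][] survivors, byte[] groupXorParity) {
          return xor(xor(survivors), groupXorParity);
        }
      }
      {code}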

        Issue Links

          depends upon HDFS-3543

          Activity

          dhruba borthakur created issue -
          Scott Chen added a comment -

          Here is the paper on Simple Regenerating Codes:
          http://arxiv.org/pdf/1109.0264

          Scott Chen added a comment -

          I think one thing we need to do is to refactor raid.Encoder and raid.Decoder so that they are generic to any ErasureCode. That way we can easily add different codes to Raid.
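
          For illustration, the refactoring could aim at an interface along these lines
          (a rough sketch with hypothetical names, not the existing raid.Encoder/raid.Decoder API):

          {code:java}
          // Hypothetical shape of a code-agnostic abstraction for RAID.
          public interface ErasureCode {
            // Number of source blocks per stripe.
            int stripeSize();

            // Number of parity blocks produced per stripe.
            int paritySize();

            // Fill parity[0..paritySize()-1] from data[0..stripeSize()-1].
            void encode(byte[][] data, byte[][] parity);

            // Rebuild the blocks at erasedIndexes from the surviving data and parity blocks.
            void decode(byte[][] data, byte[][] parity, int[] erasedIndexes,
                        byte[][] recovered);
          }
          {code}

          raid.Encoder and raid.Decoder would then be written against this interface, and
          adding SRC would only mean providing another implementation.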

          Ramkumar Vadali added a comment -

          It would also be nice for this code to be backwards compatible with existing Reed-Solomon parity files. If there is an existing Reed-Solomon parity file, the code can identify it by comparing the number of parity blocks against the number expected for Reed-Solomon. This is doable because the additional XOR parity blocks increase the total number of parity blocks by a deterministic amount. Thus this code will be able to handle existing Reed-Solomon parity files and will generate new files with the additional XOR blocks.

          Alex Dimakis added a comment -

          I can see that backwards compatibility would be crucial for a deployed system. It is not always clear whether a parity block is a simple XOR parity or an RS parity just by counting, since different configurations may use different numbers of simple parities (our default kept the total number of parities at 4, two RS and two degree-6 XORs, so 10 data blocks plus 4 parities gives 14 blocks, the same storage overhead as a (14, 10) Reed-Solomon code).

          I think a cleaner way to tell what each parity is would be through the metadata file or the folder it is in (right now, how do you distinguish simple XOR parities from RS parities?).

          Alex Dimakis added a comment -

          After some discussion, it seems that the easiest way to ensure backwards compatibility is to recognize whether a file is Reed-Solomon coded or SRC coded by the size of the parity file. If we use stripe size 10 with 4 RS parity blocks and 2 simple XORs, then the parity file should have size 64MB*6 per stripe, while for RS-coded files it should be 64MB*4. We will implement this distinction in Processfile and run the corresponding decoder.
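
          For reference, the size check amounts to something like the following
          (assuming 64MB blocks, stripe size 10, 4 RS parities and 2 XOR parities per
          stripe; the helper is illustrative, not part of Processfile):

          {code:java}
          // Illustrative sketch of distinguishing parity formats by file length.
          static boolean isSrcCoded(long parityFileLength, long stripeCount) {
            final long BLOCK_SIZE = 64L * 1024 * 1024;           // assumed 64MB blocks
            long rsLength  = stripeCount * 4 * BLOCK_SIZE;       // Reed-Solomon: 4 parity blocks per stripe
            long srcLength = stripeCount * (4 + 2) * BLOCK_SIZE; // SRC: 4 RS + 2 XOR parity blocks per stripe
            if (parityFileLength == srcLength) {
              return true;   // run the SRC decoder
            }
            if (parityFileLength == rsLength) {
              return false;  // run the Reed-Solomon decoder
            }
            throw new IllegalStateException("Unexpected parity length: " + parityFileLength);
          }
          {code}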

          Maheswaran Sathiamoorthy added a comment -

          There is another way of doing it:
          I will add a new erasure code type called SRC to ErasureCodeType (which currently has XOR and RS) and start storing SRC-coded files under /raidsrc (RS files are stored under /raidrs, XOR under /raid). When a file corruption is detected and recoverBlockToFile is called, the first thing to do is to check whether the file is a parity file or a source file. From its location we can easily tell whether it is a parity file and, if so, of which type. If it is not a parity file, then it is a source file and we need to find its corresponding parity file, which we can do by looking first in /raidsrc, then in /raidrs and /raid. That way we can find the parity file too.
          The same thing could be done by checking the file size, but we would still need to search /raidrs or /raid for the parity file, so I think the approach above is a little cleaner.
          For reconstructing the file, in either approach, we need to pass the ErasureCodeType all the way down to the decoder and encoder.
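
          For illustration, the path-based lookup could look roughly like this
          (directory names follow the comment above; the helper class and enum are
          illustrative, not existing HDFS-RAID code):

          {code:java}
          import java.io.IOException;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          // Hypothetical helper: find which parity root protects a source file.
          public class ParityLookupSketch {
            enum ErasureCodeType { XOR, RS, SRC }

            static ErasureCodeType findParityType(FileSystem fs, Path sourceFile)
                throws IOException {
              String relative = sourceFile.toUri().getPath();  // e.g. /user/foo/data
              if (fs.exists(new Path("/raidsrc" + relative))) return ErasureCodeType.SRC;
              if (fs.exists(new Path("/raidrs" + relative)))  return ErasureCodeType.RS;
              if (fs.exists(new Path("/raid" + relative)))    return ErasureCodeType.XOR;
              throw new IOException("No parity file found for " + sourceFile);
            }
          }
          {code}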

          Scott Chen added a comment -

          Assigning to Weiyan because he is working on this now.

          Scott Chen made changes -
          Assignee: dhruba borthakur [ dhruba ] -> Weiyan Wang [ weiyan ]
          Scott Chen made changes -
          Link: This issue depends upon HDFS-3543 [ HDFS-3543 ]
          Scott Chen made changes -
          Project: Hadoop Map/Reduce [ 12310941 ] -> Hadoop HDFS [ 12310942 ]
          Key: MAPREDUCE-3361 -> HDFS-3544
          Component/s: contrib/raid [ 12313080 ] -> contrib/raid [ 12313416 ]
          Chris Li added a comment -

          Any updates on this issue? We're interested in trying this out to save space on our cold files.

          Weiyan Wang made changes -
          Assignee: Weiyan Wang [ weiyan ] -> Unassigned
          Weiyan Wang added a comment -

          Chris, sorry for the late reply. We are actually not working on this because of bandwidth constraints. For cold files, we still use (14, 10) Reed-Solomon to save space right now.

          Chris Li added a comment -

          Okay, we're very interested in SRC in our clusters and we will work on this feature if we can get some momentum. Either way I'll keep you posted.

          Alex Dimakis added a comment -

          Chris, our VLDB paper describes an evolution of simple regenerating codes that we call Locally Repairable Codes (LRC); see our paper at
          http://smahesh.com/HadoopUSC/

          You can contact me for more details if you are interested.

          liuruoxi added a comment -

          Hi, I am new to Hadoop and HDFS-RAID and have a question: why hasn't the RAID code been merged into trunk since release 2.0.2?


            People

            • Assignee: Unassigned
            • Reporter: dhruba borthakur
            • Votes: 0
            • Watchers: 20
