Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1519

RaidNode fails to create new parity file if an older version already exists

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: contrib/raid
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When RaidNode tries to recreate a parity file for a source file that has been modified (recreated) recently, it crashes.

      1. MAPREDUCE-1519.patch
        3 kB
        Rodrigo Schmidt

        Issue Links

          Activity

          Rodrigo Schmidt created issue -
          Rodrigo Schmidt made changes -
          Field Original Value New Value
          Link This issue is depended upon by MAPREDUCE-1510 [ MAPREDUCE-1510 ]
          Hide
          Rodrigo Schmidt added a comment -

          Patch attached

          Show
          Rodrigo Schmidt added a comment - Patch attached
          Rodrigo Schmidt made changes -
          Attachment MAPREDUCE-1519.patch [ 12436548 ]
          Rodrigo Schmidt made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          dhruba borthakur added a comment -

          can you pl explain the bug and the proposed fix? Thanks.

          Show
          dhruba borthakur added a comment - can you pl explain the bug and the proposed fix? Thanks.
          dhruba borthakur made changes -
          Component/s contrib/raid [ 12313416 ]
          Hide
          Rodrigo Schmidt added a comment -

          Sure, Dhruba!

          Let's assume there's a file /usr/schmidt/file1 that has been recently raided. In that case, the RaidNode will have crated a parity file /raid/usr/schmidt/file1

          Now, imagine the user rewrites its original file and creates a new version of /usr/schmidt/file1.

          The RaidNode correctly identifies the file has to be re-raided, creates the parity file in a temporary directory, but when it tries to move the newly created parity file from the temporary directory to /raid/usr/schmidt/file1, it fails because the destination already exists.

          The patch verifies if the destination exists before renaming it and deletes the file in that case.

          The unit test covers this scenario.

          I actually found this bug when you proposed to add the unit test that covers this scenario on my patch to MAPREDUCE-1510. I didn't want to fix this bug there because MAPREDUCE-1510 was an improvement and not a bug, and (ii) neither the title or the description of MAPREDUCE-1510 covers this failure scenario and people looking for this problem would have a hard time finding the correct JIRA.

          Show
          Rodrigo Schmidt added a comment - Sure, Dhruba! Let's assume there's a file /usr/schmidt/file1 that has been recently raided. In that case, the RaidNode will have crated a parity file /raid/usr/schmidt/file1 Now, imagine the user rewrites its original file and creates a new version of /usr/schmidt/file1. The RaidNode correctly identifies the file has to be re-raided, creates the parity file in a temporary directory, but when it tries to move the newly created parity file from the temporary directory to /raid/usr/schmidt/file1, it fails because the destination already exists. The patch verifies if the destination exists before renaming it and deletes the file in that case. The unit test covers this scenario. I actually found this bug when you proposed to add the unit test that covers this scenario on my patch to MAPREDUCE-1510 . I didn't want to fix this bug there because MAPREDUCE-1510 was an improvement and not a bug, and (ii) neither the title or the description of MAPREDUCE-1510 covers this failure scenario and people looking for this problem would have a hard time finding the correct JIRA.
          Hide
          dhruba borthakur added a comment -

          This is confusing me. A HDFS.rename() should succeed even if the target file exists. The rename call deletes the original file and moves the new one to its place. Is this not what you are seeing?

          Show
          dhruba borthakur added a comment - This is confusing me. A HDFS.rename() should succeed even if the target file exists. The rename call deletes the original file and moves the new one to its place. Is this not what you are seeing?
          Hide
          Rodrigo Schmidt added a comment -

          Definitely not! Reason why I had to change the code to pass the unit test.

          However, my original work was done in hadoop 0.20, not trunk. Let me check if it works on trunk without the changes to RaidNode.

          Show
          Rodrigo Schmidt added a comment - Definitely not! Reason why I had to change the code to pass the unit test. However, my original work was done in hadoop 0.20, not trunk. Let me check if it works on trunk without the changes to RaidNode.
          Hide
          Rodrigo Schmidt added a comment -

          Just checked and the new unit test doesn't work on trunk if we don't change the code for the RaidNode.

          The unit test blocks and the logs start to report the following error when the RaidNode tries to move the parity file to the new location:

          [junit] 10/02/21 21:56:15 INFO raid.RaidNode: Exception while invoking action on policy RaidTest1 srcPath hdfs://localhost:59030/user/dhruba/policytest exception java.io.IOException: Unable to rename tmp file hdfs://localhost:59030/destraid/user/dhruba/policytest/file2.tmp to hdfs://localhost:59030/destraid/user/dhruba/policytest/file2
          [junit] at org.apache.hadoop.raid.RaidNode.generateParityFile(RaidNode.java:817)
          [junit] at org.apache.hadoop.raid.RaidNode.doRaid(RaidNode.java:705)
          [junit] at org.apache.hadoop.raid.RaidNode.doRaid(RaidNode.java:642)
          [junit] at org.apache.hadoop.raid.RaidNode$TriggerMonitor.doProcess(RaidNode.java:401)
          [junit] at org.apache.hadoop.raid.RaidNode$TriggerMonitor.run(RaidNode.java:307)
          [junit] at java.lang.Thread.run(Thread.java:637)

          Everything works fine with the proposed change.

          Show
          Rodrigo Schmidt added a comment - Just checked and the new unit test doesn't work on trunk if we don't change the code for the RaidNode. The unit test blocks and the logs start to report the following error when the RaidNode tries to move the parity file to the new location: [junit] 10/02/21 21:56:15 INFO raid.RaidNode: Exception while invoking action on policy RaidTest1 srcPath hdfs://localhost:59030/user/dhruba/policytest exception java.io.IOException: Unable to rename tmp file hdfs://localhost:59030/destraid/user/dhruba/policytest/file2.tmp to hdfs://localhost:59030/destraid/user/dhruba/policytest/file2 [junit] at org.apache.hadoop.raid.RaidNode.generateParityFile(RaidNode.java:817) [junit] at org.apache.hadoop.raid.RaidNode.doRaid(RaidNode.java:705) [junit] at org.apache.hadoop.raid.RaidNode.doRaid(RaidNode.java:642) [junit] at org.apache.hadoop.raid.RaidNode$TriggerMonitor.doProcess(RaidNode.java:401) [junit] at org.apache.hadoop.raid.RaidNode$TriggerMonitor.run(RaidNode.java:307) [junit] at java.lang.Thread.run(Thread.java:637) Everything works fine with the proposed change.
          Hide
          dhruba borthakur added a comment -

          I agree, I was mistakenly looking at java.io.File.renameTo.

          +1 , code looks good.

          Show
          dhruba borthakur added a comment - I agree, I was mistakenly looking at java.io.File.renameTo. +1 , code looks good.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12436548/MAPREDUCE-1519.patch
          against trunk revision 912471.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/472/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/472/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/472/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/472/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12436548/MAPREDUCE-1519.patch against trunk revision 912471. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/472/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/472/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/472/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/472/console This message is automatically generated.
          Hide
          dhruba borthakur added a comment -

          All the failed tests are in map-reduce and this patch touches only src/contrib/raid. No dependency here. i will go ahead and commit this patch,

          Show
          dhruba borthakur added a comment - All the failed tests are in map-reduce and this patch touches only src/contrib/raid. No dependency here. i will go ahead and commit this patch,
          Hide
          Rodrigo Schmidt added a comment -

          Thanks, The same errors are hitting all patches that have been submitted lately. Unfortunately it seems that trunk or Hudson is broken.

          Show
          Rodrigo Schmidt added a comment - Thanks, The same errors are hitting all patches that have been submitted lately. Unfortunately it seems that trunk or Hudson is broken.
          Hide
          dhruba borthakur added a comment -

          I just committed this. Thanks Rodrigo!

          Show
          dhruba borthakur added a comment - I just committed this. Thanks Rodrigo!
          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.22.0 [ 12314184 ]
          Resolution Fixed [ 1 ]
          dhruba borthakur made changes -
          Hadoop Flags [Reviewed]
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #252 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/252/)

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #252 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/252/ )
          Tom White made changes -
          Fix Version/s 0.21.0 [ 12314045 ]
          Fix Version/s 0.22.0 [ 12314184 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Rodrigo Schmidt
              Reporter:
              Rodrigo Schmidt
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development