Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1382

MRAsyncDiscService should tolerate missing local.dir

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently when some of the local.dir do not exist, MRAsyncDiscService will fail. It should only fail when all directories don't work.

      1. MAPREDUCE-1382.1.patch
        6 kB
        Zheng Shao
      2. MAPREDUCE-1382.2.patch
        6 kB
        Zheng Shao
      3. MAPREDUCE-1382.3.patch
        6 kB
        Zheng Shao
      4. MAPREDUCE-1382.4.patch
        7 kB
        Dan Adkins
      5. MAPREDUCE-1382.5.patch
        7 kB
        Tom White
      6. MAPREDUCE-1382.branch-0.20.on.top.of.MAPREDUCE-1302.1.patch
        5 kB
        Zheng Shao
      7. MAPREDUCE-1382.patch
        7 kB
        Tom White

        Issue Links

          Activity

          Scott Chen created issue -
          Hide
          Zheng Shao added a comment -

          Added a test that tests deletion non-existing files.

          Show
          Zheng Shao added a comment - Added a test that tests deletion non-existing files.
          Zheng Shao made changes -
          Field Original Value New Value
          Attachment MAPREDUCE-1382.1.patch [ 12430496 ]
          Hide
          Zheng Shao added a comment -

          patch for 0.20

          Show
          Zheng Shao added a comment - patch for 0.20
          Zheng Shao made changes -
          Hide
          Todd Lipcon added a comment -

          Hi Zheng,

          I wonder whether localFileSystem.listStatus will ever throw IOE in the case of a bad volume? Yours handles the case where the volume doesn't exist at all, but I can imagine if it's on a broken mount, the listStatus will get an I/O Error from the underlying filesystem and throw at that point. Do you need to catch this and handle in the same way as if it were null? Thanks.

          Show
          Todd Lipcon added a comment - Hi Zheng, I wonder whether localFileSystem.listStatus will ever throw IOE in the case of a bad volume? Yours handles the case where the volume doesn't exist at all, but I can imagine if it's on a broken mount, the listStatus will get an I/O Error from the underlying filesystem and throw at that point. Do you need to catch this and handle in the same way as if it were null? Thanks.
          Zheng Shao made changes -
          Link This issue is related to MAPREDUCE-1213 [ MAPREDUCE-1213 ]
          Zheng Shao made changes -
          Link This issue is duplicated by MAPREDUCE-2049 [ MAPREDUCE-2049 ]
          Hide
          Zheng Shao added a comment -

          Todd, you are right. LocalFileSystem will throw IOE in that case.
          This patch addresses Todd's concern.

          Show
          Zheng Shao added a comment - Todd, you are right. LocalFileSystem will throw IOE in that case. This patch addresses Todd's concern.
          Zheng Shao made changes -
          Attachment MAPREDUCE-1382.2.patch [ 12454138 ]
          Hide
          Zheng Shao added a comment -

          Fixed unit test.

          Show
          Zheng Shao added a comment - Fixed unit test.
          Zheng Shao made changes -
          Attachment MAPREDUCE-1382.3.patch [ 12454147 ]
          Zheng Shao made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Scott Chen added a comment -

          Does it make sense to throw IOE if all volumes in the constructor are missing?

          Show
          Scott Chen added a comment - Does it make sense to throw IOE if all volumes in the constructor are missing?
          Hide
          Zheng Shao added a comment -

          I believe other logic in TaskTracker/JobTracker will fail and report in that case.

          Show
          Zheng Shao added a comment - I believe other logic in TaskTracker/JobTracker will fail and report in that case.
          Hide
          Scott Chen added a comment -

          +1, the patch looks good to me

          Show
          Scott Chen added a comment - +1, the patch looks good to me
          Hide
          Dan Adkins added a comment -

          I have applied the latest patch to trunk and still I cannot start a task tracker if the the directory does not exist:

          10/09/09 00:36:29 WARN mapred.TaskTracker: Task Tracker local can not create directory: /mnt/data8/mr20
          10/09/09 00:36:29 WARN util.MRAsyncDiskService: Cannot create toBeDeleted in /mnt/data8/mr20. Ignored.
          10/09/09 00:36:29 ERROR mapred.TaskTracker: Can not start task tracker because java.io.FileNotFoundException: File /mnt/data8/mr20/toBeDeleted does not exist.
          at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:301)
          at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1071)
          at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1096)
          at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:494)
          at org.apache.hadoop.mapreduce.util.MRAsyncDiskService.<init>(MRAsyncDiskService.java:103)
          at org.apache.hadoop.mapreduce.util.MRAsyncDiskService.<init>(MRAsyncDiskService.java:127)
          at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:616)
          at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1323)
          at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3508)

          Show
          Dan Adkins added a comment - I have applied the latest patch to trunk and still I cannot start a task tracker if the the directory does not exist: 10/09/09 00:36:29 WARN mapred.TaskTracker: Task Tracker local can not create directory: /mnt/data8/mr20 10/09/09 00:36:29 WARN util.MRAsyncDiskService: Cannot create toBeDeleted in /mnt/data8/mr20. Ignored. 10/09/09 00:36:29 ERROR mapred.TaskTracker: Can not start task tracker because java.io.FileNotFoundException: File /mnt/data8/mr20/toBeDeleted does not exist. at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:301) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1071) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1096) at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:494) at org.apache.hadoop.mapreduce.util.MRAsyncDiskService.<init>(MRAsyncDiskService.java:103) at org.apache.hadoop.mapreduce.util.MRAsyncDiskService.<init>(MRAsyncDiskService.java:127) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:616) at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1323) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3508)
          Hide
          Dan Adkins added a comment -

          s/finally/catch (Exception e)/

          The previous patch was still throwing the exception instead of ignoring it as intended.

          Show
          Dan Adkins added a comment - s/finally/catch (Exception e)/ The previous patch was still throwing the exception instead of ignoring it as intended.
          Dan Adkins made changes -
          Attachment MAPREDUCE-1382.4.patch [ 12454174 ]
          Hide
          Scott Chen added a comment -

          The try { } block in cleanupAllVolumes() also needs the fix Dan pointed out.

          Show
          Scott Chen added a comment - The try { } block in cleanupAllVolumes() also needs the fix Dan pointed out.
          Hide
          Scott Chen added a comment -

          Zheng, Is it possible that you can add a simple test for this bug? Use some non-exist volume name to construct a MRAsyncDiskService.

          Show
          Scott Chen added a comment - Zheng, Is it possible that you can add a simple test for this bug? Use some non-exist volume name to construct a MRAsyncDiskService.
          Hide
          Tom White added a comment -

          Added a unit test. I also tested manually by starting a jobtracker which included unwritable directory specified in mapred.local.dir.

          Show
          Tom White added a comment - Added a unit test. I also tested manually by starting a jobtracker which included unwritable directory specified in mapred.local.dir.
          Tom White made changes -
          Attachment MAPREDUCE-1382.5.patch [ 12467776 ]
          Tom White made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Tom White made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Tom White made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Tom White made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Todd Lipcon added a comment -

          +1, looks good to me.

          Show
          Todd Lipcon added a comment - +1, looks good to me.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12467776/MAPREDUCE-1382.5.patch
          against trunk revision 1058374.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.mapred.TestControlledMapReduceJob
          org.apache.hadoop.mapred.TestSetupTaskScheduling

          -1 contrib tests. The patch failed contrib unit tests.

          -1 system test framework. The patch failed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/14//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/14//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/14//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12467776/MAPREDUCE-1382.5.patch against trunk revision 1058374. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.mapred.TestControlledMapReduceJob org.apache.hadoop.mapred.TestSetupTaskScheduling -1 contrib tests. The patch failed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/14//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/14//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/14//console This message is automatically generated.
          Hide
          Tom White added a comment -

          Minor change to the test (which I tested locally) to make the test directory writable again after the test has run so that the directory can be removed easily later (e.g. by ant clean). The failures are known or not caused by this patch, so I'd like to commit this.

          Show
          Tom White added a comment - Minor change to the test (which I tested locally) to make the test directory writable again after the test has run so that the directory can be removed easily later (e.g. by ant clean). The failures are known or not caused by this patch, so I'd like to commit this.
          Tom White made changes -
          Attachment MAPREDUCE-1382.patch [ 12469016 ]
          Hide
          Todd Lipcon added a comment -

          +1, seems like a good fix.

          Show
          Todd Lipcon added a comment - +1, seems like a good fix.
          Hide
          Tom White added a comment -

          I've just committed this. Thanks, Zheng.

          Show
          Tom White added a comment - I've just committed this. Thanks, Zheng.
          Tom White made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.22.0 [ 12314184 ]
          Resolution Fixed [ 1 ]
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #581 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/581/)
          MAPREDUCE-1382. MRAsyncDiscService should tolerate missing local.dir. Contributed by Zheng Shao and tomwhite.

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #581 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/581/ ) MAPREDUCE-1382 . MRAsyncDiscService should tolerate missing local.dir. Contributed by Zheng Shao and tomwhite.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #643 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/643/)

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #643 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/643/ )
          Konstantin Shvachko made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Patch Available Patch Available Open Open
          125d 2h 23m 2 Tom White 11/Jan/11 22:19
          Open Open Patch Available Patch Available
          235d 23h 2m 3 Tom White 11/Jan/11 22:19
          Patch Available Patch Available Resolved Resolved
          13d 25m 1 Tom White 24/Jan/11 22:44
          Resolved Resolved Closed Closed
          321d 7h 34m 1 Konstantin Shvachko 12/Dec/11 06:18

            People

            • Assignee:
              Zheng Shao
              Reporter:
              Scott Chen
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development