Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.15.2
    • Fix Version/s: 0.15.3
    • Component/s: None
    • Labels: None

      Description

      HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed like this:

      File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work");
      

      We should change this to get it working. Referring to the changes made in HADOOP-2227, I see that the APIs used there to construct the path are not public, and hard-coding the path in streaming does not look good. Thoughts?
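
      For context, here is a minimal sketch (not the actual streaming code) of what that construction amounts to: it resolves a "work" directory at ../../work relative to the task's working directory. The class name and the existence check are illustrative additions.

      import java.io.File;
      import java.io.IOException;

      // Illustrative sketch only; mirrors the construction quoted above.
      public class JobCacheDirSketch {
        public static File resolveJobCacheDir() throws IOException {
          // Stand-in for the task's current working directory.
          File currentDir = new File(System.getProperty("user.dir"));
          // Resolves to ../../work relative to the task's working directory;
          // HADOOP-2227 changed the on-disk layout, so this lookup can now fail.
          File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work");
          if (!jobCacheDir.isDirectory()) {
            throw new IOException("job cache dir not found: " + jobCacheDir);
          }
          return jobCacheDir;
        }
      }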

      Attachments

      1. patch-2570.txt
        1 kB
        Amareshwari Sriramadasu
      2. HADOOP-2570_1_20080112.patch
        3 kB
        Arun C Murthy

        Issue Links

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Nightly #365 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/365/ )

          Devaraj Das added a comment -

          I just committed this. Thanks Amareshwari and Arun!

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12373016/HADOOP-2570_1_20080112.patch
          against trunk revision r611734.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1578/testReport/
          Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1578/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1578/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1578/console

          This message is automatically generated.

          Amareshwari Sriramadasu added a comment -

          +1 for Arun's patch

          Devaraj Das added a comment -

          Hudson seems to be lost somewhere. Should we just go ahead and commit this patch?

          Lohit Vijayarenu added a comment -

          I checked out trunk, applied this patch, and ran 'ant test'.
          Apart from two failures, org.apache.hadoop.hbase.TestMergeMeta and org.apache.hadoop.hbase.TestMergeTable, all tests passed.

          Arun C Murthy added a comment -

          Re-trying Hudson...

          Mukund Madhugiri added a comment -

          Trying to trigger the patch process to pick it up, as I don't see it in the queue.

          Lohit Vijayarenu added a comment -

          Tested the streaming job again; this patch solves the problem seen earlier. Thanks!

          Arun C Murthy added a comment -

          It seems the test cases don't have a job jar, so an 'if' check in TaskTracker.localizeJob fails and the "work" directory isn't created. This explains the exception seen in TaskTracker.launchTaskForJob.

          Here is a patch which fixes TaskTracker.localizeJob to address the problem described above, along with Amareshwari's original fix.
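
          A minimal sketch of the kind of localizeJob change described here, assuming a method that receives the job directory and an optional job jar; the names and structure are illustrative, not the actual 0.15 TaskTracker code:

          import java.io.File;
          import java.io.IOException;

          class LocalizeJobSketch {
            // Ensure taskTracker/jobcache/<jobid>/work exists even when the job has
            // no jar; previously it was only created on the unjar path, which is why
            // the no-jar test cases failed later in launchTaskForJob.
            static void localizeJob(File jobDir, File jobJar) throws IOException {
              File workDir = new File(jobDir, "work");
              if (!workDir.mkdirs() && !workDir.isDirectory()) {
                throw new IOException("Mkdirs failed to create " + workDir);
              }
              if (jobJar != null && jobJar.isFile()) {
                // unjar jobJar into workDir (omitted here)
              }
            }
          }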

          Arun C Murthy added a comment -

          Please ignore my previous comments... it's been a long day (maybe the following ones too!)

          It seems the test cases don't have a job jar, so an 'if' check in TaskTracker.localizeJob fails and the "work" directory isn't created. This explains the exception seen in TaskTracker.launchTaskForJob.

          I didn't make any headway after that...

          Arun C Murthy added a comment -

          Sigh, this exception seems to stem from the fact that the LocalDirAllocator is not used to create the taskTracker/jobcache/<jobid>/work directory at all. It is always created in the same partition as the taskTracker/jobcache/<jobid>/ directory.

          This means LocalDirAllocator doesn't know about the taskTracker/jobcache/<jobid>/work directory at all and hence the DiskErrorException.

          Arun C Murthy added a comment -

          All tests fail with:

          2008-01-11 17:35:53,433 INFO  mapred.TaskTracker (TaskTracker.java:launchTaskForJob(703)) - org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200801111735_0001/work in any of the configured local directories
          	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359)
          	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
          	at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.localizeTask(TaskTracker.java:1395)
          	at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.launchTask(TaskTracker.java:1469)
          	at org.apache.hadoop.mapred.TaskTracker.launchTaskForJob(TaskTracker.java:693)
          	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:686)
          	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1279)
          	at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:920)
          	at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1315)
          	at org.apache.hadoop.mapred.MiniMRCluster$TaskTrackerRunner.run(MiniMRCluster.java:144)
          	at java.lang.Thread.run(Thread.java:595)
          

          The problem is that LocalDirAllocator.getLocalPathToRead throws an exception when the path is not found - this patch should handle that exception and go ahead and create the symlink...
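
          A hedged sketch of that handling, assuming the caller holds a LocalDirAllocator and the task's Configuration; the method and parameter names are illustrative, not the actual TaskTracker code, and the symlink helper simply shells out to ln -s:

          import java.io.File;
          import java.io.IOException;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.LocalDirAllocator;
          import org.apache.hadoop.util.DiskChecker.DiskErrorException;

          class WorkDirLinkSketch {
            // jobWorkRelPath is e.g. "taskTracker/jobcache/<jobid>/work";
            // link is the "work" symlink to create next to the task directory.
            static void ensureWorkLink(LocalDirAllocator lDirAlloc, Configuration conf,
                                       String jobWorkRelPath, File link) throws IOException {
              String target;
              try {
                target = lDirAlloc.getLocalPathToRead(jobWorkRelPath, conf).toString();
              } catch (DiskErrorException e) {
                // The allocator never created <jobid>/work itself, so it cannot find
                // it; fall back to the sibling work directory and still create the
                // symlink instead of failing the task launch.
                target = new File(link.getParentFile().getParent(), "work").toString();
              }
              try {
                Process ln = Runtime.getRuntime().exec(
                    new String[] {"ln", "-s", target, link.toString()});
                if (ln.waitFor() != 0) {
                  throw new IOException("failed to symlink " + link + " -> " + target);
                }
              } catch (InterruptedException ie) {
                throw new IOException("interrupted while symlinking " + link);
              }
            }
          }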

          Arun C Murthy added a comment -

          Too many core-tests failed, need to re-look this patch...

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12372956/patch-2570.txt
          against trunk revision r611056.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests -1. The patch failed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1543/testReport/
          Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1543/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1543/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1543/console

          This message is automatically generated.

          Lohit Vijayarenu added a comment -

          I tested this patch and it works for my earlier failing streaming job. Thanks!

          Arun C Murthy added a comment -

          The two places where the jobcache dir was used in streaming were to 'chmod' the executable and to look up this directory in PATH. Would it be OK to construct jobCacheDir as done in HADOOP-2227?

          Lohit, that still won't help scripts which use "../work/<myscript>" - so this is the best approach for 0.15.3.

          In light of this bug, HADOOP-2116 is a little more complicated than originally thought; I have a few thoughts about this which I'll put up there.

          Submitting the patch with symlinks to ../work from the taskdir

          +1

          Amareshwari Sriramadasu added a comment -

          Submitting the patch with symlinks to ../work from the taskdir

          Owen O'Malley added a comment -

          I agree with Milind that the best solution is probably a work symlink for 0.15.x and HADOOP-2116 for 0.16.x.

          Nigel Daley added a comment -

          Is there a patch for this yet? Can we get it reviewed, Hudson'd, and committed to trunk and branch-0.15 by EOD Jan 11?

          Runping Qi added a comment -

          Lohit's suggestion should work.

          Milind Bhandarkar added a comment -

          If this issue goes into 0.15.3, then we should do what Arun proposed, IMO.

          In 0.16, when HADOOP-2116 is committed, the scripts will have a job.local.dir config variable (exposed to streaming as the environment variable job_local_dir).
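
          For illustration only, a hedged sketch of what reading that property from Java would look like once HADOOP-2116 lands (a streaming script would read the job_local_dir environment variable instead); the property may simply be absent on 0.15.x:

          import org.apache.hadoop.mapred.JobConf;

          class JobLocalDirSketch {
            // Returns null when the property is not set (e.g. before HADOOP-2116).
            static String jobLocalDir(JobConf conf) {
              return conf.get("job.local.dir");
            }
          }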

          Lohit Vijayarenu added a comment -

          The two places where the jobcache dir was used in streaming were to 'chmod' the executable and to look up this directory in PATH. Would it be OK to construct jobCacheDir as done in HADOOP-2227?

          Arun C Murthy added a comment -

          Sigh, the only way I see to fix this post HADOOP-2227 is to symlink the "work" directory from the partition on which the task's cwd is present; this is because user scripts may simply use the "../work/" path and there is no way for us to pass them extra configuration parameters, etc.

          Thoughts?


            People

            • Assignee: Amareshwari Sriramadasu
            • Reporter: Lohit Vijayarenu
            • Votes: 0
            • Watchers: 0

              Dates

              • Created:
                Updated:
                Resolved:

                Development