Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-3975

Default value not set for Configuration parameter mapreduce.job.local.dir

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.1, 0.23.2
    • Fix Version/s: 0.23.2
    • Component/s: mrv2
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Exporting mapreduce.job.local.dir for mapreduce tasks to use as job-level shared scratch space.

      Description

      mapreduce.job.local.dir (formerly job.local.dir in 0.20) is not set by default. This is a regression from 0.20.205.

      In 0.20.205, JobLocalizer.createWorkDir() constructs the "$mapred.local.dir/taskTracker/$user/jobcache/$jobid/work" path based on $user and $jobid, and then sets TaskTracker.JOB_LOCAL_DIR in the job's JobConf.

      So far, I haven't found where this is done in 0.23. It could be that this is what should be done by LocalJobRunner.setupChildMapredLocalDirs(), but I am still investigating.

        Activity

        Hide
        Arun C Murthy added a comment -

        Good catch Eric!

        We should fix YarnChild to set this for back-compat.

        Show
        Arun C Murthy added a comment - Good catch Eric! We should fix YarnChild to set this for back-compat.
        Hide
        Eric Payne added a comment -

        @Arun,

        I modified YarnChild.configureLocalDirs(...) to mimic what JobLocalizer.createWorkDir(...) did in 20.205. Namely, to use the LocalDirAllocator to get a local path within one of the "mapreduce.cluster.local.dir" locations and create a scratch directory under it.

        TESTING:
        On a 10-node secure cluster, I ran manual tests to make sure that the "mapreduce.job.local.dir" was being set and that the directory was being created.

        My manual tests printed the value of "job.local.dir" which showed up in the task logs. I also made sure that the scratch directories existed and were writeable by the user.

        The 20.205 value for "job.local.dir" looked something like this:
        $

        {mapred.local.dir[x]}

        /taskTracker/$user/jobcache/$jobid/work
        e.g.: /cluster/0/tmp/mapred-local/taskTracker/joe/jobcache/job_1234_0001/work
        The 23 value for "mapreduce.job.local.dir" looks something like this:
        $

        {mapreduce.cluster.local.dir[x]}

        /usercache/$user/appcache/$appid/work
        e.g.: /cluster/2/tmp/mapred-local/usercache/joe/appcache/application_5678_0002/work

        CONCERN:
        This solution works as far as it goes. However, I do have one concern.

        In 20.205, it appears that the "job.local.dir" has the same parent dir (e.g. /cluster/0) for all of the task attempts that are run on a specific node. However, this is not the case in 23. That is, on 23, even if two task attempts are run on the same node for application_5678_0002, they could have different root directories (e.g., one task attempt would have /cluster/2 for its root dir and the other would have /cluster/4).

        Is this expected behavior, or is YarnChild the wrong place to set this value?

        Show
        Eric Payne added a comment - @Arun, I modified YarnChild.configureLocalDirs(...) to mimic what JobLocalizer.createWorkDir(...) did in 20.205. Namely, to use the LocalDirAllocator to get a local path within one of the "mapreduce.cluster.local.dir" locations and create a scratch directory under it. TESTING: On a 10-node secure cluster, I ran manual tests to make sure that the "mapreduce.job.local.dir" was being set and that the directory was being created. My manual tests printed the value of "job.local.dir" which showed up in the task logs. I also made sure that the scratch directories existed and were writeable by the user. The 20.205 value for "job.local.dir" looked something like this: $ {mapred.local.dir[x]} /taskTracker/$user/jobcache/$jobid/work e.g.: /cluster/0/tmp/mapred-local/taskTracker/joe/jobcache/job_1234_0001/work The 23 value for "mapreduce.job.local.dir" looks something like this: $ {mapreduce.cluster.local.dir[x]} /usercache/$user/appcache/$appid/work e.g.: /cluster/2/tmp/mapred-local/usercache/joe/appcache/application_5678_0002/work CONCERN: This solution works as far as it goes. However, I do have one concern. In 20.205, it appears that the "job.local.dir" has the same parent dir (e.g. /cluster/0) for all of the task attempts that are run on a specific node. However, this is not the case in 23. That is, on 23, even if two task attempts are run on the same node for application_5678_0002, they could have different root directories (e.g., one task attempt would have /cluster/2 for its root dir and the other would have /cluster/4). Is this expected behavior, or is YarnChild the wrong place to set this value?
        Hide
        Robert Joseph Evans added a comment -

        I think that it is fine to have them in different base directories, if it becomes a problem in practice because tasks are trying to share data with each other, then it can be fixed then.

        Show
        Robert Joseph Evans added a comment - I think that it is fine to have them in different base directories, if it becomes a problem in practice because tasks are trying to share data with each other, then it can be fixed then.
        Hide
        Robert Joseph Evans added a comment -

        One minor nit. There is a java.util.Random added in that is not needed. I can fix it though as part of the commit. +1 assuming Jenkins comes back OK.

        Show
        Robert Joseph Evans added a comment - One minor nit. There is a java.util.Random added in that is not needed. I can fix it though as part of the commit. +1 assuming Jenkins comes back OK.
        Hide
        Robert Joseph Evans added a comment -

        Oh and I don't think we need the commented out code that used the Random. I can delete that too.

        Show
        Robert Joseph Evans added a comment - Oh and I don't think we need the commented out code that used the Random. I can delete that too.
        Hide
        Arun C Murthy added a comment -

        I think that it is fine to have them in different base directories, if it becomes a problem in practice because tasks are trying to share data with each other, then it can be fixed then.

        +1

        Show
        Arun C Murthy added a comment - I think that it is fine to have them in different base directories, if it becomes a problem in practice because tasks are trying to share data with each other, then it can be fixed then. +1
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12517331/MAPREDUCE-3975-1.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2013//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2013//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517331/MAPREDUCE-3975-1.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2013//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2013//console This message is automatically generated.
        Hide
        Robert Joseph Evans added a comment -

        Thanks Eric, I just put this in trunk 0.23 and 0.23.2.

        Show
        Robert Joseph Evans added a comment - Thanks Eric, I just put this in trunk 0.23 and 0.23.2.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Commit #641 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/641/)
        svn merge -c 1297825 from trunk to branch-0.23 FIXES: MAPREDUCE-3975. Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297826)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297826
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Commit #641 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/641/ ) svn merge -c 1297825 from trunk to branch-0.23 FIXES: MAPREDUCE-3975 . Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297826) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297826 Files : /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Common-0.23-Commit #650 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/650/)
        svn merge -c 1297825 from trunk to branch-0.23 FIXES: MAPREDUCE-3975. Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297826)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297826
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Show
        Hudson added a comment - Integrated in Hadoop-Common-0.23-Commit #650 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/650/ ) svn merge -c 1297825 from trunk to branch-0.23 FIXES: MAPREDUCE-3975 . Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297826) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297826 Files : /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #1846 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1846/)
        MAPREDUCE-3975. Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297825)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297825
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Show
        Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #1846 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1846/ ) MAPREDUCE-3975 . Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297825) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297825 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #1920 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1920/)
        MAPREDUCE-3975. Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297825)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297825
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #1920 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1920/ ) MAPREDUCE-3975 . Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297825) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297825 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #1854 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1854/)
        MAPREDUCE-3975. Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297825)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297825
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #1854 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1854/ ) MAPREDUCE-3975 . Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297825) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297825 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-0.23-Commit #652 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/652/)
        svn merge -c 1297825 from trunk to branch-0.23 FIXES: MAPREDUCE-3975. Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297826)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297826
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-0.23-Commit #652 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/652/ ) svn merge -c 1297825 from trunk to branch-0.23 FIXES: MAPREDUCE-3975 . Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297826) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297826 Files : /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #190 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/190/)
        svn merge -c 1297825 from trunk to branch-0.23 FIXES: MAPREDUCE-3975. Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297826)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297826
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #190 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/190/ ) svn merge -c 1297825 from trunk to branch-0.23 FIXES: MAPREDUCE-3975 . Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297826) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297826 Files : /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #977 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/977/)
        MAPREDUCE-3975. Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297825)

        Result = FAILURE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297825
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #977 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/977/ ) MAPREDUCE-3975 . Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297825) Result = FAILURE bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297825 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-0.23-Build #218 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/218/)
        svn merge -c 1297825 from trunk to branch-0.23 FIXES: MAPREDUCE-3975. Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297826)

        Result = FAILURE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297826
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-0.23-Build #218 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/218/ ) svn merge -c 1297825 from trunk to branch-0.23 FIXES: MAPREDUCE-3975 . Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297826) Result = FAILURE bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297826 Files : /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1012 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1012/)
        MAPREDUCE-3975. Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297825)

        Result = FAILURE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297825
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1012 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1012/ ) MAPREDUCE-3975 . Default value not set for Configuration parameter mapreduce.job.local.dir (Eric Payne via bobby) (Revision 1297825) Result = FAILURE bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297825 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
        Hide
        Eric Payne added a comment -

        Thanks Bobby!

        Show
        Eric Payne added a comment - Thanks Bobby!
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Sorry for coming late on this:

        I think that it is fine to have them in different base directories, if it becomes a problem in practice because tasks are trying to share data with each other, then it can be fixed then.

        This is exactly what the job.local.dir is intended for - sharing of data across tasks on the node. If the tasks only need scratch space, they can write in ./tmp directory which is already created by nodemanager. Even our documentation for the 20.*/1.* release lines explicitly says this.

        We need to fix it. I am reopening this ticket.

        Show
        Vinod Kumar Vavilapalli added a comment - Sorry for coming late on this: I think that it is fine to have them in different base directories, if it becomes a problem in practice because tasks are trying to share data with each other, then it can be fixed then. This is exactly what the job.local.dir is intended for - sharing of data across tasks on the node. If the tasks only need scratch space, they can write in ./tmp directory which is already created by nodemanager. Even our documentation for the 20.* /1. * release lines explicitly says this. We need to fix it. I am reopening this ticket.
        Hide
        Eric Payne added a comment -

        Thanks Arun. Can you please give me pointers where this config property should be set in order to ensure that it is the same for different task attempts on the same node?

        Show
        Eric Payne added a comment - Thanks Arun. Can you please give me pointers where this config property should be set in order to ensure that it is the same for different task attempts on the same node?
        Hide
        Robert Joseph Evans added a comment -

        Vinod, can we open a new JIRA for this?

        The patch has already been merged in, and it is a step closer to what we want, although obviously not all the way there. If the directories are not the same the worst thing that happens is that some resources are wasted. There are no guarantees that the tasks will run on the same node, so any mapreduce jobs that use this need to be able to handle a task running by itself.

        Show
        Robert Joseph Evans added a comment - Vinod, can we open a new JIRA for this? The patch has already been merged in, and it is a step closer to what we want, although obviously not all the way there. If the directories are not the same the worst thing that happens is that some resources are wasted. There are no guarantees that the tasks will run on the same node, so any mapreduce jobs that use this need to be able to handle a task running by itself.
        Hide
        Eric Payne added a comment -

        s/Arun/Vinod/g

        Sorry about that.

        Show
        Eric Payne added a comment - s/Arun/Vinod/g Sorry about that.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        There are no guarantees that the tasks will run on the same node, so any mapreduce jobs that use this need to be able to handle a task running by itself.

        The feature helps to share resources on a node, whatever tasks run on this node can access this, more like the private distributed cache.

        can we open a new JIRA for this?

        Sure, let me close this and open a new ticket.

        Show
        Vinod Kumar Vavilapalli added a comment - There are no guarantees that the tasks will run on the same node, so any mapreduce jobs that use this need to be able to handle a task running by itself. The feature helps to share resources on a node, whatever tasks run on this node can access this, more like the private distributed cache. can we open a new JIRA for this? Sure, let me close this and open a new ticket.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Created MAPREDUCE-3988.

        Show
        Vinod Kumar Vavilapalli added a comment - Created MAPREDUCE-3988 .

          People

          • Assignee:
            Eric Payne
            Reporter:
            Eric Payne
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development