Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.3
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      All

    • Hadoop Flags:
      Incompatible change
    • Release Note:
      This issue restructures the local job directory on the tasktracker.
      Users are provided with a job-specific shared directory (mapred-local/taskTracker/jobcache/$jobid/work) for use as scratch space, exposed through the configuration property and system property "job.local.dir". The directory "../work" is no longer available from the task's cwd.

      Description

      Currently, since all task cwds are created under a jobcache directory, users who need a job-specific shared directory for use as scratch space create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, Hadoop MapReduce should expose job.local.dir via the localized configuration.

      1. patch-2116.txt
        33 kB
        Amareshwari Sriramadasu
      2. patch-2116.txt
        29 kB
        Amareshwari Sriramadasu
      3. patch-2116.txt
        6 kB
        Amareshwari Sriramadasu
      4. patch-2116.txt
        6 kB
        Amareshwari Sriramadasu
      5. patch-2116.txt
        7 kB
        Amareshwari Sriramadasu
      6. patch-2116.txt
        6 kB
        Amareshwari Sriramadasu

        Issue Links

          Activity

          Konstantin Shvachko added a comment - - edited

          This is also practically fixed by HADOOP-2227. The only thing left is to expose the shared directory through the configuration.
          JobConf now has a property "mapred.jar", accessible through the getJar() method, which points to the jar file located in the jobcache
          directory, which in fact is the common shared directory for the job's tasks.
          Namely,

          "mapred.jar" = "mapred.local.dir"[i]/taskTracker/jobcache/<job_id>/job.jar
          

          So we can replace configuration parameter "mapred.jar" by "job.local.dir", which will point to the parent of "mapred.jar".
          JobConf.getJar() can be implemented then as

          String getJar() {
              return get("job.local.dir") + "/job.jar";
          }
          

          Will that work?

          With respect to all the above, I wonder why we need to use LocalDirAllocator in TaskRunner.run()
          if the job cache directory (jobCacheDir) can be obtained directly from TaskRunner.conf:

          File jobCacheDir = new File(new File(conf.getJar()).getParentFile(), "work");
          
          Amareshwari Sriramadasu added a comment -

          So we can replace configuration parameter "mapred.jar" by "job.local.dir", which will point to the parent of "mapred.jar".

          We cannot replace mapred.jar with job.local.dir because mapred.jar can be set and read via setJar() and getJar() from the client side. For example, launchWordCount in TestMiniMRClassPath gives a different path for the jar file.

          To expose the shared directory through the configuration, we can set
          localJobConf.set("job.local.dir", jobDir) in localizeJob()
          and the job cache directory can be obtained as
          File jobCacheDir = new File(new File(conf.get("job.local.dir")), "work");
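
          The two snippets above can be put together as a minimal, self-contained sketch (plain Java; a HashMap stands in for the real JobConf, and the class and method names are illustrative, not the actual TaskTracker code):

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class JobLocalDirSketch {
    // Stand-in for the localized JobConf.
    static final Map<String, String> conf = new HashMap<>();

    // What localizeJob() would do: record the job-specific directory.
    static void localizeJob(String jobDir) {
        conf.put("job.local.dir", jobDir);
    }

    // What task-side code would do: derive the shared scratch directory.
    static File jobCacheDir() {
        return new File(new File(conf.get("job.local.dir")), "work");
    }
}
```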

          Milind Bhandarkar added a comment -

          I would prefer separating the two, i.e. where job.jar goes versus where job.local.dir goes. Especially for streaming, where side-effect tasks are common, the mapper and reducer commands would need a clean (empty) directory where they can cache job-specific data (dictionaries downloaded off the network, etc., that cannot be packaged as distributed archives). If job.jar also lives there, it might someday clash with the downloaded files and cause issues.

          So, mapred.jar, jobCacheDir, and job.local.dir all need to be different locations.

          Is jobCacheDir available via a config variable?

          Amareshwari Sriramadasu added a comment -

          In the current state of affairs, jobCacheDir is "mapred/local/taskTracker/jobcache/<job_id>/work".
          As far as I understood, this needs to be accessible as "job.local.dir", a job-specific shared directory for use as scratch space.

          So, mapred.jar, jobCacheDir, and job.local.dir all need to be different locations.

          Here, the existing jobCacheDir would become job.local.dir.

          If you want jobCacheDir to point to "mapred/local/taskTracker/jobcache/<job_id>/", it cannot be made available via a config variable, because it does not have a unique value: it can be present on more than one disk. For example, we can have a task directory (mapred/local/taskTracker/jobcache/<job_id>/<taskid>) on a disk other than job.local.dir.

          Finally, we will have mapred.jar and job.local.dir (the earlier jobCacheDir), both at different locations.

          Thoughts?

          Milind Bhandarkar added a comment -

          There are several advantages to having job.local.dir be empty when the first task from that job starts on a tasktracker. (It would simplify the logic for user code to populate it with job-specific cached data that cannot use the jobCache functionality.)

          That is why I suggest that mapred.jar, jobCacheDir, and job.local.dir all need to be different locations.

          Amareshwari Sriramadasu added a comment -

          I propose the following job cache directory structure to address the above needs:

          mapred/local/tasktracker/jobcache/<jobid>/
              job_jar_xml/
                  job.jar
                  job.xml
                  <unJarred directory>
              work/
              <taskdir>/

          And we can have the directories job_jar_xml, work and taskdir on different disks.
          mapred/local/tasktracker/jobcache/<jobid>/job_jar_xml/job.jar is available via mapred.jar,
          and mapred/local/tasktracker/jobcache/<jobid>/work is available via job.local.dir, which is an empty directory.

          Thoughts?
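
          To make the two exposed locations concrete, here is a small pure-Java sketch that composes them from the layout above (the directory names follow the proposal; the helper class itself is hypothetical):

```java
import java.io.File;

public class JobCacheLayout {
    // Root of the per-job cache: <localDir>/tasktracker/jobcache/<jobid>/
    static File jobRoot(String localDir, String jobId) {
        return new File(localDir + "/tasktracker/jobcache", jobId);
    }

    // Exposed via "mapred.jar": <jobRoot>/job_jar_xml/job.jar
    static File mapredJar(String localDir, String jobId) {
        return new File(new File(jobRoot(localDir, jobId), "job_jar_xml"), "job.jar");
    }

    // Exposed via "job.local.dir": <jobRoot>/work (starts out empty)
    static File jobLocalDir(String localDir, String jobId) {
        return new File(jobRoot(localDir, jobId), "work");
    }
}
```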

          Amareshwari Sriramadasu added a comment -

          Submitting the patch with the proposed approach.

          Milind Bhandarkar added a comment -

          Does this mean that all the taskdirs will again use the same partition?
          That would be the opposite of HADOOP-2227, right?
          That's not good performance-wise either, since all tasks would be using the same spindle.

          Devaraj Das added a comment -

          Does this mean that all the taskdirs will again use the same partition?

          No, the taskdirs will be on different disks (using the LocalDirAllocator). The common directories for all tasks of a given job will be the job.local.dir/jobCacheDir and the job_jar_xml (they will be configured/set up once per job using the LocalDirAllocator).

          Milind Bhandarkar added a comment -

          In that case, +1 for this approach!

          Amareshwari Sriramadasu added a comment -

          Still have to fix the streaming jobCacheDir.

          Amareshwari Sriramadasu added a comment -

          Submitting again with fixes for streaming and the isolation runner.

          Lohit Vijayarenu added a comment -

          Hi Amareshwari,

          I tested this patch against trunk for resolution of HADOOP-2570. This solves the problem mentioned in HADOOP-2570. Should this patch be marked to go in 0.15.3?

          Thanks,
          Lohit

          Arun C Murthy added a comment -

          In light of HADOOP-2570, I'm cancelling this patch.

          Reasoning:

          The -file option works by putting the script into the job's jar file by unjar-ing, copying and then jar-ing it again. (yuck!)

          This means that on the TaskTracker the script has moved from jobCache/work to jobCache/job_jar_xml (I propose we rename that to private, heh). Clearly user-scripts which rely on "../work/<script_name>" will break again...

          Having said that, we need to debate whether this feature is an incompatible change. What do folks think?

          If people say otherwise, we need to ensure all files in jobCache/private are symlinked into jobCache/work... ugh!


          I'd like to take this opportunity to take a hard look at streaming's -file option too. The unjar/jar way is completely backwards! We should rework the -file option to use the DistributedCache and the symlink option it provides.
          So, user-scripts can simply be "./<script>" rather than "../work/<script>". Yes, the way to maintain compatibility (if we want) is to use the previous option of symlinking files into jobCache/work also. I'd strongly vote for this option.

          Thoughts?
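
          As a rough illustration of the rework being proposed (the option names follow streaming's DistributedCache integration of that era, and the paths are placeholders, so treat this strictly as a sketch), a script would be shipped through the cache with a symlink fragment instead of -file:

```
hadoop jar hadoop-streaming.jar \
    -cacheFile hdfs://namenode:port/user/me/myscript.sh#myscript.sh \
    -mapper ./myscript.sh \
    -input in -output out
```

          With the symlink in place, the task refers to "./myscript.sh" in its cwd rather than "../work/myscript.sh".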

          Owen O'Malley added a comment - - edited

          Ugh is right.

          I'd propose some better names:

          $local/work/$jobid/
              cache/        -- file cache
              jars/         -- expanded jar
              job.xml       -- the generic job conf
              $taskid/
                  job.xml   -- task localized job conf
                  output/   -- map outputs
                  work/     -- cwd for task

          with each of the leaf directories being placed independently on the partitions.

          We should define localized attributes to point to where each of the leaf directories is.

          I agree with Arun that we should re-work the -file option to use the file cache with symlinks.

          Devaraj Das added a comment -

          The only problem with this is the incompatible changes (like ../work and ../work/script); code, especially scripts that assume those paths, will break. So, is everyone okay with this for 0.16? Should we do the symlink stuff to maintain backward compatibility? As an aside, in the directory organization Owen suggested, one thing that needs to be added is the common scratch space for all tasks (like the file cache).

          Another thing, IMO, is that we should probably just do the basic dir organization as was proposed by Amareshwari earlier, plus the streaming fix. The magnitude of the change required by the dir organization proposed by Owen seems pretty significant and aggressive for 0.16. Maybe we can do the remaining work in 0.17. Thoughts?

          Milind Bhandarkar added a comment -

          Since this bug is scheduled for 0.16, having incompatible changes in that release is fine (of course, as long as it is flagged such in the release notes.)

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12372896/patch-2116.txt
          against trunk revision r611361.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs -1. The patch appears to introduce 1 new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1552/testReport/
          Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1552/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1552/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1552/console

          This message is automatically generated.

          Amareshwari Sriramadasu added a comment -

          I'd like to take this opportunity to take a hard look at streaming's -file option too. The unjar/jar way is completely backwards! We should rework the -file option to use the DistributedCache and the symlink option it provides.

          I created HADOOP-2622 to look at the -file option.
          For 0.16.0, this issue will address the directory structure proposed earlier rather than the elaborated structure proposed later.

          Amareshwari Sriramadasu added a comment -

          This patch makes the empty work directory available as scratch space through the system property "job.local.dir".
          The directory layout is as described earlier.
          I did thorough testing; tested wordcount, sort and streaming jobs.

          Arun C Murthy added a comment -

          This patch sets a system property 'job.local.dir'; I'm assuming that it is inherited by the children?

          +        System.setProperty("job.local.dir", workDir.toString());
          

          Even so, I think we should set a property in the JobConf to be consistent.


          Overall, I'm a little concerned that this is quite late (w.r.t. 0.16.0) to be getting this in. I spoke to Milind and he is happy with HADOOP-2570 (the symlink to ../work), especially given the number of changes we need to make where we use something.getParent().{}. Hence I propose we push this to 0.17.0 and also make it a bigger change, incorporating the wider changes to the task's local directories proposed by Owen. Thoughts?
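
          A tiny self-contained sketch of the consistency point (plain Java; a Map stands in for JobConf, and the names are illustrative): user code can resolve the directory the same way whether it was set in the conf or only as the inherited system property:

```java
import java.util.Map;

public class JobLocalDirLookup {
    // Prefer the JobConf entry; fall back to the system property the child JVM inherits.
    static String jobLocalDir(Map<String, String> conf) {
        String fromConf = conf.get("job.local.dir");
        return (fromConf != null) ? fromConf : System.getProperty("job.local.dir");
    }
}
```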

          Milind Bhandarkar added a comment -

          As long as ../work currently works from the task cwd as a shared job-specific directory, I am okay with punting this.
          So, +1.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12373478/patch-2116.txt
          against trunk revision r613115.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1644/testReport/
          Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1644/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1644/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1644/console

          This message is automatically generated.

          Amareshwari Sriramadasu added a comment -

          A clarification regarding distributed cache:
          The current behavior is that the distributed cache is shared among jobs. The cache is localized under mapred/local/tasktracker/archive, i.e., if two jobs want to localize files with the same name, they actually share them unless they have different file timestamps. Whenever a task releases the cache, it decrements the reference count for the cache-id. The cache is cleaned up only when the cache size exceeds the allowed size (local.cache.size).
          Is this the intended behavior, or should the cache be job specific? With the directory structure that Owen has suggested, it seems like the cache should be job specific.

          Owen O'Malley added a comment -

          You are right that the file cache is shared between jobs and that is the desired behavior. (Although it is fair to ask whether that is the right policy once we have permissions. In general, probably not, since it wouldn't be hard to create a file that looks like the desired one and thereby get access to a file that you shouldn't have access to.)

          So what would you suggest for a layout?

          Amareshwari Sriramadasu added a comment -

          I feel that even with permissions, DistributedCache behavior should be the same. If the same user wants to share files across jobs, he should be allowed to. And if he wants to share with another user who has permission to access them, that should be allowed too. The TaskTracker need not worry about user permissions for localizing the cache; those should be taken care of in DistributedCache itself. The permissions aspect of DistributedCache has to be handled in a different JIRA.
          I propose the new layout be the same as Owen suggested, but without the file cache as part of the job cache.
          So, it is:

          mapred/local/taskTracker/jobcache/$jobid/
              work/         -- the scratch space
              jars/         -- expanded jar
              job.xml       -- the generic job conf
              $taskid/
                  job.xml   -- task localized job conf
                  output/   -- map outputs
                  work/     -- cwd for task
          mapred/local/taskTracker/archive/  -- distributed cache

          Thoughts?
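
          From a user's perspective, a hedged sketch of how a task might use this scratch space (plain Java; the property name follows the discussion above, while the class and the "download" step are placeholders, not actual Hadoop code):

```java
import java.io.File;
import java.io.IOException;

public class ScratchSpaceExample {
    // Resolve the job-wide scratch directory exposed by the TaskTracker.
    static File scratchDir() {
        String dir = System.getProperty("job.local.dir");
        if (dir == null) {
            throw new IllegalStateException("job.local.dir not set; not running under a tasktracker?");
        }
        return new File(dir);
    }

    // Example: materialize a job-specific file once per node, shared by all tasks of the job.
    static File cachedDictionary() {
        File dict = new File(scratchDir(), "dictionary.txt");
        try {
            if (!dict.exists()) {
                dict.createNewFile(); // placeholder: fetch or build the real data here
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return dict;
    }
}
```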

          Amareshwari Sriramadasu added a comment -

          Here is a patch with the proposed design.
          I ran sort on 500 nodes and also ran a streaming application on 10 nodes.
          Lohit, can you also run your streaming application and verify that this patch is fine?

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12376475/patch-2116.txt
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 6 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1844/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1844/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1844/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1844/console

          This message is automatically generated.

          Devaraj Das added a comment -

          Please add some documentation around job.local.dir.

          Amareshwari Sriramadasu added a comment -

          Added an API getJobLocalDir() in JobConf to get job.local.dir, along with javadoc.
          Added documentation in mapred_tutorial.xml.
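          In spirit, the accessor is just a lookup of job.local.dir from the localized configuration. A rough self-contained sketch of what it amounts to (java.util.Properties stands in for the localized JobConf here, and the sample jobid is invented; this is not the real JobConf implementation):

```java
import java.util.Properties;

public class GetJobLocalDirSketch {
    // Hypothetical stand-in for JobConf.getJobLocalDir(): it simply looks up
    // the job.local.dir property from the localized configuration.
    static String getJobLocalDir(Properties conf) {
        return conf.getProperty("job.local.dir");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // The tasktracker sets this value per job during localization;
        // the jobid below is a made-up example.
        conf.setProperty("job.local.dir",
            "mapred/local/taskTracker/jobcache/job_0001/work");
        System.out.println(getJobLocalDir(conf));
        // prints "mapred/local/taskTracker/jobcache/job_0001/work"
    }
}
```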

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12378226/patch-2116.txt
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 6 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2004/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2004/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2004/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2004/console

          This message is automatically generated.

          Devaraj Das added a comment -

          I just committed this. Thanks, Amareshwari!

          Hudson added a comment -

          Integrated in Hadoop-trunk #434 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/434/ )
          Devaraj Das added a comment -

          I committed this. Thanks, Amareshwari!

          Robert Chansler added a comment -

          Noted as incompatible in changes.txt


            People

            • Assignee:
              Amareshwari Sriramadasu
              Reporter:
              Milind Bhandarkar
            • Votes: 0
              Watchers: 0
