Hadoop Map/Reduce / MAPREDUCE-6289

libjars are assumed to be in the DistributedCache but are never added in pseudo-distributed mode


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: distributed-cache
    • Labels: None

    Description

      Used version:

      $ hadoop version
      Hadoop 2.6.0
      Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
      Compiled by jenkins on 2014-11-13T21:10Z
      Compiled with protoc 2.5.0
      From source with checksum 18e43357c8f927c0695f1e9522859d6a
      This command was run using /usr/local/Cellar/hadoop/2.6.0/libexec/share/hadoop/common/hadoop-common-2.6.0.jar

      In a pseudo-distributed setup with the worker node on the same machine as the master, libjars are never copied to the DFS, but the following code assumes they are, resulting in FileNotFoundExceptions:

      The issue starts in org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(Job, Path, short) in this block:

          if (libjars != null) {
            FileSystem.mkdirs(jtFs, libjarsDir, mapredSysPerms);
            String[] libjarsArr = libjars.split(",");
            for (String tmpjars: libjarsArr) {
              Path tmp = new Path(tmpjars);
              // In pseudo-distributed mode this returns the original file://
              // path unchanged, without copying anything (sketched below).
              Path newPath = copyRemoteFiles(libjarsDir, tmp, conf, replication);
              // toUri().getPath() drops the file:// scheme, so the path is
              // later re-qualified against the "current" filesystem.
              DistributedCache.addFileToClassPath(
                  new Path(newPath.toUri().getPath()), conf);
            }
          }
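
      For context: copyRemoteFiles short-circuits whenever the source path already lives on the job tracker's filesystem, which is exactly what happens here. Condensed from the 2.6.0 JobSubmitter source (abbreviated paraphrase, not verbatim; comments mine):

          private Path copyRemoteFiles(Path parentDir, Path originalPath,
              Configuration conf, short replication) throws IOException {
            FileSystem remoteFs = originalPath.getFileSystem(conf);
            // Source FS and job tracker FS match: nothing is copied and the
            // original (here: file://) path is returned as-is.
            if (compareFs(remoteFs, jtFs)) {
              return originalPath;
            }
            Path newPath = new Path(parentDir, originalPath.getName());
            FileUtil.copy(remoteFs, originalPath, jtFs, newPath, false, conf);
            jtFs.setReplication(newPath, replication);
            return newPath;
          }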
      1. For "local" files - which is what all libjars are in a pseudo-distributed setup - copyRemoteFiles returns the path itself, including the file:// URI, without ever copying the file to the DFS (see the paraphrase above).
      2. The file:// scheme is then stripped while creating the new Path object from newPath.toUri().getPath().
      3. Within org.apache.hadoop.mapreduce.filecache.DistributedCache.addFileToClassPath(Path, Configuration, FileSystem), the "current filesystem" is used to restore the URI - and that filesystem is now the DFS itself (see the excerpt below).
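
      This is the excerpt referenced in step 3: what addFileToClassPath does with the scheme-less path, condensed from the 2.6.0 DistributedCache source (abbreviated paraphrase, not verbatim; comments mine):

          public static void addFileToClassPath(Path file, Configuration conf)
              throws IOException {
            // A scheme-less path resolves against the default filesystem
            // (fs.defaultFS) - in this setup, HDFS.
            addFileToClassPath(file, conf, file.getFileSystem(conf));
          }

          public static void addFileToClassPath(Path file, Configuration conf,
              FileSystem fs) throws IOException {
            String classpath = conf.get(MRJobConfig.CLASSPATH_FILES);
            conf.set(MRJobConfig.CLASSPATH_FILES, classpath == null
                ? file.toString() : classpath + "," + file.toString());
            // makeQualified prepends the filesystem's scheme and authority,
            // e.g. turning /usr/local/.../x.jar into
            // hdfs://localhost:8020/usr/local/.../x.jar.
            URI uri = fs.makeQualified(file).toUri();
            addCacheFile(uri, conf);
          }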

      This causes e.g. file:/usr/local/Cellar/hbase/1.0.0/libexec/lib/hbase-client-1.0.0.jar to be added to the DistributedCache as hdfs://localhost:8020/usr/local/Cellar/hbase/1.0.0/libexec/lib/hbase-client-1.0.0.jar - although it was never uploaded there by copyRemoteFiles.
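
      The re-qualification can be reproduced in isolation. A minimal sketch (hypothetical standalone snippet - the class name is illustrative; assumes hadoop-common and hadoop-hdfs 2.6.0 on the classpath and the fs.defaultFS from this setup):

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class RequalifyDemo {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              conf.set("fs.defaultFS", "hdfs://localhost:8020"); // pseudo-distributed default FS

              Path original = new Path(
                  "file:/usr/local/Cellar/hbase/1.0.0/libexec/lib/hbase-client-1.0.0.jar");
              // Step 2 above: building a Path from toUri().getPath() drops file://.
              Path stripped = new Path(original.toUri().getPath());
              // Step 3: qualifying the scheme-less path against the default FS
              // yields hdfs://localhost:8020/usr/local/... - a location to which
              // nothing was ever uploaded.
              FileSystem fs = FileSystem.get(conf);
              System.out.println(fs.makeQualified(stripped));
            }
          }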

      During verification afterwards, this causes a FileNotFoundException:

      15/03/23 21:08:42 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/Users/sme/development/hadoop-data/mapred/staging/sme1224959894/.staging/job_local1224959894_0001
      java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/usr/local/Cellar/hadoop/2.6.0/libexec/share/hadoop/common/lib/guava-11.0.2.jar
      	at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
      	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
      	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
      	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
      	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
      	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
      	at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
      	at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
      	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
      	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
      	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
      	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
      	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
      	at com.sungard.advtech.bigtable.hbase.FixMessageMapOnlyImporter.main(FixMessageMapOnlyImporter.java:232)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:497)
      	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
      	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
      	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
      	at com.sungard.advtech.bigtable.hbase.HadoopDriver.main(HadoopDriver.java:19)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:497)
      	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
      

    People

      Assignee: Unassigned
      Reporter: Sebastian Just (sebastian.just@sungard.com)
      Votes: 0
      Watchers: 3
