Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Hadoop cluster of 5 servers, each with:
      HDD: two disks WDC WD1000FYPS-01ZKB0
      OS: Linux 2.6.26-1-686 #1 SMP
      FS: XFS

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Jars passed to the -libjars option of hadoop jar are no longer unpacked inside mapred.local.dir.

      Description

      I've noticed that task tracker moves all unpacked jars into
      ${hadoop.tmp.dir}/mapred/local/taskTracker.

      We use a lot of external libraries, which are deployed via the "-libjars"
      option. The total number of files after unpacking is about 20 thousand.

      After a number of jobs have run, tasks start to be killed with a timeout
      ("Task attempt_200901281518_0011_m_000173_2 failed to report status for 601
      seconds. Killing!"). All killed tasks are in the "initializing" state. In
      the tasktracker logs I found messages such as the following:

      Thread 20926 (Thread-10368):
      State: BLOCKED
      Blocked count: 3611
      Waited count: 24
      Blocked on java.lang.ref.Reference$Lock@e48ed6
      Blocked by 20882 (Thread-10341)
      Stack:
      java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
      java.lang.StringCoding.encode(StringCoding.java:272)
      java.lang.String.getBytes(String.java:947)
      java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
      java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
      java.io.File.isDirectory(File.java:754)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:427)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
      org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)

      The HADOOP-4780 patch adds code that stores a map of directories along
      with their DU values, which reduces the number of calls to DU. However, the delete operation still takes too long: I manually deleted the archive after 10 jobs had run, and it took over 30 minutes on XFS.

      I think an option to disable jar unpacking would be helpful in my situation.
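
      The repeated FileUtil.getDU frames in the stack trace above are consistent with a recursive directory walk that checks every entry. The following is a minimal sketch of such a walk, not Hadoop's actual FileUtil code; the class name DuSketch and its counter are hypothetical and exist only to show that the work grows with the number of unpacked files.

```java
import java.io.File;

/** Illustrative only: a recursive disk-usage walk, not Hadoop's FileUtil.getDU. */
public class DuSketch {
    /** How many paths the walk has inspected so far. */
    private long entriesVisited = 0;

    public long entriesVisited() {
        return entriesVisited;
    }

    /** Sums the sizes of all regular files under path, one stack frame per directory level. */
    public long usage(File path) {
        entriesVisited++;
        if (!path.isDirectory()) {
            return path.length(); // length() is 0 for a path that does not exist
        }
        long total = 0;
        File[] children = path.listFiles();
        if (children != null) { // null if the directory cannot be read
            for (File child : children) {
                total += usage(child);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        DuSketch du = new DuSketch();
        File root = new File(args.length > 0 ? args[0] : ".");
        System.out.println(du.usage(root) + " bytes in " + du.entriesVisited() + " entries");
    }
}
```

      With about 20 thousand unpacked files per job, a walk like this inspects every one of them each time it runs, and deleting the tree has to touch the same entries.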

      1. hadoop-5175.txt
        0.6 kB
        Todd Lipcon

        Activity

        Todd Lipcon added a comment -

        Agreed. We have seen this issue with the same root cause (lots of libjars make job cleanup very slow).

        It seems to me that it's an oversight that JobClient.java calls DistributedCache.addArchiveToClassPath(...) for the libjars arguments. Instead, it should use DistributedCache.addFileToClassPath for jar files.

        Does anyone see any issue with that? In my opinion, libjars are explicitly supposed to stay self-contained - there's no reason to expand them.
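
        The claim that a jar can stay packed and still be usable from the classpath can be checked with plain JDK classes. The sketch below does not use Hadoop or DistributedCache; the class PackedJarSketch, the file name lib.jar, and the entry name are all hypothetical. It writes a small jar and reads an entry back through a class loader without expanding anything on disk.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

/** Illustrative only: reads from a packed jar on the classpath, no unpacking. */
public class PackedJarSketch {
    /** Writes dir/lib.jar containing a single text entry. */
    public static File writeJar(File dir, String entryName, String content) throws IOException {
        File jar = new File(dir, "lib.jar");
        try (OutputStream out = Files.newOutputStream(jar.toPath());
             JarOutputStream jos = new JarOutputStream(out)) {
            jos.putNextEntry(new JarEntry(entryName));
            jos.write(content.getBytes(StandardCharsets.UTF_8));
            jos.closeEntry();
        }
        return jar;
    }

    /** Puts the jar file itself on a class loader's path and reads one entry from it. */
    public static String readFromClasspath(File jar, String entryName) throws IOException {
        URL[] urls = { jar.toURI().toURL() };
        try (URLClassLoader loader = new URLClassLoader(urls, null);
             InputStream in = loader.getResourceAsStream(entryName)) {
            if (in == null) {
                throw new IOException("entry not found: " + entryName);
            }
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("libjars").toFile();
        File jar = writeJar(dir, "conf/settings.txt", "hello");
        System.out.println(readFromClasspath(jar, "conf/settings.txt"));
        System.out.println("files on disk: " + dir.list().length);
    }
}
```

        The directory ends up holding one file, the jar, regardless of how many entries it contains, which is the property that keeps cleanup cheap.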

        Todd Lipcon added a comment -

        One-line patch that stops -libjars arguments from being expanded. Ran a test job that required two lib jars and it worked properly. Verified with "find" while the job was running that the jars were not expanded in the temporary directory.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12408202/hadoop-5175.txt
        against trunk revision 776032.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/349/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/349/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/349/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/349/console

        This message is automatically generated.

        Todd Lipcon added a comment -

        This is currently failing some contrib unit tests. Will submit a new patch when fixed.

        Todd Lipcon added a comment -

        Turns out the failing contrib tests are those fixed in HADOOP-5847. I believe this patch is good to go.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12408202/hadoop-5175.txt
        against trunk revision 777761.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/388/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/388/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/388/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/388/console

        This message is automatically generated.

        Todd Lipcon added a comment -

        The failing tests are still unrelated. I believe this change is covered by existing tests that use DistributedCache. Should be good for commit.

        Tom White added a comment -

        I've just committed this. Thanks Todd!

        Hudson added a comment -

        Integrated in Hadoop-trunk #863 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/863/ )

          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Andrew Gudkov
          • Votes:
            0
            Watchers:
            5
