Hadoop Map/Reduce: MAPREDUCE-5399

Unnecessary Configuration instantiation in IFileInputStream slows down merge

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.1.0, 2.0.2-alpha
    • Fix Version/s: 2.1.0-beta
    • Component/s: mrv1, mrv2
    • Labels:
      None
    • Release Note:
      Fixes blank Configuration object creation overhead by reusing the Job configuration in InMemoryReader.
    • Target Version/s:

      Description

      We are using hadoop-2.0.0+1357-1.cdh4.3.0.p0.21 with MRv1. After upgrading from 4.1.2 to 4.3.0, I noticed a performance deterioration in the reduce phase of our MR job. The job typically has 10,000 map tasks (10,000 input files of about 100 MB each) and 6,000 reducers (one reducer per table region). While trying to figure out at which phase the slowdown appears (at first I suspected the slow gathering of the 10,000 map output files), I found that the problem is not reading the map output (the shuffle) but the sort/merge phase that follows; the last, actual reduce phase is fast. I tried raising io.sort.factor, thinking that lots of small files were being merged on disk, but even setting it to 1000 made no difference. I then printed the stack trace and found that the problem is the initialization of org.apache.hadoop.mapred.IFileInputStream, namely the creation of a Configuration object that is not propagated along from the earlier context. See the stack trace:

      Thread 13332: (state = IN_NATIVE)

      • java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise)
      • java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame)
      • java.io.File.exists() @bci=20, line=733 (Compiled frame)
      • sun.misc.URLClassPath$FileLoader.getResource(java.lang.String, boolean) @bci=136, line=999 (Compiled frame)
      • sun.misc.URLClassPath$FileLoader.findResource(java.lang.String, boolean) @bci=3, line=966 (Compiled frame)
      • sun.misc.URLClassPath.findResource(java.lang.String, boolean) @bci=17, line=146 (Compiled frame)
      • java.net.URLClassLoader$2.run() @bci=12, line=385 (Compiled frame)
      • java.security.AccessController.doPrivileged(java.security.PrivilegedAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
      • java.net.URLClassLoader.findResource(java.lang.String) @bci=13, line=382 (Compiled frame)
      • java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=1002 (Compiled frame)
      • java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2, line=1192 (Compiled frame)
      • javax.xml.parsers.SecuritySupport$4.run() @bci=26, line=96 (Compiled frame)
      • java.security.AccessController.doPrivileged(java.security.PrivilegedAction) @bci=0 (Compiled frame)
      • javax.xml.parsers.SecuritySupport.getResourceAsStream(java.lang.ClassLoader, java.lang.String) @bci=10, line=89 (Compiled frame)
      • javax.xml.parsers.FactoryFinder.findJarServiceProvider(java.lang.String) @bci=38, line=250 (Interpreted frame)
      • javax.xml.parsers.FactoryFinder.find(java.lang.String, java.lang.String) @bci=273, line=223 (Interpreted frame)
      • javax.xml.parsers.DocumentBuilderFactory.newInstance() @bci=4, line=123 (Compiled frame)
      • org.apache.hadoop.conf.Configuration.loadResource(java.util.Properties, org.apache.hadoop.conf.Configuration$Resource, boolean) @bci=16, line=1890 (Compiled frame)
      • org.apache.hadoop.conf.Configuration.loadResources(java.util.Properties, java.util.ArrayList, boolean) @bci=49, line=1867 (Compiled frame)
      • org.apache.hadoop.conf.Configuration.getProps() @bci=43, line=1785 (Compiled frame)
      • org.apache.hadoop.conf.Configuration.get(java.lang.String) @bci=35, line=712 (Compiled frame)
      • org.apache.hadoop.conf.Configuration.getTrimmed(java.lang.String) @bci=2, line=731 (Compiled frame)
      • org.apache.hadoop.conf.Configuration.getBoolean(java.lang.String, boolean) @bci=2, line=1047 (Interpreted frame)
      • org.apache.hadoop.mapred.IFileInputStream.<init>(java.io.InputStream, long, org.apache.hadoop.conf.Configuration) @bci=111, line=93 (Interpreted frame)
      • org.apache.hadoop.mapred.IFile$Reader.<init>(org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FSDataInputStream, long, org.apache.hadoop.io.compress.CompressionCodec, org.apache.hadoop.mapred.Counters$Counter) @bci=60, line=303 (Interpreted frame)
      • org.apache.hadoop.mapred.IFile$InMemoryReader.<init>(org.apache.hadoop.mapred.RamManager, org.apache.hadoop.mapred.TaskAttemptID, byte[], int, int) @bci=11, line=480 (Interpreted frame)
      • org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createInMemorySegments(java.util.List, long) @bci=133, line=2416 (Interpreted frame)
      • org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator() @bci=669, line=2530 (Interpreted frame)
      • org.apache.hadoop.mapred.ReduceTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=513, line=425 (Interpreted frame)
      • org.apache.hadoop.mapred.Child$4.run() @bci=29, line=268 (Interpreted frame)
      • java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
      • javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
      • org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1408 (Interpreted frame)
      • org.apache.hadoop.mapred.Child.main(java.lang.String[]) @bci=776, line=262 (Interpreted frame)

      A blank Configuration object is created in IFileInputStream. I ran a test and found that this operation costs about 10-15 ms, depending on system load, because it goes to the local FS to load the properties! In my opinion this is a bug, since the configuration (of the job) is known in that context and could be reused at that point. My problem (shared by everyone who runs a large number of reducer and mapper tasks) is that for 10,000 map tasks it spends 10,000 x 15 ms = 150 seconds just to find out that there is nothing to sort. The overhead should normally be zero.

      At this moment, the 10-15 ms problem is amplified across the 6,000 reducers, so the bottom line is that my reduce phase is at least 1.6 hours longer than it should be.
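The amplification described above can be modeled without Hadoop. A blank Configuration pays a per-instance load cost, while a reused job configuration pays it once per task; the sketch below (all class names are hypothetical stand-ins, not Hadoop code) simply counts the loads:

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for Hadoop's Configuration: constructing a blank instance
// forces a (slow) reload of the default resources from disk.
class FakeConf {
    static final AtomicInteger LOADS = new AtomicInteger();
    private final Properties props = new Properties();

    FakeConf() {
        LOADS.incrementAndGet(); // in real Hadoop: local FS access + XML parsing
        props.setProperty("ifile.readahead", "true");
    }

    boolean getBoolean(String key, boolean dflt) {
        return Boolean.parseBoolean(props.getProperty(key, String.valueOf(dflt)));
    }
}

// Stand-in for IFileInputStream: falls back to a fresh conf when given null.
class FakeIFileInputStream {
    final boolean readAhead;

    FakeIFileInputStream(FakeConf conf) {
        if (conf == null) {
            conf = new FakeConf(); // the costly path this issue removes
        }
        readAhead = conf.getBoolean("ifile.readahead", true);
    }
}

public class MergeCostDemo {
    public static void main(String[] args) {
        FakeConf jobConf = new FakeConf(); // one load for the whole task
        int segments = 10_000;             // one in-memory segment per map output

        for (int i = 0; i < segments; i++) new FakeIFileInputStream(null);
        System.out.println("loads with null conf:   " + (FakeConf.LOADS.get() - 1));

        FakeConf.LOADS.set(1);
        for (int i = 0; i < segments; i++) new FakeIFileInputStream(jobConf);
        System.out.println("loads with reused conf: " + (FakeConf.LOADS.get() - 1));
    }
}
```

At 10-15 ms per load, the 10,000 extra loads in the first loop are exactly the 150 seconds of dead time reported above.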

      1. MAPREDUCE-5399.patch
        5 kB
        Stanislav Barton

        Issue Links

          Activity

          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1510 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1510/)
          MAPREDUCE-5399. Unnecessary Configuration instantiation in IFileInputStream slows down merge. (Stanislav Barton via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510811)

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #1483 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1483/)
          MAPREDUCE-5399. Unnecessary Configuration instantiation in IFileInputStream slows down merge. (Stanislav Barton via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510811)

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #293 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/293/)
          MAPREDUCE-5399. Unnecessary Configuration instantiation in IFileInputStream slows down merge. (Stanislav Barton via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510811)

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java
          Arun C Murthy added a comment -

          Thanks Sandy. I'll close this and clone a new one for the branch-1 fix, to allow me to spin rc2 for hadoop-2. OK?

          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #4218 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4218/)
          MAPREDUCE-5399. Unnecessary Configuration instantiation in IFileInputStream slows down merge. (Stanislav Barton via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510811)

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java
          Sandy Ryza added a comment -

          I just committed this to trunk, branch-2, branch-2.1-beta, and branch-2.1.0-beta. Thanks Stanislav! Leaving this open for the branch-1 fix.

          Mayank Bansal added a comment -

          I am hitting this issue in Hadoop-1. Is anybody working on a Hadoop-1 patch?

          Thanks,
          Mayank

          Arun C Murthy added a comment -

          Sandy Ryza Can you pls commit this to trunk, branch-2, branch-2.1-beta and branch-2.1.0-beta? Thanks!

          Sandy Ryza added a comment -

          In that case I'm going to commit the trunk/branch-2 patch later today.

          Jason Lowe added a comment -

          Upgrading this to a Blocker per an offline discussion with Arun C Murthy

          Sandy Ryza added a comment -

          +1. Are you able to create a patch for branch-1 as well?

          Stanislav Barton added a comment -

          I have patched (with a different patch) the distro we are using in the company. The idea was the same: I replaced the constructor with the proposed one in InMemoryReader, and it cut the time spent sorting enormously (from 3 minutes to 3 seconds). I tried to simulate this on my local machine with the wordcount example but am having memory issues, and it is not possible to change the distro here to test on the real cluster. So, provided the Configuration object is not null in the context of the call in this distro (it is not null in the distro I am using), it will work here as well.

          Sandy Ryza added a comment -

          The patch looks good to me. It seems like this would be difficult to write a test for. Have you done any benchmarking to see if/how much it improves performance?

          Stanislav Barton added a comment -

          In the proposed patch, I replaced the constructor that allowed passing no Configuration object, then looked for all usages of the removed constructor and fixed each call by adding the Configuration object already known from the context. In my opinion, if the code compiles and the tests pass, it should be good to go: since the new constructor is not backwards compatible, any remaining use of the old one would fail to compile.
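The backwards-incompatibility argument can be illustrated with a toy pair of constructors (names hypothetical, not from the actual patch): once the conf-less overload is deleted, any call site that forgot to pass the job configuration fails at compile time instead of silently paying the reload cost.

```java
class SegmentReader {
    private final Object conf;

    // Before the change there was also a no-arg overload, roughly:
    //   SegmentReader() { this(new Object()); }   // silently builds a blank conf
    // Removing it makes "new SegmentReader()" a compile error, so a green
    // build proves every caller now supplies the job conf explicitly.
    SegmentReader(Object conf) {
        if (conf == null) {
            throw new IllegalArgumentException("job conf must be supplied");
        }
        this.conf = conf;
    }

    Object conf() { return conf; }
}

public class CompileCheckDemo {
    public static void main(String[] args) {
        Object jobConf = new Object();
        SegmentReader r = new SegmentReader(jobConf);
        // new SegmentReader();   // would no longer compile after the change
        System.out.println(r.conf() == jobConf); // same instance is reused
    }
}
```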

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12595426/MAPREDUCE-5399.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3926//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3926//console

          This message is automatically generated.

          Jason Lowe added a comment -

          I don't see the trunk patch attached to this JIRA. Could you attach it? Once it's attached, the Jenkins bot will notice it and can comment on it, and others can review it as well.

          Usually we iterate on the trunk patch before worrying about other branches, because normally changes go through trunk and later versions before earlier versions. This helps prevent the situation where an older Hadoop version has the fix but it's missing from a later version. Often the trunk patch will apply to the 2.x line as-is, and we can work on the 1.x patch after iterating on the trunk version.

          Stanislav Barton added a comment -

          I have the patch for the trunk version (the build created 3.0.0-SNAPSHOT), but I checked out the code of the two listed versions (1.1.0 and 2.0.2-alpha), and the trunk patch applies only to the latter. Should I create a patch for the former as well?

          Mayank Bansal added a comment -

          Hi Stanislav Barton,

          Do you have the patch? We are also hitting this issue and want to commit it ASAP.

          Thanks,
          Mayank

          Stanislav Barton added a comment -

          OK, I started preparing the patch for the trunk version, but the tests take ages to finish. I hope to have it ready soon.

          Karthik Kambatla added a comment -

          Stanislav Barton, are you planning to work on this? Otherwise, I am thinking of taking it up.

          Stanislav Barton added a comment -

          I have reviewed MAPREDUCE-4511 and it is related: the first version of the patch mentioned there would fix this bug (when conf == null it uses the default values directly instead of creating a new conf object and reading the defaults from there). However, I have fixed this on my side (using the sources of the distro I am using) by introducing a new constructor for InMemoryReader:

          public InMemoryReader(RamManager ramManager, TaskAttemptID taskAttemptId,
                                byte[] data, int start, int length, JobConf conf)
              throws IOException {
            super(conf, null, length - start, null, null);
            LOG.info("Using job conf instead of creating new one");
            ...

          and consequently replaced the usages of the old constructor (without configuration) with this new one. I deployed it and it took effect: the time in the reduce's sort dropped from 3-4 minutes to 0-5 seconds. I tried to find whether the old constructor was used elsewhere, but it seems that is not the case, so the old one could be dropped. Read-ahead is used by default in IFileInputStream.
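The shape of this fix is constructor chaining: the overload that knows the job conf hands it up the chain, so the base class never reaches its null fallback. A self-contained sketch of that shape (class names hypothetical, not the Hadoop source):

```java
// Models IFile.Reader: creates a blank conf only when handed null.
class BaseReader {
    static int blankConfs = 0;
    final Object conf;

    BaseReader(Object conf) {
        if (conf == null) {      // fallback the patch makes unreachable
            blankConfs++;
            conf = new Object(); // models "new Configuration()"
        }
        this.conf = conf;
    }
}

// Models the new InMemoryReader constructor: passes the job conf to super
// instead of null, so no blank Configuration is ever built per segment.
class InMemoryReaderSketch extends BaseReader {
    InMemoryReaderSketch(byte[] data, int start, int length, Object jobConf) {
        super(jobConf);
    }
}

public class ChainDemo {
    public static void main(String[] args) {
        Object jobConf = new Object();
        new InMemoryReaderSketch(new byte[0], 0, 0, jobConf);
        System.out.println("blank confs created: " + BaseReader.blankConfs);
    }
}
```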

          Sandy Ryza added a comment -

          Stanislav Barton, are you planning on working on this?

          Jason Lowe added a comment -

          Appears to have been caused by MAPREDUCE-4511.

          Jason Lowe added a comment -

          This is a problem in trunk and 2.x. Looking at IFileInputStream, it clearly creates a new Configuration object if null is passed in to the constructor, and InMemoryReader does exactly this.

          Harsh J added a comment -

          As can be observed by comparing the stack trace with the 1.x code (which is close to the version he used), InMemoryFileReader on 1.x also loads a null config object, causing the same issue (unless Configuration itself has recently improved to not always reload defaults and overrides from disk).

          Hitesh Shah added a comment -

          If this is indeed an issue with Apache Hadoop-1.x, please feel free to file a jira with details specific to that. Issues with a particular vendor's distro should be redirected to the vendor in question.

          Stanislav Barton added a comment -

          While working on the fix, I figured that this bug might not apply to the official Apache Hadoop version, since the code in the ReduceTask.run() method differs a lot from the one used in the Cloudera distribution. So please close the ticket as irrelevant if that is the case.


            People

            • Assignee:
              Stanislav Barton
              Reporter:
              Stanislav Barton
            • Votes:
              0
              Watchers:
              15

              Dates

              • Created:
                Updated:
                Resolved:

                Development