Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.21.0
    • Component/s: conf
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      HADOOP-4187 introduced extra calls to Class.forName in ReflectionUtils.setConf. This caused a fairly large performance regression. Attached is a microbenchmark that shows the following timings (ms) for 100M constructions of new instances:

      Explicit construction (new Test): around ~1.6sec
      Using Test.class.newInstance: around ~2.6sec
      ReflectionUtils on 0.18.3: ~8.0sec
      ReflectionUtils on 0.20.0: ~200sec

      This illustrates the ~80x slowdown caused by HADOOP-4187.

      1. hadoop-6133-0.20.patch
        2 kB
        Todd Lipcon
      2. hadoop-6133-trunk.txt
        2 kB
        Todd Lipcon
      3. hadoop-6133-trunk.txt
        2 kB
        Todd Lipcon
      4. Test.java
        2 kB
        Todd Lipcon

        Issue Links

          Activity

          Todd Lipcon created issue -
          Todd Lipcon made changes -
          Field Original Value New Value
          Attachment Test.java [ 12412926 ]
          Hide
          Todd Lipcon added a comment -

          Here is one possible patch to fix this issue. The benchmark results in:

          ReflectionUtils on post-patch branch-0.20: ~18.1sec

          (still slower than 0.18.3 by about 2.5x but at least tolerable)

          This is not the most elegant fix, but the ClassLoader inside Configuration makes it slightly difficult to do this at a different layer.

          As for the importance of this - despite advice not to use ReflectionUtils in any hot path, there are cases when this happens. For example, MapWritable and GenericWritable do so for every deserialization. Outside libraries like Cascading also seem to not reuse objects in WritableDeserialization, and we have reports of some jobs spending nearly 100% CPU in Class.forName when profiled.

          This patch is against branch-0.20 but should also work post-split after the file rename.

          Show
          Todd Lipcon added a comment - Here is one possible patch to fix this issue. The benchmark results in: ReflectionUtils on post-patch branch-0.20: ~18.1sec (still slower than 0.18.3 by about 2.5x but at least tolerable) This is not the most elegant fix, but the ClassLoader inside Configuration makes it slightly difficult to do this at a different layer. As for the importance of this - despite advice not to use ReflectionUtils in any hot path, there are cases when this happens. For example, MapWritable and GenericWritable do so for every deserialization. Outside libraries like Cascading also seem to not reuse objects in WritableDeserialization, and we have reports of some jobs spending nearly 100% CPU in Class.forName when profiled. This patch is against branch-0.20 but should also work post-split after the file rename.
          Todd Lipcon made changes -
          Attachment hadoop-6133-0.20.patch [ 12412928 ]
          Todd Lipcon made changes -
          Affects Version/s 0.20.0 [ 12313438 ]
          Component/s conf [ 12310711 ]
          Hide
          Owen O'Malley added a comment -

          Have you tried just caching the JobConfigurable class in a static? That should substantially speed up the code rather than looking it up over and over again.

          Show
          Owen O'Malley added a comment - Have you tried just caching the JobConfigurable class in a static? That should substantially speed up the code rather than looking it up over and over again.
          Hide
          Todd Lipcon added a comment -

          Owen: the issue with that is that the JobConfigurable.class has to be loaded through Configuration's ClassLoader. In order to cache it in ReflectionUtils we'd need to have a Map<ClassLoader, Class> CACHE_JOBCONFIGURABLE_CLASSES as well. Otherwise we risk having multiple instances of the JobConfigurable.class and checking isAssignableFrom against the wrong one.

          Show
          Todd Lipcon added a comment - Owen: the issue with that is that the JobConfigurable.class has to be loaded through Configuration's ClassLoader. In order to cache it in ReflectionUtils we'd need to have a Map<ClassLoader, Class> CACHE_JOBCONFIGURABLE_CLASSES as well. Otherwise we risk having multiple instances of the JobConfigurable.class and checking isAssignableFrom against the wrong one.
          Hide
          Todd Lipcon added a comment -

          Here's the same patch against trunk

          Show
          Todd Lipcon added a comment - Here's the same patch against trunk
          Todd Lipcon made changes -
          Attachment hadoop-6133-trunk.txt [ 12416013 ]
          Todd Lipcon made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12416013/hadoop-6133-trunk.txt
          against trunk revision 802224.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/595/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/595/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/595/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/595/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416013/hadoop-6133-trunk.txt against trunk revision 802224. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/595/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/595/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/595/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/595/console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          No additional tests required - this code path is exercised widely by all parts of Hadoop

          Show
          Todd Lipcon added a comment - No additional tests required - this code path is exercised widely by all parts of Hadoop
          Hide
          Chris Douglas added a comment -

          Won't this prevent classes from being unloaded by retaining a reference to ClassLoaders in the cache? While HADOOP-4187 is explicitly temporary, this solution is unlikely to be removed with the cause of the regression.

          I can't think of any places where there would be fierce contention for an explicit lock on the cache requiring a ConcurrentHashMap for the outer map. Would a WeakHashMap suffice, here?

          Show
          Chris Douglas added a comment - Won't this prevent classes from being unloaded by retaining a reference to ClassLoaders in the cache? While HADOOP-4187 is explicitly temporary, this solution is unlikely to be removed with the cause of the regression. I can't think of any places where there would be fierce contention for an explicit lock on the cache requiring a ConcurrentHashMap for the outer map. Would a WeakHashMap suffice, here?
          Hide
          Todd Lipcon added a comment -

          I think Collections.synchronizedMap(WeakHashMap) should be fine. I'm out this week but can get to this next week, or consider this a +1 if you want to make the change.

          Show
          Todd Lipcon added a comment - I think Collections.synchronizedMap(WeakHashMap) should be fine. I'm out this week but can get to this next week, or consider this a +1 if you want to make the change.
          Chris Douglas made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Todd Lipcon added a comment -

          Here's an updated patch using WeakHashMaps instead of ConcurrentHashMaps

          Show
          Todd Lipcon added a comment - Here's an updated patch using WeakHashMaps instead of ConcurrentHashMaps
          Todd Lipcon made changes -
          Attachment hadoop-6133-trunk.txt [ 12418836 ]
          Todd Lipcon made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12418836/hadoop-6133-trunk.txt
          against trunk revision 812127.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/22/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/22/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/22/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/22/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12418836/hadoop-6133-trunk.txt against trunk revision 812127. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/22/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/22/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/22/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/22/console This message is automatically generated.
          Hide
          Chris Douglas added a comment -

          +1

          I committed this. Thanks, Todd!

          Show
          Chris Douglas added a comment - +1 I committed this. Thanks, Todd!
          Chris Douglas made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #17 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/17/)
          . Add a caching layer to Configuration::getClassByName to
          alleviate a performance regression introduced in a compatibility layer.
          Contributed by Todd Lipcon

          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #17 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/17/ ) . Add a caching layer to Configuration::getClassByName to alleviate a performance regression introduced in a compatibility layer. Contributed by Todd Lipcon
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #17 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/17/)
          . Add a caching layer to Configuration::getClassByName to
          alleviate a performance regression introduced in a compatibility layer.
          Contributed by Todd Lipcon

          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #17 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/17/ ) . Add a caching layer to Configuration::getClassByName to alleviate a performance regression introduced in a compatibility layer. Contributed by Todd Lipcon
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #89 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/89/)
          . Add a caching layer to Configuration::getClassByName to
          alleviate a performance regression introduced in a compatibility layer.
          Contributed by Todd Lipcon

          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk #89 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/89/ ) . Add a caching layer to Configuration::getClassByName to alleviate a performance regression introduced in a compatibility layer. Contributed by Todd Lipcon
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #23 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/23/)

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #23 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/23/ )
          Hide
          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #5 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/5/)
          . Add a caching layer to Configuration::getClassByName to
          alleviate a performance regression introduced in a compatibility layer.
          Contributed by Todd Lipcon

          Show
          Hudson added a comment - Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #5 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/5/ ) . Add a caching layer to Configuration::getClassByName to alleviate a performance regression introduced in a compatibility layer. Contributed by Todd Lipcon
          Hide
          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #21 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/21/)

          Show
          Hudson added a comment - Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #21 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/21/ )
          Hide
          Allen Wittenauer added a comment -

          Anyone know off hand what branch this was committed to and can set the 'fix version' field?

          Show
          Allen Wittenauer added a comment - Anyone know off hand what branch this was committed to and can set the 'fix version' field?
          Hide
          Todd Lipcon added a comment -

          Looks to me like it was committed for 21 and 22

          Show
          Todd Lipcon added a comment - Looks to me like it was committed for 21 and 22
          Todd Lipcon made changes -
          Fix Version/s 0.21.0 [ 12313563 ]
          Fix Version/s 0.22.0 [ 12314296 ]
          Tom White made changes -
          Fix Version/s 0.22.0 [ 12314296 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Robert Joseph Evans made changes -
          Link This issue breaks HADOOP-8632 [ HADOOP-8632 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Patch Available Patch Available Open Open
          22d 23h 43m 1 Chris Douglas 02/Sep/09 01:21
          Open Open Patch Available Patch Available
          37d 19h 21m 2 Todd Lipcon 07/Sep/09 18:11
          Patch Available Patch Available Resolved Resolved
          4h 44m 1 Chris Douglas 07/Sep/09 22:55
          Resolved Resolved Closed Closed
          350d 22h 43m 1 Tom White 24/Aug/10 21:38

            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development