Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1114

Speed up ivy resolution in builds with clever caching

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: build
    • Labels:
      None

      Description

      An awful lot of time is spent in the ivy:resolve parts of the build, even when all of the dependencies have been fetched and cached. Profiling showed this was in XML parsing. I have a sort-of-ugly hack which speeds up incremental compiles (and more importantly "ant test") significantly using some ant macros to cache the resolved classpaths.

      1. mapreduce-1114.txt
        23 kB
        Todd Lipcon
      2. mapreduce-1114.txt
        19 kB
        Todd Lipcon
      3. mapreduce-1114.txt
        23 kB
        Todd Lipcon

        Issue Links

          Activity

          Hide
          Konstantin Boudnik added a comment -

          Is there a JIRA tracking progress in removing ivy? If it's not happening in the near term, then something like the current patch may be worth keeping in trunk during interim 0.22 development.

          I know that folks here and there are eager to move to Maven, but I don't know how fast it might actually happen. This said I'm completely fine with having such short term solution in place.

          Show
          Konstantin Boudnik added a comment - Is there a JIRA tracking progress in removing ivy? If it's not happening in the near term, then something like the current patch may be worth keeping in trunk during interim 0.22 development. I know that folks here and there are eager to move to Maven, but I don't know how fast it might actually happen. This said I'm completely fine with having such short term solution in place.
          Hide
          Todd Lipcon added a comment -

          So without fundamentally changing how dependencies are resolved, attacking the problem as in the current patch is the only way to effect a meaningful reduction

          Yes, as far as I'm aware, that's the case. Thanks for the concise way of explaining it.

          To be completely honest, I'm nowhere near an expert in ant/ivy/maven/etc. I'm just a developer who doesn't use an IDE, and waiting 30-40 seconds every time I need to rerun a testcase got aggravating

          Is there a JIRA tracking progress in removing ivy?

          I'm not aware of any such.

          Show
          Todd Lipcon added a comment - So without fundamentally changing how dependencies are resolved, attacking the problem as in the current patch is the only way to effect a meaningful reduction Yes, as far as I'm aware, that's the case. Thanks for the concise way of explaining it. To be completely honest, I'm nowhere near an expert in ant/ivy/maven/etc. I'm just a developer who doesn't use an IDE, and waiting 30-40 seconds every time I need to rerun a testcase got aggravating Is there a JIRA tracking progress in removing ivy? I'm not aware of any such.
          Hide
          Chris Douglas added a comment -

          Some of these are within the same ant run, so they get cached. But 16 of them actually do some non-cached work [...]

          If I understand you correctly, the punchline is that improvements to intra-build caching are not only tedious, but also not a sound way of reducing the build time, as most classpaths are independently defined. So without fundamentally changing how dependencies are resolved, attacking the problem as in the current patch is the only way to effect a meaningful reduction. Is that the argument?

          fixing ivy doesn't make much sense - we'd be better off focusing on moving towards Maven.

          Is there a JIRA tracking progress in removing ivy? If it's not happening in the near term, then something like the current patch may be worth keeping in trunk during interim 0.22 development.

          Show
          Chris Douglas added a comment - Some of these are within the same ant run, so they get cached. But 16 of them actually do some non-cached work [...] If I understand you correctly, the punchline is that improvements to intra-build caching are not only tedious, but also not a sound way of reducing the build time, as most classpaths are independently defined. So without fundamentally changing how dependencies are resolved, attacking the problem as in the current patch is the only way to effect a meaningful reduction. Is that the argument? fixing ivy doesn't make much sense - we'd be better off focusing on moving towards Maven. Is there a JIRA tracking progress in removing ivy? If it's not happening in the near term, then something like the current patch may be worth keeping in trunk during interim 0.22 development.
          Hide
          Konstantin Boudnik added a comment -

          Well, build.xml has 7 'retrieves' in it. If you all add contrib to this it's gonna be total mess (e.g. 22 re-resolutions). IMO fixing ivy doesn't make much sense - we'd be better off focusing on moving towards Maven.

          Show
          Konstantin Boudnik added a comment - Well, build.xml has 7 'retrieves' in it. If you all add contrib to this it's gonna be total mess (e.g. 22 re-resolutions). IMO fixing ivy doesn't make much sense - we'd be better off focusing on moving towards Maven.
          Hide
          Todd Lipcon added a comment -

          The issue is that the build ends up spawning a lot of subants, which each re-resolve everything. I get a total of 22 ivy-resolves even if I skip contrib! Part of this is that skip.contrib=1 still resolves all of the contrib stuff (MAPREDUCE-1113)

          todd@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 -Dtestcase=xxx 2>&1 | grep 'ivy-resolve' | wc -l
          22
          todd@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 -Dtestcase=xxx 2>&1 | grep 'ivy-resolve-common' | wc -l
          19

          Some of these are within the same ant run, so they get cached. But 16 of them actually do some non-cached work:

          todd@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 -Dtestcase=xxx 2>&1 | grep 'resolving dependencies' | wc -l
          16

          Show
          Todd Lipcon added a comment - The issue is that the build ends up spawning a lot of subants, which each re-resolve everything. I get a total of 22 ivy-resolves even if I skip contrib! Part of this is that skip.contrib=1 still resolves all of the contrib stuff ( MAPREDUCE-1113 ) todd@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 -Dtestcase=xxx 2>&1 | grep 'ivy-resolve' | wc -l 22 todd@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 -Dtestcase=xxx 2>&1 | grep 'ivy-resolve-common' | wc -l 19 Some of these are within the same ant run, so they get cached. But 16 of them actually do some non-cached work: todd@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 -Dtestcase=xxx 2>&1 | grep 'resolving dependencies' | wc -l 16
          Hide
          Konstantin Boudnik added a comment -

          May be I'm barking on a wrong tree but I've tried to run a couple of commands in current MR trunk:

           % time ant ivy-resolve-common
          Buildfile: build.xml
          ...
          BUILD SUCCESSFUL
          Total time: 3 seconds
          
          real    0m4.513s
          user    0m5.186s
          sys     0m0.616s
          

          I got very close result for % time ant ivy-resolve-tree

          so it seems to me that resolver works pretty damn fast considering the number of artifacts it needs to check. May be the latency is hiding somewhere else?

          Show
          Konstantin Boudnik added a comment - May be I'm barking on a wrong tree but I've tried to run a couple of commands in current MR trunk: % time ant ivy-resolve-common Buildfile: build.xml ... BUILD SUCCESSFUL Total time: 3 seconds real 0m4.513s user 0m5.186s sys 0m0.616s I got very close result for % time ant ivy-resolve-tree so it seems to me that resolver works pretty damn fast considering the number of artifacts it needs to check. May be the latency is hiding somewhere else?
          Hide
          Todd Lipcon added a comment -

          Ivy already caches the resolves done in the same run, in theory, but there are a lot of "different" resolves, I think? The gain here is from caching between runs as you surmised.

          Show
          Todd Lipcon added a comment - Ivy already caches the resolves done in the same run, in theory, but there are a lot of "different" resolves, I think? The gain here is from caching between runs as you surmised.
          Hide
          Chris Douglas added a comment -

          I thought the bulk of the problem was re-resolving these properties during the same run. Is that mistaken? The current proposal also works across runs, which could be helpful, but again: maintaining the build is already a pain. Adding a cache to a bad idea is a well established software engineering practice, but I'd favor either fixing our use of ivy or replacing it if middling performance requires this.

          Show
          Chris Douglas added a comment - I thought the bulk of the problem was re-resolving these properties during the same run. Is that mistaken? The current proposal also works across runs, which could be helpful, but again: maintaining the build is already a pain. Adding a cache to a bad idea is a well established software engineering practice, but I'd favor either fixing our use of ivy or replacing it if middling performance requires this.
          Hide
          Todd Lipcon added a comment -

          When the classpath is resolved, it's written out to a text file named for that variable. Then when it needs to be resolved again, if that file exists, it's loaded rather than re-resolving.

          Show
          Todd Lipcon added a comment - When the classpath is resolved, it's written out to a text file named for that variable. Then when it needs to be resolved again, if that file exists, it's loaded rather than re-resolving.
          Hide
          Chris Douglas added a comment -

          Then I'm missing something. What is being "cached"?

          Show
          Chris Douglas added a comment - Then I'm missing something. What is being "cached"?
          Hide
          Todd Lipcon added a comment -

          Aren't the classpaths named? Would there be a way to short-circuit the resolution if it created/checked for a file mapped to that path?

          That is exactly what this patch does...

          Show
          Todd Lipcon added a comment - Aren't the classpaths named? Would there be a way to short-circuit the resolution if it created/checked for a file mapped to that path? That is exactly what this patch does...
          Hide
          Chris Douglas added a comment -

          Comparing the 15 second payoff to the full build time isn't particular important to me. For me, the ability to quickly iterate on code while recompiling and rerunning unit tests is the big payoff

          As a vi user, I got that. I haven't argued that the long build times are unimportant, but that a hack introducing a custom caching layer for classpaths is not, in my mind, a justifiable tradeoff in complexity. Maintaining black magic in the build is tedious and avoidable.

          the slowness is actually in the resolve task which generates the various classpath properties in ant

          Aren't the classpaths named? Would there be a way to short-circuit the resolution if it created/checked for a file mapped to that path?

          My most common development cycle is to run a single unit test. For Avro this takes just a few seconds, and I'm willing to wait without finding a new task to work on.

          As a workaround: depending on how often I'm running it, adding a main to the unit test is sometimes worthwhile.

          Show
          Chris Douglas added a comment - Comparing the 15 second payoff to the full build time isn't particular important to me. For me, the ability to quickly iterate on code while recompiling and rerunning unit tests is the big payoff As a vi user, I got that. I haven't argued that the long build times are unimportant, but that a hack introducing a custom caching layer for classpaths is not, in my mind, a justifiable tradeoff in complexity. Maintaining black magic in the build is tedious and avoidable. the slowness is actually in the resolve task which generates the various classpath properties in ant Aren't the classpaths named? Would there be a way to short-circuit the resolution if it created/checked for a file mapped to that path? My most common development cycle is to run a single unit test. For Avro this takes just a few seconds, and I'm willing to wait without finding a new task to work on. As a workaround: depending on how often I'm running it, adding a main to the unit test is sometimes worthwhile.
          Hide
          Todd Lipcon added a comment -

          Doug: the slowness is actually in the resolve task which generates the various classpath properties in ant. Without caching those properties to disk, there's no way to get around running ivy that I can think of. This patch essentially persists them to disk between runs, since the majority of the time they don't change.

          Show
          Todd Lipcon added a comment - Doug: the slowness is actually in the resolve task which generates the various classpath properties in ant. Without caching those properties to disk, there's no way to get around running ivy that I can think of. This patch essentially persists them to disk between runs, since the majority of the time they don't change.
          Hide
          Doug Cutting added a comment -

          > I look at this as a 60% speedup in my development cycle rather than a few % speedup in the full build.

          I agree with this logic. My most common development cycle is to run a single unit test. For Avro this takes just a few seconds, and I'm willing to wait without finding a new task to work on. With Hadoop this takes long enough that I switch to doing something else, lose my context, etc. Improving this significantly will significantly improve many developers productivity.

          I wonder if we can simply check if build/ivy/lib/Hadoop-Hdfs/

          {common,test}

          exist, and, if they do, assumes they're up-to-date, and only runs Ivy otherwise. Might that be simpler?

          Show
          Doug Cutting added a comment - > I look at this as a 60% speedup in my development cycle rather than a few % speedup in the full build. I agree with this logic. My most common development cycle is to run a single unit test. For Avro this takes just a few seconds, and I'm willing to wait without finding a new task to work on. With Hadoop this takes long enough that I switch to doing something else, lose my context, etc. Improving this significantly will significantly improve many developers productivity. I wonder if we can simply check if build/ivy/lib/Hadoop-Hdfs/ {common,test} exist, and, if they do, assumes they're up-to-date, and only runs Ivy otherwise. Might that be simpler?
          Hide
          Todd Lipcon added a comment -

          I don't think the 15 second payoff justifies the maintenance cost of a custom caching layer for ivy.

          Comparing the 15 second payoff to the full build time isn't particular important to me. For me, the ability to quickly iterate on code while recompiling and rerunning unit tests is the big payoff - so I look at this as a 60% speedup in my development cycle rather than a few % speedup in the full build.

          I may be in the minority, though, as I don't use eclipse or anything other fancy IDE that does incremental compilation.

          Anyone else care to chime in?

          Show
          Todd Lipcon added a comment - I don't think the 15 second payoff justifies the maintenance cost of a custom caching layer for ivy. Comparing the 15 second payoff to the full build time isn't particular important to me. For me, the ability to quickly iterate on code while recompiling and rerunning unit tests is the big payoff - so I look at this as a 60% speedup in my development cycle rather than a few % speedup in the full build. I may be in the minority, though, as I don't use eclipse or anything other fancy IDE that does incremental compilation. Anyone else care to chime in?
          Hide
          Chris Douglas added a comment -

          The patch is stale.

          The long build times are a problem and ivy's a big part of that, but I agree with your assessment: this is a hack. I don't think the 15 second payoff justifies the maintenance cost of a custom caching layer for ivy.

          Show
          Chris Douglas added a comment - The patch is stale. The long build times are a problem and ivy's a big part of that, but I agree with your assessment: this is a hack. I don't think the 15 second payoff justifies the maintenance cost of a custom caching layer for ivy.
          Hide
          Todd Lipcon added a comment -

          Forgot to include build-macros.xml in previous patch

          Show
          Todd Lipcon added a comment - Forgot to include build-macros.xml in previous patch
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12423234/mapreduce-1114.txt
          against trunk revision 829529.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/213/testReport/
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/213/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/213/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12423234/mapreduce-1114.txt against trunk revision 829529. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/213/testReport/ Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/213/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/213/console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          Attaching up to date patch.

          Show
          Todd Lipcon added a comment - Attaching up to date patch.
          Hide
          Doug Cutting added a comment -

          > The uptodate check is due to avro-generate regenerating its .java output for every build.

          I filed AVRO-150 for this.

          Show
          Doug Cutting added a comment - > The uptodate check is due to avro-generate regenerating its .java output for every build. I filed AVRO-150 for this.
          Hide
          Todd Lipcon added a comment -

          SLF4J is missing some uptodate check which cause an unnecessary javac pass.

          Sorry, I misread some output there. The uptodate check is due to avro-generate regenerating its .java output for every build. I'll attack that in a later JIRA next time I get aggravated by slow builds

          Show
          Todd Lipcon added a comment - SLF4J is missing some uptodate check which cause an unnecessary javac pass. Sorry, I misread some output there. The uptodate check is due to avro-generate regenerating its .java output for every build. I'll attack that in a later JIRA next time I get aggravated by slow builds
          Hide
          Todd Lipcon added a comment -

          Attaching patch which speeds up a contrib-skipping null test case run from 32 seconds to 15 seconds wallclock.

          todd@todd-laptop:~/git/hadoop-mapreduce$ time ant test -Dskip.contrib=1 -Dtestcase=xxxx > /dev/null
          
          real    0m11.360s
          user    0m15.897s
          sys     0m1.436s
          todd@todd-laptop:~/git/hadoop-mapreduce$ git reset --hard 'HEAD^'
          HEAD is now at 8ac54e1 MAPREDUCE-1113. Fix mumak to not compile aspects with skip.contrib is set
          todd@todd-laptop:~/git/hadoop-mapreduce$ time ant test -Dskip.contrib=1 -Dtestcase=xxxx > /dev/null
          
          real    0m25.222s
          user    0m32.870s
          sys     0m1.948s
          

          There's still a bit of room to improve here - SLF4J is missing some uptodate check which cause an unnecessary javac pass.

          The way this works is that the macro writes the resolved classpath into build/ivy/ivy-resolve-cache/<project>-<conf> if it doesn't already exist. If that does exist, it loads that file into the appropriate properties as if ivy had done the resolution anew.

          This patch is not final quality - it outputs some debugging printouts and could be cleaned up a bit, just wanted to see what people thought before spending the time to make it prettier.

          You may wonder why this is starting as a MAPREDUCE patch and not HADOOP - no particular reason. If people like this I will do the same for common and HDFS.

          Show
          Todd Lipcon added a comment - Attaching patch which speeds up a contrib-skipping null test case run from 32 seconds to 15 seconds wallclock. todd@todd-laptop:~/git/hadoop-mapreduce$ time ant test -Dskip.contrib=1 -Dtestcase=xxxx > /dev/null real 0m11.360s user 0m15.897s sys 0m1.436s todd@todd-laptop:~/git/hadoop-mapreduce$ git reset --hard 'HEAD^' HEAD is now at 8ac54e1 MAPREDUCE-1113. Fix mumak to not compile aspects with skip.contrib is set todd@todd-laptop:~/git/hadoop-mapreduce$ time ant test -Dskip.contrib=1 -Dtestcase=xxxx > /dev/null real 0m25.222s user 0m32.870s sys 0m1.948s There's still a bit of room to improve here - SLF4J is missing some uptodate check which cause an unnecessary javac pass. The way this works is that the macro writes the resolved classpath into build/ivy/ivy-resolve-cache/<project>-<conf> if it doesn't already exist. If that does exist, it loads that file into the appropriate properties as if ivy had done the resolution anew. This patch is not final quality - it outputs some debugging printouts and could be cleaned up a bit, just wanted to see what people thought before spending the time to make it prettier. You may wonder why this is starting as a MAPREDUCE patch and not HADOOP - no particular reason. If people like this I will do the same for common and HDFS.

            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:

                Development