Hadoop Map/Reduce
  MAPREDUCE-1700

User-supplied dependencies may conflict with MapReduce system JARs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.3-alpha
    • Component/s: task
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      If user code has a dependency on a version of a JAR that is different to the one that happens to be used by Hadoop, then it may not work correctly. This happened with user code using a different version of Avro, as reported here.

      The problem is analogous to the one that application servers have with WAR loading. Using a specialized classloader in the Child JVM is probably the way to solve this.

      Attachments

      1. MAPREDUCE-1700-ccl.patch (17 kB, Tom White)
      2. MAPREDUCE-1700-ccl.patch (29 kB, Tom White)
      3. MAPREDUCE-1700.patch (42 kB, Tom White)
      4. MAPREDUCE-1700.patch (38 kB, Tom White)
      5. MAPREDUCE-1700.patch (29 kB, Tom White)
      6. MAPREDUCE-1700.patch (24 kB, Tom White)
      7. MAPREDUCE-1700.patch (25 kB, Tom White)
      8. MAPREDUCE-1700.patch (26 kB, Tom White)
      9. MAPREDUCE-1700.patch (27 kB, Tom White)
      10. MAPREDUCE-1700.patch (16 kB, Tom White)


          Activity

          Scott Carey added a comment -

          I'm glad you filed this. I was just getting frustrated with this issue myself in the last couple weeks and have various thoughts on the issue. Some of these ideas are raw and flawed, but here is what I have been thinking:

          Ideally, the framework would limit the classes visible to a job to the minimum required for job execution. A job could then bring in its own dependencies. Also, if there was a built-in hadoop dependency hidden by default that a job wanted, it could request access to it.

          Similarly frustrating and related is how an M/R job has to submit its whole job jar to the cluster each time. I have a 28 MB jar, and a workflow of about 35 dependent M/R jobs (a DAG of them). Towards the end of this chain, the jobs get smaller and smaller in data size (the end ones are joining, augmenting, transforming and sorting data aggregated by the earlier jobs).
          Two big things account for more clock time than the 'heavy lifting' work of the initial 'big data' jobs – job submission time and scheduling inefficiencies. The former is related to dependency management.

          If the framework could support installing jars into an 'application' classloader space and then jobs reference that space, task latency could be reduced significantly as each job submission would not need to also submit all its dependency jars. In my case, the job jar would probably become a couple hundred K instead of almost 30MB – or even zero K if the jobs could just be stored and called. TaskTracker nodes could cache these application library spaces to reduce job start-up time.

          In some ways, the dependency management above is like an application server. Each 'application' has its own classloader space, and there might be several different jobs available in an 'application' – analogous to several servlets available in a web app. Like an app server, there will probably be a need for a lib directory that is global, one that is exclusive to the framework, and a per-application space.

          There are some questions about static variables under such classloader partitioning. With JVMs shared across tasks, users expect statics to live from one task to another in the same job. This means the classloader in a JVM corresponds to the Job ID and whether it is a map or a reduce. Per-job classloaders could enable JVM recycling across jobs in the distant future, because disposing of a job's classloader will free its static variables. That in turn leads to the possibility of future reductions in start-up time and per-task costs.
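
          As a quick, self-contained illustration of the static-variable point: each classloader defines its own copy of a class, with independent static state that is reclaimed when the loader becomes unreachable. This is a hypothetical sketch; counter.jar and an example.Counter class with a public static int count are assumptions, not anything from this issue.

            import java.lang.reflect.Field;
            import java.net.URL;
            import java.net.URLClassLoader;

            public class StaticsPerLoader {
              public static void main(String[] args) throws Exception {
                // counter.jar and example.Counter are hypothetical.
                URL[] jars = { new URL("file:counter.jar") };
                ClassLoader a = new URLClassLoader(jars, null);
                ClassLoader b = new URLClassLoader(jars, null);
                Class<?> ca = a.loadClass("example.Counter");
                Class<?> cb = b.loadClass("example.Counter");
                System.out.println(ca == cb);        // false: two distinct Class objects
                Field fa = ca.getDeclaredField("count");
                fa.setInt(null, 42);                 // mutate loader a's copy
                Field fb = cb.getDeclaredField("count");
                System.out.println(fb.getInt(null)); // 0: loader b's copy is untouched
              }
            }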

          Arun C Murthy added a comment -

          If the framework could support installing jars into an 'application' classloader space and then jobs reference that space, task latency could be reduced significantly as each job submission would not need to also submit all its dependency jars.

          Scott, this is precisely what the DistributedCache was designed for. Please upload your jars to HDFS and add them to the DistributedCache; they are then 'localized' once per tasktracker and all jobs can use the same copies:

          http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/filecache/DistributedCache.html
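
          For readers following along, here is a minimal sketch of the pattern Arun describes, assuming the JARs have already been uploaded to HDFS (the paths and job setup are illustrative, not from this issue):

            import org.apache.hadoop.filecache.DistributedCache;
            import org.apache.hadoop.fs.Path;
            import org.apache.hadoop.mapred.JobConf;

            public class CachedJarJob {
              public static void main(String[] args) throws Exception {
                JobConf conf = new JobConf(CachedJarJob.class);
                // JARs previously uploaded to HDFS, e.g. with 'hadoop fs -put'
                DistributedCache.addFileToClassPath(new Path("/libs/avro-1.3.2.jar"), conf);
                DistributedCache.addFileToClassPath(new Path("/libs/jackson-core-asl-1.5.2.jar"), conf);
                // ... set mapper/reducer and submit as usual
              }
            }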

          Scott Carey added a comment -

          The documentation for DistributedCache says:
          "Its efficiency stems from the fact that the files are only copied once per job and the ability to cache archives which are un-archived on the slaves."

          Is the documentation wrong, or is the claim that the distribution happens once per tasktracker and multiple jobs can use it incorrect?
          The documentation above is ambiguous: does it copy items once per job, un-archiving once per slave per job, or does it cache un-archived data on slaves across a longer period of time?

          What I am suggesting is not a Job-scope cache, but something that has a much longer scope – days, weeks, months – to share between many different jobs without per-job copying or unpacking unless the contents have changed. It is unclear from the documentation on DistributedCache if there is any optimization outside of the Job scope. If it had this sort of optimization already, that would be great.

          Steve Loughran added a comment -

          Getting custom classloaders right is one of the hardest things to do in Java. Whoever volunteers to do this (and I opt to run away from it) had better talk to the experts in the area. If it is purely for short-lived standalone tasks things would be simpler (fewer classloader leakage risks), but you still have to be very good at handling the problems a CL tree brings to the table:

          1. returning anything loaded by a custom CL means the CL and all loaded classes hang around in the VM, and reloading becomes tricky.
          2. multiple singletons in a single JVM
          3. object equality tests fail
          4. weird errors that you had better log rather than hope don't happen.

          I've always felt that the ASF ought to have an "understands classloaders" qualification; if you don't pass it, you don't get to submit classloaders to the codebase.

          David Rosenstrauch added a comment -

          I also just ran into this issue. (Again, due to using a recent release of avro + jackson.)

          Is there any workaround for this? (Short of having to go into every node on the cluster and removing the jackson jar from the Hadoop installation?)

          Arun C Murthy added a comment -

          We have a patch for yahoo-hadoop-0.20 in MAPREDUCE-1938 to help solve this.

          Henning Blohm added a comment -

          The patch in MAPREDUCE-1938 unfortunately does not solve the issue when the job implementation uses custom class loaders to load dependency classes. The proposed patch only addresses the case where no custom class loaders are in the picture.

          As a first step, it would really help (us anyway), if jobs would not be started with stuff on the classpath that is not at all required for job execution per se (e.g. jetty libs, the eclipse java compiler, jasper...).

          Secondly, Hadoop could actually start with only the Hadoop API types on the classpath plus a small launcher that would load Hadoop's implementation in an isolated child loader, so that implementation dependencies do not leak through to the job's implementation. I am not sure if the Hadoop implementation is ready for implementation/API separation via class loaders though.

          I patched Hadoop 0.20.2 to exclude all libs in lib/server from the job's classpath and moved all non-job-related jars into that server folder in my Hadoop installation. That helped somewhat.

          Tom White added a comment -

          Here's a proof of concept for isolated classloaders in YARN. This approach uses OSGi for isolation. The idea is that the task JVM uses a Felix container to load the job JAR (which is an OSGi bundle) so that user code can use whichever libraries it likes, even if they conflict with system JARs. (A generic sketch of this kind of embedded-OSGi bootstrap appears after the to-do list below.)

          In this example I have created a fictitious library with two incompatible versions. Version 1 is used by the system (in YarnChild) while version 2 is used by the example Mapper. Without isolation the job fails with a java.lang.NoSuchMethodError - regardless of whether the user JARs are first or second on the classpath. When run using isolation, the job succeeds and we can see that both version 1 and version 2 of the library are used:

          /tmp/logs//application_1346151477167_0001/container_1346151477167_0001_01_000002/stdout:message 2
          /tmp/logs//application_1346151477167_0001/container_1346151477167_0001_01_000002/syslog:2012-08-28 11:58:52,317 INFO [main] org.apache.hadoop.mapred.YarnChild: message 1
          

          To run:

          • Check out a revision of trunk that doesn't have MAPREDUCE-4068 ('svn up -r 1376252')
          • Apply the patch
          • Run 'mvn versions:set -DnewVersion=3.0.0' to change the version numbers to non-SNAPSHOT values, since OSGi doesn't like them.
          • Build:
            (cd hadoop-mapreduce-project/hadoop-mapreduce-examples/lib-v1; mvn install)
            (cd hadoop-mapreduce-project/hadoop-mapreduce-examples/lib-v2; mvn install)
            mvn clean install -DskipTests
            (cd hadoop-mapreduce-project/hadoop-mapreduce-examples/class-isolation-example/; mvn install)
            mvn package -Pdist -DskipTests -Dtar
            
          • Install the tarball and run
            bin/hadoop fs -mkdir -p input
            bin/hadoop fs -put /usr/share/dict/words input
            bin/hadoop jar ~/.m2/repository/org/apache/hadoop/class-isolation-example/1.0-SNAPSHOT/class-isolation-example-1.0-SNAPSHOT.jar org.apache.hadoop.examples.classisolation.Driver input output
            

          Still to do/future improvements:

          • Make compatible with MAPREDUCE-4068.
          • Write a unit test.
          • Currently only the Mapper is loaded using an OSGi service - extend the approach for all user-defined classes in a MR job.
          • Use OSGi fragments so that user job JARs don't need a Registrar class, since it would be a part of the host bundle that the job JAR extends.
          • Write a utility to convert existing job JARs to OSGi bundles (or fragments).
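
          The bootstrap that such a patch performs in the task JVM looks roughly like the following. This is a generic sketch using the standard OSGi launch API, not the patch's actual code; the bundle location is hypothetical, and Felix is discovered via the usual ServiceLoader mechanism.

            import java.util.HashMap;
            import java.util.ServiceLoader;

            import org.osgi.framework.Bundle;
            import org.osgi.framework.BundleContext;
            import org.osgi.framework.launch.Framework;
            import org.osgi.framework.launch.FrameworkFactory;

            public class EmbeddedOsgiSketch {
              public static void main(String[] args) throws Exception {
                // Discover the framework implementation (e.g. Felix) on the classpath.
                FrameworkFactory factory =
                    ServiceLoader.load(FrameworkFactory.class).iterator().next();
                Framework framework = factory.newFramework(new HashMap<String, String>());
                framework.start();

                // Install and start the job JAR, which must be an OSGi bundle.
                BundleContext ctx = framework.getBundleContext();
                Bundle jobBundle = ctx.installBundle("file:/path/to/job-bundle.jar");
                jobBundle.start();

                // ... look up the user's Mapper as an OSGi service and run the task,
                // then shut down cleanly.
                framework.stop();
                framework.waitForStop(0);
              }
            }
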
          Tom White added a comment -

          New patch with a unit test. The test isn't integrated into the build yet, so you have to build the class-isolation-example module manually first. I've also removed the fictitious libs and instead used Guava as an example of an incompatibility.

          Arun C Murthy added a comment -

          Tom, I don't understand the specific advantages of OSGi or Felix, so please pardon some of my questions.

          However, with MR being an application in YARN (see MAPREDUCE-4421) we can just add user jars in front of the classpath for the tasks (we already allow it). This isn't the same 'Map/Reduce child inherits the TT classpath' problem as in MR1 (and actually, even in MR1 you have been able to put child jars ahead in the classpath for a long while now). Given this, do we need to bring in OSGi or Felix? What else do they provide? Thanks.

          Steve Loughran added a comment -

          Arun,

          I see where Tom is coming from. Irrespective of how the Hadoop services are deployed, you need to be able to do things like submit jobs from OSGi containers (e.g. Spring and others), which is what this patch appears to offer. And if Oracle finally commits to OSGi now that Java 8 is being redefined, it'd be good for all clients.

          I would like to see a way to support this which doesn't put an OSGi JAR on the classpath of everything.

          Tom, is there a way to abstract away OSGi support so that it's optional, even if it's a subclass of JobSubmitter? An org.apache.hadoop.mapreduce.osgi.OSGiJobSubmitter could override some new protected methods to enable this.

          Scott Carey added a comment -

          Putting user jars before/after the application dependencies doesn't actually solve the problem.

          • The conflict might require a user jar that is not compatible with one needed by the framework, so either order breaks something.
          • The user might override a system jar and alter functionality in a way that breaks the framework, or subverts security.

          Both the host container and the user code need to be able to be certain of what code they are executing without stepping on each other's toes. This is not possible with one classpath.

          Scott Carey added a comment -

          If we are lucky, Project Jigsaw will be pulled back into Java 8. According to http://mreinhold.org/blog/late-for-the-train-qa it has not yet been decided.

          If it is brought back in, then perhaps we can wait until Java has a module system 1 to 1.5 years from now. If not, I do not think Hadoop can wait until Java 9, sometime 2015 to 2016-ish.

          Tom White added a comment -

          Scott makes a good case for why some kind of classloader isolation is needed.

          The patch is still a work in progress, but the idea is that the OSGi support is optional - so if you use a regular (non-OSGi) job JAR then it works like it does today, while if your job JAR is an OSGi bundle (basically a JAR with extra headers in the manifest, and possibly some embedded dependencies) then it is loaded in an OSGi container in the task JVM. This allows folks who want to use OSGi to do so while not impacting others. (Hopefully this answers Steve's question.)

          From the point of view of this JIRA, OSGi is simply a means to ensure classloader isolation. That means that if Jigsaw became a reality, then we could use that instead or as well. OSGi has many other features, but they are not used for this change. (Note that there are other ongoing efforts to make Hadoop more OSGi-friendly, covered in HADOOP-7977, and while some might be helpful for this JIRA (such as HADOOP-6484), none is required.)

          Also, in the future OSGi containers could improve container reuse by providing better isolation between jobs, since bundles can be unloaded, although I haven't spent any time looking at how that would work in the context of MR.

          Luke Lu added a comment -

          The conflict might require a user jar that is not compatible with one needed by the framework, either order breaks something

          You can always change the client framework and make it work with user code, per job, with classpath ordering. There is currently always a way in both Hadoop 1 and 2 to submit a job with arbitrary dependencies, even though it might not be pretty (it may require changes to the client framework).

          The user might override a system jar and alter functionality in a way that breaks the framework, or subverts security.

          The client framework code can always be changed per job to accommodate new dependencies. MR security is done at the protocol level, i.e. no amount of classpath ordering can subvert security.

          I agree with Arun that this is a nice-to-have feature to improve usability. Advanced users can already achieve whatever can be achieved (including running an OSGi container) per job.

          Scott Carey added a comment -


          You can always change the client framework and make it work with user code, per job, with class path ordering. There is currently always a way in both Hadoop 1 and 2 to submit a job with arbitrary dependencies, even though it might not be pretty (may require change to client framework).

          Without a user doing classloader gymnastics and fancy packaging themselves, there is not always a way. A user cannot simply package a jar up and ask Hadoop to execute it while exposing to the user's execution environment only the public Hadoop API.

          Luke Lu added a comment -

          Without a user doing classloader gymnastics and fancy packaging themselves, there is not always a way.

          That's an interesting way of saying that, except for some ways that would always work, there is not always a way. Using the standard task API to bootstrap an OSGi container is reasonably straightforward.

          A user cannot simply package a jar up and ask Hadoop to execute it while exposing to the user's execution environment only the public Hadoop API.

          I do agree that there is a usability issue for certain (and arguably less common) use cases, where a user wants to use dependencies that conflict with the client framework. However, the proposed OSGi approach makes usability worse for common cases: you'll always need OSGi bundles, which are a form of "fancy packaging", to run your jobs.

          A more reasonable (and less heavy) solution would not require users to make any change (including adding metadata to their jars) to their existing code.

          Tom White added a comment -

          Prompted by this discussion I had a look at using a classloader approach similar to how servlet containers are implemented. The servlet spec says that classes in the WEB-INF/classes directory and JARs in the WEB-INF/lib directory are loaded in preference to system classes. I found this page about classloading in Jetty useful: http://docs.codehaus.org/display/JETTY/Classloading.

          The attached patch does a similar thing for the Hadoop task classpath by using a custom classloader for classes instantiated by reflection in MapTask. The unit test from the previous patch passes with this implementation. I think this is worth exploring further.
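
          As a rough illustration of that approach, a "job first" classloader tries the job JARs before delegating to its parent, while always delegating a small set of system prefixes. This is a minimal sketch in the spirit of the patch, not the patch itself; the class name and the exact prefix list are assumptions.

            import java.net.URL;
            import java.net.URLClassLoader;

            public class JobFirstClassLoader extends URLClassLoader {
              // Prefixes always delegated to the parent (assumed list).
              private static final String[] SYSTEM_PREFIXES =
                  { "java.", "javax.", "org.apache.hadoop." };

              public JobFirstClassLoader(URL[] jobClasspath, ClassLoader parent) {
                // parent, e.g. the system classloader, must be non-null
                super(jobClasspath, parent);
              }

              private static boolean isSystemClass(String name) {
                for (String prefix : SYSTEM_PREFIXES) {
                  if (name.startsWith(prefix)) {
                    return true;
                  }
                }
                return false;
              }

              @Override
              protected synchronized Class<?> loadClass(String name, boolean resolve)
                  throws ClassNotFoundException {
                Class<?> c = findLoadedClass(name);
                if (c == null && !isSystemClass(name)) {
                  try {
                    c = findClass(name); // job classpath first
                  } catch (ClassNotFoundException e) {
                    // fall through to the parent
                  }
                }
                if (c == null) {
                  c = getParent().loadClass(name); // system classpath
                }
                if (resolve) {
                  resolveClass(c);
                }
                return c;
              }
            }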

          Luke Lu added a comment -

          I agree that the new (much) lighter weight approach is worth exploring. Thanks Tom!

          Steve Loughran added a comment -

          No, we don't want to go anywhere near servlet classloaders, because you end up in WAR, EAR, and app server trees. The app server takes priority, except in the special case of JBoss in the past, which shared classes across webapps:

          https://community.jboss.org/wiki/JBossClassLoaderHistory
          http://docs.jboss.org/jbossweb/2.1.x/class-loader-howto.html

          People will hit walls when they try to do things like upgrading the XML parser or adding a new URL handler.

          I'll look at the patch, but classloaders are a mine of grief. That's the strength of OSGi: the grief is standardised and someone else has done the grief mining already.

          Luke Lu added a comment -

          There is no need to be scared of classloaders, especially for the simple "load only and then exit" scenarios that we're talking about. Most of the class loader issues stem from long-running containers that need to dynamically load/unload classes. OSGi is overkill for MR tasks. To be clear, I'm not anti-OSGi, which I think is perfectly fine for managing server-side plugins.

          Tom White added a comment -

          Most of the class loader issues stem from long running containers that need to dynamically load/unload classes.

          Also, the case we are talking about does not have the complex classloader trees that app servers have, so there are no sibling class sharing issues. In the task JVM there is only a single user app, so the classloader hierarchy is linear (boot, extension, system, job).

          There are a few cases where certain APIs make assumptions about which classloader to use:

          • The system classloader. For example, URL stream handlers are loaded by the classloader that loaded java.net.URL (boot), or the system classloader. So if a task registered a URL stream handler and it was in the job JAR, then it wouldn't be found, since it was loaded by the job classloader, not the system classloader. In this case, the workaround is to implement a factory and call URL.setURLStreamHandlerFactory() (see the sketch after this list).
          • The caller's current classloader. For example, java.util.ResourceBundle uses the caller's current classloader, so if the framework tries to load a bundle then the bundle (e.g. a localization bundle) would not be found if it were in the job JAR, since the system classloader (which loaded the framework class) can't see the job classloader's classes. As it happens, MR counters use resource bundles; however, they explicitly use the context classloader, so this problem doesn't occur (see org.apache.hadoop.mapreduce.util.ResourceBundles). (Also, I imagine the use of resource bundles to localize counter names in the job JAR is very rare.)
          • The context classloader. For example, JAXP uses the context classloader to load the DocumentBuilderFactory specified in a system property. This case is covered by setting the context classloader to be the job classloader for the duration of the task (my latest patch does this). Most APIs that involve classloaders use the context classloader these days.
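
          Here is a minimal sketch of that factory workaround; the "myproto" protocol and MyProtoHandler are hypothetical names, not from the patch. Note that URL.setURLStreamHandlerFactory() may be called at most once per JVM.

            import java.io.IOException;
            import java.net.URL;
            import java.net.URLConnection;
            import java.net.URLStreamHandler;
            import java.net.URLStreamHandlerFactory;

            public class JobUrlHandlers {
              public static void register() {
                // The factory is consulted for unknown protocols, so the handler
                // is found even though its class lives in the job JAR.
                URL.setURLStreamHandlerFactory(new URLStreamHandlerFactory() {
                  @Override
                  public URLStreamHandler createURLStreamHandler(String protocol) {
                    return "myproto".equals(protocol) ? new MyProtoHandler() : null;
                  }
                });
              }

              static class MyProtoHandler extends URLStreamHandler {
                @Override
                protected URLConnection openConnection(URL u) throws IOException {
                  throw new IOException("illustrative stub");
                }
              }
            }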

          So all of these cases can be handled. Also note that by default the job classloader is not used, to enable it you need to set mapreduce.job.isolated.classloader to true for your job.

          The latest patch handles the case of embedded lib and classes directories in the JAR, as well as distributed cache files and archives. The unit test passes (and fails with a NoSuchMethodError due to the class incompatibility if mapreduce.job.isolated.classloader is set to false). So I think it is pretty close now - the main thing left to do is sort out the build for the test, which relies on the MR examples module.
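
          On the client, enabling the feature would look something like the sketch below, using the property name from the current patch (the job setup around it is illustrative):

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.mapreduce.Job;

            public class EnableJobClassLoader {
              public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Opt in to the isolated job classloader; it is off by default.
                conf.setBoolean("mapreduce.job.isolated.classloader", true);
                Job job = Job.getInstance(conf, "isolated-deps-example");
                // ... set jar, mapper, reducer, and input/output paths, then:
                job.submit();
              }
            }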

          Alejandro Abdelnur added a comment -

          Nice.

          What I would add is the capability of blacklisting packages. That is, if a package is blacklisted and a class under that package hierarchy is found in the job JARs, the job should fail. This is something available in webapp classloaders to prevent webapps from bundling things like servlet/JSP JARs that would break things. In our case we would blacklist the common/hdfs/yarn/mapred packages and log4j (the factory is a singleton, and if present in the job JARs it will trash the log configuration of Hadoop). I could see other JARs fitting this blacklist, thus I'd suggest that we have a config property with the list of blacklisted packages.

          This is isolating MR jobs from Hadoop JARs. I think we should do the same at the YARN level to isolate YARN JARs from AM JARs. Because of this, the JobClassLoader should be in Common and probably have a different name, like IsolationClassLoader. It should also receive the blacklist in its constructor.

          Alejandro Abdelnur added a comment -

          Tom, one thing I forgot to mention in my previous comment: we should see how to enable the classloader on the client side as well, since it may be required (to use different JARs) for the submission code. Maybe as another JIRA.

          Also, I don't recall now if it is there or not, but we may want to have a job config property to disable it in case some app runs into funny issues with it.

          Tom White added a comment -

          Thanks for the comments Alejandro.

          What I would add is the capability of blacklisting packages.

          I think that is a good idea. Servlet containers do this - e.g. system classes in Jetty are always loaded from the parent (http://docs.codehaus.org/display/JETTY/Classloading). Rather than failing the job if the class is a system class and is found in the job classpath (as you suggested), I think it would be acceptable to log a warning but load from the system classpath. I expect the default blacklist would be java.,javax.,org.apache.commons.logging.,org.apache.log4j.,org.apache.hadoop..

          I think we should do the same at YARN level to isolate YARN JARs from AM JARs. Because of this, the JobClassLoader should be in common and probably have a different name, like IsolationClassLoader.

          Other YARN apps might benefit from this work, so perhaps we should add the classloader to YARN (not Common, since HDFS shouldn't need it), and the MR-specific parts would stay in MR, of course.

          we should see how to enable the classloader on the client side as well as it may be required (to use different JARs) for the submission code. May be as another JIRA.

          I think this is a slightly different problem, since users generally have more control over the JVM they submit from than the JVM the task runs in. So, yes, another JIRA would be appropriate.

          Also, don't recall now if it is there or not, we may want o have a job config property to disable it in case some app runs into funny issues with it.

          Agreed. It's off by default in the current patch.

          Tom White added a comment -

          New patch which moves the classloader to YARN (renamed ApplicationClassLoader), and adds ability to blacklist system classes, which are never loaded by the application classloader.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12554482/MAPREDUCE-1700.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3046//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3046//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3046//console

          This message is automatically generated.

          Tom White added a comment -

          Address findbugs issue.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12554490/MAPREDUCE-1700.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3047//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3047//console

          This message is automatically generated.

          Steve Loughran added a comment -
          1. Task should get the string "APP_CLASSPATH" from ApplicationConstants.
          2. The test dir logic won't work on Windows if test.build.data isn't set: System.getProperty("test.build.data", "/tmp"). That default should be replaced with System.getProperty("java.io.tmpdir").
          3. In ApplicationClassLoader.loadClass() it looks to me like it is possible to have the situation c == null && ex == null at the if (c == null) throw ex; clause, if parent.loadClass() returns null. A check for a null ex value, setting it to (something?), would avoid this.
          4. The tests should look for resource loading too, just to be thorough.

          Other than that, with my finite classloader knowledge, it looks good. Someone who understands OSGi should do a quick review too.

          Tom White added a comment -

          Thanks for the review, Steve. I've addressed all your points in a new patch. (OSGi experts are also welcome to take a look, of course!)

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12554512/MAPREDUCE-1700.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3049//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3049//console

          This message is automatically generated.

          Kihwal Lee added a comment -

          Now that we have a much better way of dealing with dependency conflicts, what will be the fate of the "mapreduce.job.user.classpath.first" feature? Is there any use case where that feature works but the CCL approach doesn't, or where it is preferred over CCL for some reason? If none, shall we deprecate it?

          Tom White added a comment -

          I don't know of a reason that "mapreduce.job.user.classpath.first" would be preferable to CCL. However, I'd suggest waiting a release or so before deprecating it, so we can see how CCL fares.

          Luke Lu added a comment -

          what will be the fate of the "mapreduce.job.user.classpath.first" feature?

          I think we should still keep it as an "expert" feature, as it can be used to replace the implementation of the job/app classloader itself in rare cases. We should probably print a WARNING when the feature is used. The new job/app classloader behavior makes a much saner default.
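
          For reference, both knobs discussed above are plain job configuration flags. A minimal sketch of setting them from client code (property names as quoted in this thread; in practice you would pick one or the other):

              import org.apache.hadoop.conf.Configuration;

              public class ClasspathFlags {
                public static void main(String[] args) {
                  Configuration conf = new Configuration();
                  // Legacy "expert" switch: puts the user's JARs ahead of the
                  // system JARs on a single shared classpath, so conflicts can
                  // still leak through in either direction.
                  conf.setBoolean("mapreduce.job.user.classpath.first", true);
                  // CCL approach: give the job an isolating classloader instead.
                  conf.setBoolean("mapreduce.job.classloader", true);
                }
              }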

          Kihwal Lee added a comment -

          Tom, one thing I forgot to mention in my previous comment: we should see how to enable the classloader on the client side as well, as it may be required (to use different JARs) for the submission code.

          I think this is a slightly different problem, since users generally have more control over the JVM they submit from than the JVM the task runs in. So, yes, another JIRA would be appropriate.

          I think the AM also runs user code, if a custom output format is defined.

          Tom White added a comment -

          Kihwal, that's true - thanks for pointing it out. I've modified the patch to take care of that case, by setting the classloader for the MRAppMaster (when so configured, of course).

          I've also created YARN-286 for the YARN part of this patch so it can be committed separately.

          This patch is a combined patch so that Jenkins can test it as a whole.
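
          For readers following along, a rough sketch of the mechanism in the AM case: when the job classloader is enabled, the application master can install it as the thread context classloader before touching user classes. This is an illustration under assumptions, not the committed patch - createJobClassLoader below is a hypothetical stand-in for whatever builds the classloader over the localized job JARs:

              import java.net.URL;
              import java.net.URLClassLoader;

              import org.apache.hadoop.conf.Configuration;

              public class AppMasterClassLoaderSketch {

                // Hypothetical helper: builds the job classloader over the
                // localized job.jar and libjars.
                static ClassLoader createJobClassLoader(URL[] jobUrls) {
                  return new URLClassLoader(jobUrls,
                      AppMasterClassLoaderSketch.class.getClassLoader());
                }

                public static void main(String[] args) {
                  Configuration conf = new Configuration();
                  if (conf.getBoolean("mapreduce.job.classloader", false)) {
                    URL[] jobUrls = new URL[0]; // would point at job.jar and libjars
                    // User code run inside the AM (e.g. a custom OutputFormat's
                    // committer) then resolves against the job classloader.
                    Thread.currentThread().setContextClassLoader(
                        createJobClassLoader(jobUrls));
                  }
                }
              }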

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12562082/MAPREDUCE-1700.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3157//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3157//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-common.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3157//console

          This message is automatically generated.

          Tom White added a comment -

          Updated the patch to fix the findbugs issue and to use the code from YARN-286, which is now committed.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12563283/MAPREDUCE-1700.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3194//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3194//console

          This message is automatically generated.

          Kihwal Lee added a comment -

          +1 The patch looks good. I hope people try this with many different use cases.

          Tom White added a comment -

          I just committed this. Thanks to everyone who provided feedback.

          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3203 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3203/)
          MAPREDUCE-1700. User supplied dependencies may conflict with MapReduce system JARs. (Revision 1430929)

          Result = SUCCESS
          tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1430929
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/v2/util/TestMRApps.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #92 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/92/)
          MAPREDUCE-1700. User supplied dependencies may conflict with MapReduce system JARs. (Revision 1430929)

          Result = SUCCESS
          tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1430929
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/v2/util/TestMRApps.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1309 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1309/)
          MAPREDUCE-1700. User supplied dependencies may conflict with MapReduce system JARs. (Revision 1430929)

          Result = FAILURE
          tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1430929
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/v2/util/TestMRApps.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1281 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1281/)
          MAPREDUCE-1700. User supplied dependencies may conflict with MapReduce system JARs. (Revision 1430929)

          Result = FAILURE
          tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1430929
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/v2/util/TestMRApps.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
          Scott Carey added a comment -

          Awesome!

          Kihwal Lee added a comment -

          Merged to branch-0.23.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #510 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/510/)
          merge -r 1430928:1430929 Merging MAPREDUCE-1700 to branch-0.23 (Revision 1440100)

          Result = SUCCESS
          kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440100
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/v2/util/TestMRApps.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
          Sangjin Lee added a comment -

          This is an awesome change. I've always wondered why this support wasn't there. Thanks!

          James Xu added a comment -

          Hi Tom, can you elaborate on why "org.apache.commons.logging.,org.apache.log4j." should be blacklisted? We are trying to do a similar classloader thing for another piece of software, so we want to learn from the experience here.

          Tom White added a comment -

          James Xu, the set of packages to blacklist came from Jetty (http://docs.codehaus.org/display/JETTY/Classloading), which is why Commons Logging and Log4j were included. Excluding logging classes prevents inadvertent double initialization of the logging system - once when the task JVM starts and again when the user code is loaded. Note that you can change the system default by setting mapreduce.job.classloader.system.classes.
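
          A minimal sketch of overriding that default from client code (the value shown is illustrative, not the shipped default - see mapred-default.xml for that):

              import org.apache.hadoop.conf.Configuration;

              public class SystemClassesOverride {
                public static void main(String[] args) {
                  Configuration conf = new Configuration();
                  conf.setBoolean("mapreduce.job.classloader", true);
                  // Package prefixes listed here are always loaded from the
                  // parent classloader, never from the job's JARs. Trailing
                  // dots mark prefixes rather than exact class names.
                  conf.set("mapreduce.job.classloader.system.classes",
                      "java.,javax.,org.apache.commons.logging.,"
                          + "org.apache.log4j.,org.apache.hadoop.");
                }
              }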


            People

            • Assignee:
              Tom White
            • Reporter:
              Tom White
            • Votes:
              11
            • Watchers:
              38
