Pig / PIG-3043

Modify the UrlClassloader in PigContext so that classes from the same classloader are used first instead of the parent

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      This behavior would be similar to what application servers do (Tomcat, Jetty, ...) and would allow classes from registered jars to use their own version of a class. It would also avoid problems where adding a jar to Pig breaks libraries that make use of dynamic class lookup.

      Example of a common pattern that is regularly broken by the current mechanism:
      register lib.jar
      register my.jar
      define blah as my.UDF('my.Implementation')

      my.UDF is in my.jar and uses classes in lib.jar that use Class.forName() to resolve my.Implementation. It works fine until lib.jar is added as a dependency of pig or in the PIG_CLASSPATH. Then classes in lib.jar do not see the classes in registered jars.
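      The visibility rule behind this failure can be reproduced without Pig. The sketch below is an illustration only: ForNameVisibility and visibleFrom are hypothetical names, and an isolated URLClassLoader with no URLs and no parent stands in for the parent loader that ends up owning lib.jar. A lookup that starts from a loader only sees classes known to that loader and its ancestors, never classes defined further down the chain.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo class, not part of Pig.
class ForNameVisibility {
    /** Returns true if the class called name can be resolved starting from loader. */
    static boolean visibleFrom(ClassLoader loader, String name) {
        try {
            // Three-argument form: resolve against an explicit loader
            // instead of the caller's defining loader.
            Class.forName(name, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = ForNameVisibility.class.getName();
        // The loader that defined this class plays the role of the registered-jar loader.
        ClassLoader child = ForNameVisibility.class.getClassLoader();
        // No URLs and a null parent: only bootstrap classes are reachable,
        // much like a parent loader that knows nothing about registered jars.
        ClassLoader isolated = new URLClassLoader(new URL[0], null);
        System.out.println("from child loader:    " + visibleFrom(child, name));
        System.out.println("from isolated loader: " + visibleFrom(isolated, name));
    }
}
```

      The first lookup should succeed and the second should fail, which mirrors lib.jar classes loaded by the parent not seeing my.Implementation.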

      I think that overriding loadClass(String name, boolean resolve) would allow doing that.
      We should make an exception for anything in org.apache.pig just like servlet.jar is excluded in app servers.
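      A minimal sketch of what such an override could look like follows. It is not the PigContext code: the class name ChildFirstClassLoader, the helper isParentFirst, and the exact prefix list are assumptions made for illustration. The loader tries its own URLs first and falls back to the parent, except for names that must always come from the parent (org.apache.pig as proposed above, plus java. and javax., which a custom loader should not try to define).

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical child-first loader; names and prefix list are assumptions.
class ChildFirstClassLoader extends URLClassLoader {
    // Classes under these prefixes are always delegated to the parent first.
    private static final String[] PARENT_FIRST_PREFIXES = {
        "java.", "javax.", "org.apache.pig."
    };

    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    static boolean isParentFirst(String name) {
        for (String prefix : PARENT_FIRST_PREFIXES) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null && !isParentFirst(name)) {
                try {
                    // Child first: look in this loader's own URLs.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the registered jars; fall back to the parent below.
                }
            }
            if (c == null) {
                // Standard parent-first delegation as the fallback.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

      With a loader like this, a copy of lib.jar among the registered jars would be defined by the same loader as my.jar, so its Class.forName() calls would see my.Implementation even if another copy of lib.jar sits on the PIG_CLASSPATH.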

        Issue Links

          Activity

          Julien Le Dem created issue -
          Rohini Palaniswamy added a comment -

          Julien,
          Did you hit the errors in the frontend or in the backend?

          Julien Le Dem added a comment -

          In the front end.
          In the backend everything is in the same class loader.

          Julien

          Rohini Palaniswamy added a comment -

          Asked in case you are attempting to achieve something like that in the backend too. That did not seem to be the intention but wanted to confirm. In PIG-3039, I put up a patch which gives preference to the registered jars while shipping jars to the job. That would conflict in case you were trying something like that.

          Just FYI, it is possible to use class loaders in the backend too. Instead of shipping the jar as an archive, you can ship it as a file without adding it to the classpath, and then use a URLClassLoader to load it. I have tried that in the past to load a different version of the Hadoop jar in h20. But with the way classpath handling is done in h23 (wildcard inclusion of jars), that approach will not work.
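          The approach described in this comment could be sketched roughly as follows, assuming the jars have been shipped to the task's working directory as plain files. ShippedJarLoader, forJars, and the file names in the usage note are hypothetical; only URLClassLoader and the path-to-URL conversion are standard JDK calls.

```java
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

// Hypothetical helper, not taken from Pig or Hadoop.
class ShippedJarLoader {
    /** Builds a loader over jar files that are deliberately kept off the classpath. */
    static URLClassLoader forJars(ClassLoader parent, Path... jars) {
        URL[] urls = new URL[jars.length];
        for (int i = 0; i < jars.length; i++) {
            try {
                // toUri() resolves relative paths against the working directory.
                urls[i] = jars[i].toUri().toURL();
            } catch (MalformedURLException e) {
                throw new UncheckedIOException(e);
            }
        }
        return new URLClassLoader(urls, parent);
    }
}
```

          A task could then call something like ShippedJarLoader.forJars(Thread.currentThread().getContextClassLoader(), Paths.get("lib.jar")) and load classes through the returned loader. Note that a plain URLClassLoader still delegates to its parent first.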

          Julien Le Dem added a comment -

          I tried the Classloader in the backend in the past and ended up reverting it because of similar issues to the one described in this ticket. For example, a Loader (in a registered jar) not seeing files from the registered jar because the Configuration object uses getResourceAsStream on its own classloader. Having 2 classloaders where there was 1 has side effects.
          see: https://issues.apache.org/jira/browse/PIG-2318

          Julien Le Dem added a comment -

          Rohini, I just looked at your patch in PIG-3039 and I think it is compatible with what I am suggesting here. Registered jars take precedence in both the frontend and the backend, even if the mechanism for this differs.

          Julien Le Dem made changes -
          Link: This issue relates to PIG-3039
          Rohini Palaniswamy added a comment -

          That's right. Thanks for taking a look Julien. My concern was whether you wanted to keep both versions of the jar in the backend too as my patch just ships one version to the backend in case of packages inside pig.jar.

          Alejandro Abdelnur made changes -
          Link: This issue relates to MAPREDUCE-1700
          Alejandro Abdelnur added a comment -

          FYI, MAPREDUCE-1700 is introducing a classloader for MR jobs.

          Julien Le Dem added a comment -

          It looks like we could borrow: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ApplicationClassLoader.java
          But as this has to work with Hadoop 20, we would have to duplicate this class or write a similar one.


            People

            • Assignee: Unassigned
            • Reporter: Julien Le Dem
            • Votes: 0
            • Watchers: 5
