Pig
  1. Pig
  2. PIG-2815

class loader management in PigContext

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.11
    • Component/s: impl
    • Labels:
      None

      Description

      The way PigContext.classloader and resolveClassName() are managed can lead to strange class loading issues, especially when not all register statements are at the top (example in the first comment).

      Two factors contribute to this: sometimes only one of them and sometimes together:

      1. a new classloader (CL) is created after registering each jar.
        • but the new jar's parent is the root CL rather than previous CL, effectively throwing previous CL away.
      2. resolveClassName() caches classes based on just the name
        • A class is not defined by name alone. Classes loaded by two different unrelated CLs are different objects even if both extract the class from same physical jar file.
        • because of (1), the cached class is not necessarily same as the class that would be loaded based on 'current' CL

      having different class objects for same class have many subtle side effects. e.g. there would be two instances of static variables.

      I think both should be fixed.. thought fixing one of them might be good enough in many cases. I will add a patch.

      1. PIG-2815.patch
        6 kB
        Raghu Angadi
      2. PIG-2815.patch
        3 kB
        Raghu Angadi
      3. PIG-2815-branch-0.9.patch
        7 kB
        Raghu Angadi
      4. PIG-2815-branch-0.9.patch
        7 kB
        Raghu Angadi

        Activity

        Raghu Angadi created issue -
        Hide
        Raghu Angadi added a comment -

        An example:

        register elephant-bird.jar; -- for working with Thrift objects.
        -- (1)
        
        register T_One.jar;
        -- (2)
        
        -- ThriftPigLoader takes name of a Thrift class that corresponds to input.
        a = load '/logs/T_One' using ThriftPigLoader('thrift.gen.T_One');
        -- (3)
        
        register second.jar; 
        -- (4)
        
        b = load '/logs/T_two' using ThriftPigLoader('thrift.gen.T_two');
        
        -- (5)
        -- FAIL!
        
        • (1): new classlaoder cl_A is created with root classloader as the parent.
        • (2): cl_B is created with root as the parent.
        • (3): ThirftPigLoader.class is instantiated with cl_B and cached.
        • (4): cl_C is created with root as the parent.
        • (5): thrift.gen.T_two.class is instantiated with cl_C, but 'ThriftPigLoader.class' from cl_B is reused by Pig. So all the Thrift classes seen by ThriftPigLoader are entirely different from all the Thrift classes seen by thrift.gen.T_two since cl_B is not a parent of cl_C. That can lead to a number of issues and it does.
        Show
        Raghu Angadi added a comment - An example: register elephant-bird.jar; -- for working with Thrift objects. -- (1) register T_One.jar; -- (2) -- ThriftPigLoader takes name of a Thrift class that corresponds to input. a = load '/logs/T_One' using ThriftPigLoader('thrift.gen.T_One'); -- (3) register second.jar; -- (4) b = load '/logs/T_two' using ThriftPigLoader('thrift.gen.T_two'); -- (5) -- FAIL! (1): new classlaoder cl_A is created with root classloader as the parent. (2): cl_B is created with root as the parent. (3): ThirftPigLoader.class is instantiated with cl_B and cached. (4): cl_C is created with root as the parent. (5): thrift.gen.T_two.class is instantiated with cl_C, but ' ThriftPigLoader.class ' from cl_B is reused by Pig. So all the Thrift classes seen by ThriftPigLoader are entirely different from all the Thrift classes seen by thrift.gen.T_two since cl_B is not a parent of cl_C. That can lead to a number of issues and it does.
        Hide
        Raghu Angadi added a comment -

        changes in the patch :

        1. remove 'classCache':
        • may be resolveClassName() could cache the 'prefix' that successfully loaded a class to avoid checking other prefixes. Reloading an existing class is not costly since JVM has a cache anyway.
        • I don't think this is a perf issue either way.
        1. use a classloader that supports adding jars.
        • not sure if it is common practice to add resources.. or if there are any side effects.
        Show
        Raghu Angadi added a comment - changes in the patch : remove 'classCache': may be resolveClassName() could cache the 'prefix' that successfully loaded a class to avoid checking other prefixes. Reloading an existing class is not costly since JVM has a cache anyway. I don't think this is a perf issue either way. use a classloader that supports adding jars. not sure if it is common practice to add resources.. or if there are any side effects.
        Raghu Angadi made changes -
        Field Original Value New Value
        Attachment PIG-2815.patch [ 12536343 ]
        Hide
        Jonathan Coveney added a comment -

        Raghu,

        This is awesome. Is there any hope that you could add a test that fails in the current setup? I think that would be important as far as conveying a concrete use case when this fails (which your example does, but if we have a test that fails it'll be super super concrete.

        Thanks for doing this. Awesome.

        Jon

        Show
        Jonathan Coveney added a comment - Raghu, This is awesome. Is there any hope that you could add a test that fails in the current setup? I think that would be important as far as conveying a concrete use case when this fails (which your example does, but if we have a test that fails it'll be super super concrete. Thanks for doing this. Awesome. Jon
        Hide
        Raghu Angadi added a comment -

        unit test : update TestRegisteredJarVisibility to test for classloader 'consistency'.

        Show
        Raghu Angadi added a comment - unit test : update TestRegisteredJarVisibility to test for classloader 'consistency'.
        Raghu Angadi made changes -
        Attachment PIG-2815.patch [ 12536600 ]
        Hide
        Ashutosh Chauhan added a comment -

        Raghu Angadi Thanks for digging into it. Till now atleast in some cases if I registered the jar and then if I want to use it in Pig client frontend, I also necessarily need to add it in PIG_CLASSPATH. Does this patch removes that restriction? Put another way, will registering the jar will automatically add the jar in pig's classpath?

        Show
        Ashutosh Chauhan added a comment - Raghu Angadi Thanks for digging into it. Till now atleast in some cases if I registered the jar and then if I want to use it in Pig client frontend, I also necessarily need to add it in PIG_CLASSPATH . Does this patch removes that restriction? Put another way, will registering the jar will automatically add the jar in pig's classpath?
        Ashutosh Chauhan made changes -
        Assignee Raghu Angadi [ rangadi ]
        Hide
        Raghu Angadi added a comment -

        Ashutosh, that issue is fixed in PIG-2532. Let us know otherwise.

        Still there are some corner cases: e.g. if you use ObjectSerializer on frontend, that class should be in CLASSPATH (the fix is to extend ObjectSerializer to take a classloader or it should use current threads classloader).

        Show
        Raghu Angadi added a comment - Ashutosh, that issue is fixed in PIG-2532 . Let us know otherwise. Still there are some corner cases: e.g. if you use ObjectSerializer on frontend, that class should be in CLASSPATH (the fix is to extend ObjectSerializer to take a classloader or it should use current threads classloader).
        Hide
        Jonathan Coveney added a comment -

        +1, will commit once test-commit passes

        Show
        Jonathan Coveney added a comment - +1, will commit once test-commit passes
        Hide
        Ashutosh Chauhan added a comment -

        Raghu Angadi I think I have hit on corner cases. Will file jira for it. Do you happen to know any other places where it will manifest ?

        Show
        Ashutosh Chauhan added a comment - Raghu Angadi I think I have hit on corner cases. Will file jira for it. Do you happen to know any other places where it will manifest ?
        Hide
        Jonathan Coveney added a comment -

        Committed. Thanks Raghu! Also: want to make a ticket for that ObjectSerializer CLASSPATH issue you mentioned?

        Show
        Jonathan Coveney added a comment - Committed. Thanks Raghu! Also: want to make a ticket for that ObjectSerializer CLASSPATH issue you mentioned?
        Hide
        Raghu Angadi added a comment -

        Thanks for the quick review Jon. filed PIG-2819 for ObjectSerializer.

        Ashutosh, new jira will be useful. not sure of specific corner cases, but I am sure they are there .

        Show
        Raghu Angadi added a comment - Thanks for the quick review Jon. filed PIG-2819 for ObjectSerializer. Ashutosh, new jira will be useful. not sure of specific corner cases, but I am sure they are there .
        Hide
        Raghu Angadi added a comment -

        patch for 0.9.

        Show
        Raghu Angadi added a comment - patch for 0.9.
        Raghu Angadi made changes -
        Attachment PIG-2815-branch-0.9.patch [ 12536893 ]
        Hide
        Jonathan Coveney added a comment -

        Raghu,

        Thanks for this. Note: in your patch for pig-0.9, you removed a default PigContext constructor. Not sure if you meant to do that on purpose.

        public PigContext() {
           this(ExecType.MAPREDUCE, new Properties());
        }
        
        Show
        Jonathan Coveney added a comment - Raghu, Thanks for this. Note: in your patch for pig-0.9, you removed a default PigContext constructor. Not sure if you meant to do that on purpose. public PigContext() { this (ExecType.MAPREDUCE, new Properties()); }
        Hide
        Raghu Angadi added a comment -

        Thanks Jon.

        updated the patch. There was another small spurious change.

        Show
        Raghu Angadi added a comment - Thanks Jon. updated the patch. There was another small spurious change.
        Raghu Angadi made changes -
        Attachment PIG-2815-branch-0.9.patch [ 12537029 ]
        Hide
        Julien Le Dem added a comment -

        It sounds like this has been committed. Can you resolve it?

        Show
        Julien Le Dem added a comment - It sounds like this has been committed. Can you resolve it?
        Hide
        Cheolsoo Park added a comment -

        It seems that the patch is committed to 0.11/trunk but not to 0.9/0.10.

        Please let me know if anyone wants this to be committed to 0.9/0.10. Or I will close the jira.

        Thanks!

        Show
        Cheolsoo Park added a comment - It seems that the patch is committed to 0.11/trunk but not to 0.9/0.10. Please let me know if anyone wants this to be committed to 0.9/0.10. Or I will close the jira. Thanks!
        Hide
        Julien Le Dem added a comment -

        0.11/trunk sounds good to me

        Show
        Julien Le Dem added a comment - 0.11/trunk sounds good to me
        Hide
        Cheolsoo Park added a comment -

        Closed.

        Show
        Cheolsoo Park added a comment - Closed.
        Cheolsoo Park made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Bill Graham made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Resolved Resolved
        146d 16h 49m 1 Cheolsoo Park 06/Dec/12 23:31
        Resolved Resolved Closed Closed
        77d 5h 22m 1 Bill Graham 22/Feb/13 04:53

          People

          • Assignee:
            Raghu Angadi
            Reporter:
            Raghu Angadi
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development