PIG-2532

Registered classes fail deserialization in frontend

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0, 0.9.3, 0.11
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This issue came up while integrating HCatalog with our environment. When the HCatalog jars are added on the pig command line with -Dpig.additional.jars, the job fails (exception below). When they are added to the pig classpath instead, the error goes away.

      We identified the issue as deserialization using the root class loader rather than the context class loader set when the thread is created. As a result, HCatSchema, which is serialized into the context, fails to deserialize in the thread.

      2012-02-14 21:55:53,936 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6017: java.io.IOException: Deserialization error: org.apache.hcatalog.data.schema.HCatSchema
      	at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:55)
      	at org.apache.pig.impl.util.UDFContext.deserialize(UDFContext.java:181)
      	at org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil.setupUDFContext(MapRedUtil.java:159)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:229)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
      	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:811)
      	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:771)
      	at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
      	at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigJobControl.mainLoopAction(PigJobControl.java:144)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigJobControl.run(PigJobControl.java:121)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: java.lang.ClassNotFoundException: org.apache.hcatalog.data.schema.HCatSchema
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
      	at java.lang.Class.forName0(Native Method)
      	at java.lang.Class.forName(Class.java:247)
      	at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:603)
      	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1574)
      	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1495)
      	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1731)
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
      	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
      	at java.util.Hashtable.readObject(Hashtable.java:859)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
      	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
      	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
      	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
      	at java.util.HashMap.readObject(HashMap.java:1030)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
      	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
      	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
      	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
      	at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
      	... 15 more
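
The class-loader mismatch described above can be illustrated with a small sketch. By default, java.io.ObjectInputStream.resolveClass looks classes up through the loader of the nearest user-defined class on the call stack, which does not see jars that were only registered at runtime. One common workaround is to subclass ObjectInputStream and resolve through the thread's context class loader first. The class name ContextClassLoaderObjectInputStream and its serialize/deserialize helpers are hypothetical, and this is not necessarily what the attached patches do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

// Hypothetical helper for illustration only; not the code from the PIG-2532 patches.
public class ContextClassLoaderObjectInputStream extends ObjectInputStream {

    public ContextClassLoaderObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        // Prefer the context class loader, which can see jars registered at runtime.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader != null) {
            try {
                // initialize=false: resolve the class without running static initializers.
                return Class.forName(desc.getName(), false, loader);
            } catch (ClassNotFoundException e) {
                // Fall through to the default lookup, which also handles primitive types.
            }
        }
        return super.resolveClass(desc);
    }

    // Serializes an object to a byte array with standard Java serialization.
    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Deserializes using the context-class-loader-aware stream defined above.
    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ContextClassLoaderObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

For this to help, the thread doing the deserialization must have its context class loader set to one that includes the registered jars, as the description says is done when the thread is created.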
      
      Attachments

      1. PIG-2532-v4-branch-0.9.patch (13 kB, Julien Le Dem)
      2. PIG-2532-v4.patch (13 kB, Julien Le Dem)
      3. PIG-2532-v3.patch (14 kB, Julien Le Dem)
      4. PIG-2532-v2.patch (14 kB, Travis Crawford)
      5. PIG-2532-log.zip (28 kB, Thomas Weise)
      6. PIG-2532-h23.patch (2 kB, Daniel Dai)
      7. PIG-2532.patch (2 kB, Travis Crawford)
      8. PIG-253_javax.zip (17 kB, Travis Crawford)

          Activity

          Dmitriy V. Ryaboy added a comment -

          Nice find, guys.
          We probably need an e2e test for this. Looks like they already create a separate udfs jar, so it should be possible to reproduce the scenario of not having the jar on the classpath, but only registering it.

          Travis Crawford added a comment -

          Status update:

          I reproduced the issue in a unit test and verified that the patch fixes it. The issue presents itself when a jar is registered at runtime and a class in that registered jar is stored in the UDFContext. Without the patch, deserialization fails.

          To set up the test I store two java files as resources, and compile+jar them at runtime (using javax classes). I took this approach to keep the test setup isolated.

          However, since it uses javax classes this approach may not be ideal. I'm preparing another version that performs these setup steps in ant (generates the test-specific jar, then runs the test with an appropriate classpath). That version will likely be the better choice to commit.
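
The compile+jar-at-runtime setup described in this comment could look roughly like the sketch below. It assumes the "javax classes" are the javax.tools compiler API, which the comment does not state explicitly. RuntimeJarBuilder and buildJar are hypothetical names, and for brevity the source is passed as a string rather than loaded from a resource.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical test helper; not the code from the attached patches.
public class RuntimeJarBuilder {

    // Compiles one default-package class from source and packs it into a jar.
    public static Path buildJar(Path workDir, String className, String source)
            throws IOException {
        // Returns null when running on a JRE without a compiler.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; a JDK is required");
        }

        Path sourceFile = workDir.resolve(className + ".java");
        Files.write(sourceFile, source.getBytes(StandardCharsets.UTF_8));

        // run() returns 0 on success; null streams mean System.in/out/err.
        int exitCode = compiler.run(null, null, null,
                "-d", workDir.toString(), sourceFile.toString());
        if (exitCode != 0) {
            throw new IOException("Compilation failed with exit code " + exitCode);
        }

        // Pack the compiled class into a jar that can be registered but is not
        // on the classpath of the running test.
        Path jar = workDir.resolve(className + ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jarOut = new JarOutputStream(out)) {
            jarOut.putNextEntry(new JarEntry(className + ".class"));
            jarOut.write(Files.readAllBytes(workDir.resolve(className + ".class")));
            jarOut.closeEntry();
        }
        return jar;
    }
}
```

Building the jar inside the test keeps the class off the test classpath, which is the condition needed to reproduce the failure.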

          Travis Crawford added a comment -

          Uploading a zip file of the version that compiles+jars at runtime, mainly so it exists somewhere other than my laptop.

          Travis Crawford added a comment -

          Internally we discussed and think this approach is best because it keeps the test relatively self-contained (instead of majorly leaking into the build file). Additionally, we checked OpenJDK and confirmed these classes are present.

          Julien Le Dem added a comment -

          +1

          Daniel Dai added a comment -

          +1. Nice catch, Travis! This has bothered us for a long time.

          Julien Le Dem added a comment -

          I rebased the patch (v3)
          will commit

          Julien Le Dem added a comment -

          I rebased the patch again and did a small change so that the test can be run from eclipse

          Ashutosh Chauhan added a comment -

          Awesome. Thanks for this, Travis. Can I request that this be back-ported to 0.8?

          Ashutosh Chauhan added a comment -

          Sorry, I meant 0.9 branch.

          Daniel Dai added a comment -

          +1 for backport. Some resync is needed though.

          Julien Le Dem added a comment -

          I checked PIG-2532-v4.patch into trunk.
          I will look into back-porting to 0.9.

          Julien Le Dem added a comment -

          Adding PIG-2532-v4-branch-0.9.patch
          The only difference is regarding the eclipse-files generation.
          Otherwise the patch applies cleanly

          Julien Le Dem added a comment -

          Checked in to branch-0.9.

          Thomas Weise added a comment -

          After this change, the following test fails on Hadoop 0.23:

          ant -Dhadoopversion=23 clean test -Dtestcase=TestRegisteredJarVisibility
          
          
              [junit] Tests run: 2, Failures: 0, Errors: 2, Time elapsed: 113.172 sec
              [junit] Test org.apache.pig.test.TestRegisteredJarVisibility FAILED
          

          Will this go into 0.10 also?

          Travis Crawford added a comment -

          Thomas, thanks for the report. I will take a look at this test failure.

          Julien Le Dem added a comment -

          I've been looking at this problem and it does not seem related to the patch. Thomas, could you send the content of the related junit log file? I see a bunch of Guice-related exceptions.

          Thomas Weise added a comment -

          Attaching test log.

          Daniel Dai added a comment -

          PIG-2532-h23.patch fixes the h23 test failure. I also found that the patch had not been committed to the 0.10 branch, so I committed it there as well.

          Aniket Mokashi added a comment -

          I tested on trunk; it's not fully fixed. If we register the jar from S3, it fails with the same error. I will open another JIRA for that.


            People

            • Assignee:
              Travis Crawford
              Reporter:
              Travis Crawford
            • Votes:
              0
              Watchers:
              9
