Pig / PIG-3722

UDF deserialization for registered classes fails in local mode

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

      Description

      Similar to https://issues.apache.org/jira/browse/PIG-2532, registered classes are not available during UDF deserialization when jobs are converted to local mode.

      1. PIG-3722.patch (4 kB, Aniket Mokashi)

          Activity

          Aniket Mokashi added a comment -

          Committed to trunk. Thanks Dmitriy V. Ryaboy for the review!

          Dmitriy V. Ryaboy added a comment -

          +1

          Aniket Mokashi added a comment -

          I tested this on a production job; it works well.

          Aniket Mokashi added a comment -

          This happens because ObjectInputStream does not consult the thread context ClassLoader when deserializing, so the local-mode backend fails with the following stack trace:

          2014-01-24 08:30:33,260 WARN org.apache.hadoop.mapred.LocalJobRunner: job_local_0002
          java.io.IOException: Deserialization error: org.apache.hcatalog.data.schema.HCatSchema
           at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:59)
           at org.apache.pig.impl.util.UDFContext.deserialize(UDFContext.java:192)
           at org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil.setupUDFContext(MapRedUtil.java:173)
           at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:229)
           at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:275)
           at org.apache.hadoop.mapred.Task.initialize(Task.java:511)
           at org.apache.hadoop.mapred.MapTask.run(MapTask.java:306)
           at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
          

          To fix this, we can use ClassLoaderObjectInputStream.
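
For illustration, here is a minimal sketch of the technique, not the committed patch. It uses only the JDK; the class names ContextLoaderDemo and ContextAwareObjectInputStream and the helper roundTrip are hypothetical stand-ins for a ClassLoaderObjectInputStream-style stream. The idea is to override resolveClass so that class lookup goes through the thread context ClassLoader, which is assumed to be the loader that can see the registered jars.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.ArrayList;
import java.util.Arrays;

public class ContextLoaderDemo {

    /** Hypothetical stand-in for a ClassLoaderObjectInputStream-style stream. */
    static class ContextAwareObjectInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        ContextAwareObjectInputStream(ClassLoader loader, InputStream in) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            try {
                // Look the class up through the supplied loader first.
                return Class.forName(desc.getName(), false, loader);
            } catch (ClassNotFoundException e) {
                // Fall back to the default lookup (this also covers primitive types).
                return super.resolveClass(desc);
            }
        }
    }

    /** Serializes a value, then reads it back using the thread context ClassLoader. */
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        try (ObjectInputStream in = new ContextAwareObjectInputStream(
                loader, new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Object copy = roundTrip(new ArrayList<>(Arrays.asList("a", "b")));
        System.out.println(copy); // prints [a, b]
    }
}
```

In this sketch the loader is passed in explicitly rather than read inside resolveClass, so the caller decides which loader is authoritative at the point of deserialization.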


            People

            • Assignee: Aniket Mokashi
            • Reporter: Aniket Mokashi
            • Votes: 0
            • Watchers: 2
