  1. Pig
  2. PIG-4059 Pig on Spark
  3. PIG-4693

Class conflicts: Kryo bundled in spark vs kryo bundled with pig

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: spark-branch
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels:
    1. PIG-4693.patch
      0.4 kB
      Srikanth Sundarrajan

      Issue Links

        Activity

        sriksun Srikanth Sundarrajan added a comment -

        Running the following simple pig script

        IN = LOAD 'test-data' USING PigStorage('');
        G = GROUP IN BY $11;
        R = FOREACH G GENERATE group, SUM(IN.$10);
        STORE R INTO 'test-out' USING PigStorage(',');
        

        results in

        ERROR 2998: Unhandled internal error. com.esotericsoftware.kryo.Kryo.setInstantiatorStrategy(Lorg/objenesis/strategy/InstantiatorStrategy;)V
        
        java.lang.NoSuchMethodError: com.esotericsoftware.kryo.Kryo.setInstantiatorStrategy(Lorg/objenesis/strategy/InstantiatorStrategy;)V
                at com.twitter.chill.KryoBase.setInstantiatorStrategy(KryoBase.scala:86)
                at com.twitter.chill.EmptyScalaKryoInstantiator.newKryo(ScalaKryoInstantiator.scala:59)
                at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:80)
                at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:227)
                at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:212)
                at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:128)
                at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:201)
                at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
                at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
                at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
                at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
                at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1291)
                at org.apache.spark.rdd.NewHadoopRDD.<init>(NewHadoopRDD.scala:77)
                at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1099)
                at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1094)
                at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
                at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
                at org.apache.spark.SparkContext.withScope(SparkContext.scala:681)
                at org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:1094)
                at org.apache.pig.backend.hadoop.executionengine.spark.converter.LoadConverter.convert(LoadConverter.java:88)
                at org.apache.pig.backend.hadoop.executionengine.spark.converter.LoadConverter.convert(LoadConverter.java:58)
                at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.physicalToRDD(SparkLauncher.java:636)
                at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.physicalToRDD(SparkLauncher.java:603)
                at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.physicalToRDD(SparkLauncher.java:603)
                at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.physicalToRDD(SparkLauncher.java:603)
                at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.physicalToRDD(SparkLauncher.java:603)
                at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.physicalToRDD(SparkLauncher.java:603)
                at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.sparkOperToRDD(SparkLauncher.java:555)
                at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.sparkPlanToRDD(SparkLauncher.java:504)
                at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:206)
                at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:301)
                at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
                at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
                at org.apache.pig.PigServer.execute(PigServer.java:1364)
                at org.apache.pig.PigServer.executeBatch(PigServer.java:415)
                at org.apache.pig.PigServer.executeBatch(PigServer.java:398)
                at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
                at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
                at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
                at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
                at org.apache.pig.Main.run(Main.java:624)
                at org.apache.pig.Main.main(Main.java:170)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.lang.reflect.Method.invoke(Method.java:606)
                at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
                at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
        ================================================================================
        

        Here are the logs from the class loader:

        [Loaded org.apache.spark.broadcast.TorrentBroadcast$$anonfun$5 from file:/data/d1/home/sriksun/pig/lib/spark/spark-assembly-1.4.1-hadoop2.2.0.jar]
        [Loaded org.apache.spark.serializer.KryoSerializationStream from file:/data/d1/home/sriksun/pig/lib/spark/spark-assembly-1.4.1-hadoop2.2.0.jar]
        [Loaded org.apache.spark.serializer.KryoDeserializationStream from file:/data/d1/home/sriksun/pig/lib/spark/spark-assembly-1.4.1-hadoop2.2.0.jar]
        [Loaded com.twitter.chill.KryoInstantiator from file:/data/d1/home/sriksun/pig/lib/spark/spark-assembly-1.4.1-hadoop2.2.0.jar]
        [Loaded com.twitter.chill.EmptyScalaKryoInstantiator from file:/data/d1/home/sriksun/pig/lib/spark/spark-assembly-1.4.1-hadoop2.2.0.jar]
        [Loaded com.twitter.chill.KryoInstantiator$3 from file:/data/d1/home/sriksun/pig/lib/spark/spark-assembly-1.4.1-hadoop2.2.0.jar]
        [Loaded com.twitter.chill.KryoInstantiator$2 from file:/data/d1/home/sriksun/pig/lib/spark/spark-assembly-1.4.1-hadoop2.2.0.jar]
        [Loaded com.twitter.chill.KryoInstantiator$1 from file:/data/d1/home/sriksun/pig/lib/spark/spark-assembly-1.4.1-hadoop2.2.0.jar]
        [Loaded com.twitter.chill.KryoInstantiator$4 from file:/data/d1/home/sriksun/pig/lib/spark/spark-assembly-1.4.1-hadoop2.2.0.jar]
        [Loaded org.objenesis.strategy.InstantiatorStrategy from file:/data/d1/home/sriksun/pig/lib/spark/spark-assembly-1.4.1-hadoop2.2.0.jar]
        [Loaded com.esotericsoftware.kryo.KryoException from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.ClassResolver from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.StreamFactory from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.factories.SerializerFactory from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.serializers.DefaultSerializers$IntSerializer from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.serializers.DefaultSerializers$BooleanSerializer from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.serializers.DefaultSerializers$ByteSerializer from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.serializers.DefaultSerializers$CharSerializer from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.serializers.DefaultSerializers$ShortSerializer from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.serializers.DefaultSerializers$DoubleSerializer from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.serializers.DefaultSerializers$VoidSerializer from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.kryo.ReferenceResolver from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.esotericsoftware.shaded.org.objenesis.instantiator.ObjectInstantiator from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        [Loaded com.twitter.chill.ObjectSerializer from file:/data/d1/home/sriksun/pig/lib/spark/spark-assembly-1.4.1-hadoop2.2.0.jar]
        [Loaded com.esotericsoftware.kryo.serializers.FieldSerializer from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
        /hdp/2.2.0.0-2041/hadoop-hdfs/hadoop-hdfs-2.6.0.2.2.0.0-2041.jar]
        [Loaded com.esotericsoftware.kryo.util.DefaultClassResolver from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
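
Lines of this shape are what the HotSpot -verbose:class option prints on older JVMs. That is an assumption here, since the comment does not say how the log was captured. The same question, which jar a given class came from, can also be asked from code. A minimal sketch follows; ClassOrigin is a hypothetical helper name, not something from Pig or Spark:

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Pig or Spark).
final class ClassOrigin {

    // Returns the location (jar or directory) a class was loaded from.
    // Classes from the bootstrap loader usually have no CodeSource.
    static String of(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names as arguments.
        for (String name : args) {
            System.out.println(name + " <- " + of(Class.forName(name)));
        }
    }
}
```

Running it with com.esotericsoftware.kryo.Kryo as an argument, on the same classpath as the failing job, should report which jar supplied that class.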
        

        chill passes the unshaded org.objenesis.strategy.InstantiatorStrategy to Kryo.setInstantiatorStrategy, whereas Kryo 2.22 requires the shaded com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy. The method signature that chill was compiled against therefore does not exist in kryo-2.22.jar, hence the NoSuchMethodError.
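
The mismatch can be reproduced in miniature without Kryo on the classpath. The JVM resolves a method by name plus descriptor, and the descriptor embeds the fully qualified parameter type, so relocating (shading) the parameter's package yields a different method. In this sketch all type names are stand-ins invented for illustration: Unshaded and Shaded play the roles of the two objenesis packages, and KryoLike plays the role of Kryo 2.22.

```java
import java.lang.invoke.MethodType;

final class ShadedDescriptorDemo {

    // Same simple name, different fully qualified names (stand-ins for the two packages).
    static final class Unshaded { interface InstantiatorStrategy {} }
    static final class Shaded   { interface InstantiatorStrategy {} }

    // Stand-in for a Kryo build compiled against the relocated (shaded) type.
    static class KryoLike {
        public void setInstantiatorStrategy(Shaded.InstantiatorStrategy strategy) {}
    }

    // JVM descriptor of a void method taking one parameter of the given type.
    static String descriptor(Class<?> parameterType) {
        return MethodType.methodType(void.class, parameterType).toMethodDescriptorString();
    }

    // True if owner has a public method with exactly this name and parameter type.
    static boolean declares(Class<?> owner, String name, Class<?> parameterType) {
        try {
            owner.getMethod(name, parameterType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("caller expects : " + descriptor(Unshaded.InstantiatorStrategy.class));
        System.out.println("callee declares: " + descriptor(Shaded.InstantiatorStrategy.class));
        System.out.println("lookup with unshaded type succeeds: "
                + declares(KryoLike.class, "setInstantiatorStrategy",
                           Unshaded.InstantiatorStrategy.class));
    }
}
```

The reflective lookup fails with NoSuchMethodException. The link-time analogue, when already compiled bytecode refers to the missing descriptor, is the NoSuchMethodError in the trace above.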

        I guess Kryo is pulled in as a dependency for OrcStorage.

        sriksun Srikanth Sundarrajan added a comment -

        Xuefu Zhang, Mohit Agarwal, Xianda Ke, thoughts welcome.

        xuefuz Xuefu Zhang added a comment -

        Instead of depending on spark-assembly.jar, could we depend only on spark-core.jar? Apparently Spark gets the kryo dependency through its Twitter (chill) dependency, which cannot be overridden anyhow. See SPARK-10910.

        sriksun Srikanth Sundarrajan added a comment -

        Xuefu Zhang, depending on spark-core would require shipping a whole lot of ancillary dependencies to the distributed cache, and I was trying to avoid that by using the Spark assembly. The original fix for PIG-4667 adds spark-core and its dependencies individually anyway, while leaving out kryo and guava.

        sriksun Srikanth Sundarrajan added a comment -

        The issue persists even when using spark-core.

        [Loaded org.apache.spark.broadcast.TorrentBroadcast$$anonfun$5 from file:/data/d1/home/sriksun/pig/lib/spark/spark-core_2.10-1.4.1.jar]
        [Loaded org.apache.spark.serializer.KryoSerializationStream from file:/data/d1/home/sriksun/pig/lib/spark/spark-core_2.10-1.4.1.jar]
        [Loaded org.apache.spark.serializer.KryoDeserializationStream from file:/data/d1/home/sriksun/pig/lib/spark/spark-core_2.10-1.4.1.jar]
        [Loaded com.twitter.chill.KryoInstantiator from file:/data/d1/home/sriksun/pig/lib/spark/chill-java-0.5.0.jar]
        [Loaded com.twitter.chill.EmptyScalaKryoInstantiator from file:/data/d1/home/sriksun/pig/lib/spark/chill_2.10-0.5.0.jar]
        [Loaded com.twitter.chill.KryoInstantiator$3 from file:/data/d1/home/sriksun/pig/lib/spark/chill-java-0.5.0.jar]
        [Loaded com.twitter.chill.KryoInstantiator$2 from file:/data/d1/home/sriksun/pig/lib/spark/chill-java-0.5.0.jar]
        [Loaded com.twitter.chill.KryoInstantiator$1 from file:/data/d1/home/sriksun/pig/lib/spark/chill-java-0.5.0.jar]
        [Loaded com.twitter.chill.KryoInstantiator$4 from file:/data/d1/home/sriksun/pig/lib/spark/chill-java-0.5.0.jar]
        [Loaded org.objenesis.strategy.InstantiatorStrategy from file:/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/mockito-all-1.8.5.jar]
        [Loaded com.esotericsoftware.kryo.KryoException from file:/data/d1/home/sriksun/pig/lib/kryo-2.22.jar]
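
A log like this shows only the copy that won. To see every classpath entry that offers a given class, for example objenesis arriving via mockito-all as above, the class loader can be asked for all matching resources. This is a sketch only; DuplicateClassFinder is a hypothetical name, and it inspects the system class loader, which may not be the loader actually used inside a Pig or Spark process.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper (not part of Pig or Spark).
final class DuplicateClassFinder {

    // Lists every location on the system classpath that contains the class file.
    static List<URL> locations(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        // Pass fully qualified class names as arguments.
        for (String name : args) {
            List<URL> found = locations(name);
            System.out.println(name + ": " + found.size() + " location(s)");
            for (URL url : found) {
                System.out.println("  " + url);
            }
        }
    }
}
```

More than one location for the same class name means the outcome depends on classpath order.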
        
        xuefuz Xuefu Zhang added a comment -

        Srikanth Sundarrajan, I verified that spark-core.jar doesn't contain kryo. From the above class loading info, it seems that we are also loading chill-java.jar, which seems to contain kryo. Do you know why?

        sriksun Srikanth Sundarrajan added a comment -

        Yes. After the AM is launched, the client makes a request to the AM for the newHadoopRDD corresponding to the LOAD statement, which is shipped through the Kryo serializer because I had set it as the default Spark serializer. The stack trace in the original comment has more detail; a section of it is posted here for quick reference.

        java.lang.NoSuchMethodError: com.esotericsoftware.kryo.Kryo.setInstantiatorStrategy(Lorg/objenesis/strategy/InstantiatorStrategy;)V
                at com.twitter.chill.KryoBase.setInstantiatorStrategy(KryoBase.scala:86)
                at com.twitter.chill.EmptyScalaKryoInstantiator.newKryo(ScalaKryoInstantiator.scala:59)
                at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:80)
                at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:227)
                at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:212)
                at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:128)
                at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:201)
                at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
                at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
                at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
                at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
                at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1291)
                at org.apache.spark.rdd.NewHadoopRDD.<init>(NewHadoopRDD.scala:77)
                at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1099)
                at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1094)
                at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
                at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
                at org.apache.spark.SparkContext.withScope(SparkContext.scala:681)
                at org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:1094)
                at org.apache.pig.backend.hadoop.executionengine.spark.converter.LoadConverter.convert(LoadConverter.java:88)
        
        xuefuz Xuefu Zhang added a comment -

        I was wondering whether the problem would go away if we exclude chill-java.jar and include our own kryo.jar. I wasn't sure how chill-java.jar got into the picture. Maybe via a transitive dependency?

        sriksun Srikanth Sundarrajan added a comment -

        That might not help, as this is a direct dependency of spark-core:

        core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala

        ...
        import com.esotericsoftware.kryo.{Kryo, KryoException}
        import com.esotericsoftware.kryo.io.{Input => KryoInput, Output => KryoOutput}
        import com.esotericsoftware.kryo.serializers.{JavaSerializer => KryoJavaSerializer}
        import com.twitter.chill.{AllScalaRegistrar, EmptyScalaKryoInstantiator}
        import org.roaringbitmap.{ArrayContainer, BitmapContainer, RoaringArray, RoaringBitmap}
        ...
          def newKryo(): Kryo = {
            val instantiator = new EmptyScalaKryoInstantiator
            val kryo = instantiator.newKryo()
        ...
        

        /chill_2.10/0.5.0/com/twitter/chill/ScalaKryoInstantiator.scala

        package com.twitter.chill
        ...
        class EmptyScalaKryoInstantiator extends KryoInstantiator {
          override def newKryo = {
            val k = new KryoBase
            k.setRegistrationRequired(false)
            k.setInstantiatorStrategy(new org.objenesis.strategy.StdInstantiatorStrategy)
            k
          }
        }
        ...
        

        /chill_2.10/0.5.0/com/twitter/chill/KryoBase.scala

        ...
        package com.twitter.chill
        ...
        import org.objenesis.instantiator.ObjectInstantiator
        import org.objenesis.strategy.InstantiatorStrategy
        ...
        class KryoBase extends Kryo {
        ...
          override def setInstantiatorStrategy(st: InstantiatorStrategy) {
            super.setInstantiatorStrategy(st)
            strategy = Some(st)
          }
        ...
        
        xuefuz Xuefu Zhang added a comment -

        Does it help if the kryo.jar is loaded before spark-core (or spark-assembly.jar for that matter)?

        Hive also depends on kryo 2.22, yet Hive remote driver, which depends on spark-core, doesn't face the problem. I'm not sure how it works there.

        Marcelo Vanzin, could you shed some light here?

        sriksun Srikanth Sundarrajan added a comment -

        Xuefu Zhang, from what I can tell, switching the loading order of kryo won't help. My suggestion would be to downgrade the kryo dependency in Pig to be consistent with the version Spark bundles. I can verify whether Orc works well with this change. I will wait for input from Marcelo Vanzin before I attempt that.

        vanzin Marcelo Vanzin added a comment -

        I vaguely remember something about this. I think the problem was that the chill library needs an older version of kryo, and at the time at least there was not a version of chill built against kryo 2.22.

        I'm also not sure how Hive gets around it; perhaps the Spark jars come first in the classpath, so Hive is using the older kryo and just happens not to trigger any code path that depends on the newer kryo. If Pig can get away with that, it might be a way out. Otherwise, check whether there's a new build of chill against the newer kryo, and see if that one works.
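
        To check which kryo actually wins on a given classpath, a small probe along the following lines may help. It is only a sketch: the class name ClasspathProbe and its helper methods are made up for illustration and are not part of Pig, Spark, or chill. It reports the jar a class was loaded from and whether that class exposes the method signature named in the NoSuchMethodError above.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.Arrays;

// Hypothetical diagnostic helper; not part of Pig, Spark, or chill.
class ClasspathProbe {

    /** Returns the location (usually a jar) a class was loaded from, or "<bootstrap>" for JDK classes. */
    public static String loadedFrom(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null)
                ? "<bootstrap>"
                : src.getLocation().toString();
    }

    /** True if cls has a public method with this name whose parameter type names match exactly. */
    public static boolean hasMethod(Class<?> cls, String name, String... paramTypeNames) {
        for (Method m : cls.getMethods()) {
            if (!m.getName().equals(name)) {
                continue;
            }
            String[] actual = Arrays.stream(m.getParameterTypes())
                    .map(Class::getName)
                    .toArray(String[]::new);
            if (Arrays.equals(actual, paramTypeNames)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        try {
            // Class and method names are taken from the stack trace in this issue.
            Class<?> kryo = Class.forName("com.esotericsoftware.kryo.Kryo");
            System.out.println("Kryo loaded from: " + loadedFrom(kryo));
            System.out.println("setInstantiatorStrategy(org.objenesis...) present: "
                    + hasMethod(kryo, "setInstantiatorStrategy",
                            "org.objenesis.strategy.InstantiatorStrategy"));
        } catch (ClassNotFoundException | LinkageError e) {
            System.out.println("Kryo could not be loaded: " + e);
        }
    }
}
```

        Running the probe with the same classpath as the failing job should show whether the kryo that gets loaded is the one chill expects. If the method check prints false, the NoSuchMethodError is expected regardless of how the remaining jars are ordered.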

        sriksun Srikanth Sundarrajan added a comment -

        Thanks Marcelo Vanzin. The trunk version of chill still refers to org.objenesis.strategy.InstantiatorStrategy, so I guess using a newer version of chill to overcome this isn't a viable option.

        sriksun Srikanth Sundarrajan added a comment -

        Verified that OrcStorage works well with kryo-2.21. Used the following simple example to verify.

        A = LOAD 'student.txt' USING PigStorage(',') as (name:chararray, age:int, gpa:double);
        store A into 'student.orc' using OrcStorage('');
        

        Uploaded a simple patch downgrading the kryo dependency

        xuefuz Xuefu Zhang added a comment -

        +1 on the latest patch.

        xuefuz Xuefu Zhang added a comment -

        Committed to Spark branch. Thanks, Srikanth!


          People

          • Assignee:
            sriksun Srikanth Sundarrajan
            Reporter:
            sriksun Srikanth Sundarrajan
          • Votes:
            0
            Watchers:
            3
