TinkerPop / TINKERPOP-1341

UnshadedKryoAdapter fails to deserialize StarGraph when SparkConf sets spark.rdd.compress=true whereas GryoSerializer works

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.2.1, 3.3.0
    • Fix Version/s: 3.2.1
    • Component/s: io
    • Labels:
      None

      Description

      While bulk loading a large dataset into Titan I ran into OOM errors, so I tried tweaking some Spark configuration settings. I am already having trouble bulk loading with the new GryoRegistrator/UnshadedKryo serialization shim in master: a few hundred tasks into the edge loading stage (stage 5), exceptions are thrown complaining that CompactBuffer[].class must be explicitly registered with Kryo. With spark.rdd.compress=true, the same approach fails a few hundred tasks into the vertex loading stage (stage 1) of BulkLoaderVertexProgram.

      Using GryoSerializer instead of KryoSerializer with GryoRegistrator does not fail. With the compression flag on, it loads the data successfully, whereas before I would just get OOM errors until the job was set back so far that it failed.

      So this setting is desirable in some cases, and the new serialization code seems to break it. This could be an upstream Spark issue, based on this open JIRA ticket (https://issues.apache.org/jira/browse/SPARK-3630).

      Here is the exception that is thrown, with the middle of the stack trace cut out:

      com.esotericsoftware.kryo.KryoException: java.io.IOException: PARSING_ERROR(2)
      at com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
      at com.esotericsoftware.kryo.io.Input.require(Input.java:169)
      at com.esotericsoftware.kryo.io.Input.readLong_slow(Input.java:715)
      at com.esotericsoftware.kryo.io.Input.readLong(Input.java:665)
      at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:113)
      at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:103)
      at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
      at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedKryoAdapter.readClassAndObject(UnshadedKryoAdapter.java:48)
      at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedKryoAdapter.readClassAndObject(UnshadedKryoAdapter.java:30)
      at org.apache.tinkerpop.gremlin.structure.util.star.StarGraphSerializer.readEdges(StarGraphSerializer.java:134)
      at org.apache.tinkerpop.gremlin.structure.util.star.StarGraphSerializer.read(StarGraphSerializer.java:91)
      at org.apache.tinkerpop.gremlin.structure.util.star.StarGraphSerializer.read(StarGraphSerializer.java:45)
      at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedSerializerAdapter.read(UnshadedSerializerAdapter.java:55)
      at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
      at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedKryoAdapter.readObject(UnshadedKryoAdapter.java:42)
      at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedKryoAdapter.readObject(UnshadedKryoAdapter.java:30)
      at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.VertexWritableSerializer.read(VertexWritableSerializer.java:46)
      at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.VertexWritableSerializer.read(VertexWritableSerializer.java:36)
      at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedSerializerAdapter.read(UnshadedSerializerAdapter.java:55)
      at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
      at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:228)

      ........................................................ and so on .....................................

      Caused by: java.io.IOException: PARSING_ERROR(2)
      at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
      at org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
      at org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594)
      at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:358)
      at org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:167)
      at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:150)
      at com.esotericsoftware.kryo.io.Input.fill(Input.java:140)
      ... 51 more
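
For reference, the two setups compared in this report can be sketched as plain Spark property maps. This is only an illustration, not the reporter's actual job configuration. The property keys (spark.serializer, spark.kryo.registrator, spark.rdd.compress) are standard Spark settings, but the fully qualified class names are assumptions inferred from the package names in the stack trace, and the class and method names of the sketch itself are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SerializerConfigs {
    // Assumed fully qualified names; the report only uses the simple class names.
    static final String KRYO_SERIALIZER =
            "org.apache.spark.serializer.KryoSerializer";
    static final String GRYO_SERIALIZER =
            "org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer";
    static final String GRYO_REGISTRATOR =
            "org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator";

    // Setup reported as failing in stage 1: Spark's KryoSerializer plus the
    // GryoRegistrator shim, with RDD compression switched on.
    static Map<String, String> kryoWithGryoRegistrator() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.serializer", KRYO_SERIALIZER);
        conf.put("spark.kryo.registrator", GRYO_REGISTRATOR);
        conf.put("spark.rdd.compress", "true");
        return conf;
    }

    // Setup reported as working: GryoSerializer, same compression flag.
    static Map<String, String> gryoSerializer() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.serializer", GRYO_SERIALIZER);
        conf.put("spark.rdd.compress", "true");
        return conf;
    }

    public static void main(String[] args) {
        // Print both setups in properties-file form for side-by-side comparison.
        System.out.println("# KryoSerializer + GryoRegistrator (reported failing)");
        kryoWithGryoRegistrator().forEach((k, v) -> System.out.println(k + "=" + v));
        System.out.println("# GryoSerializer (reported working)");
        gryoSerializer().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The only differences between the two maps are the serializer class and the presence of the registrator entry, which is what points at the unshaded Kryo shim path rather than at the compression setting alone.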

              People

              • Assignee: Dan LaRocque (dalaro)
              • Reporter: Dylan Bethune-Waddell (dylanht)