Apache Hudi / HUDI-1205

Serialization fails when log file is larger than 2 GB

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Unresolved

    Description

      When scanning a log file, serialization fails if the log file (or log file group) is larger than 2 GB, because Hudi uses an Integer to store the log file size in bytes. The largest value a signed 32-bit Integer can hold is 2,147,483,647 (2^31 - 1), so a size of 2 GB or more cannot be represented.

      Caused by: com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload$$Lambda$45/62103784
      Serialization trace:
      orderingVal (org.apache.hudi.common.model.OverwriteWithLatestAvroPayload)
      data (org.apache.hudi.common.model.HoodieRecord)
      at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)
      at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
      at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
      at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
      at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
      at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
      at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
      at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
      at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
      at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:107)
      at org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:81)
      at org.apache.hudi.common.util.collection.DiskBasedMap.get(DiskBasedMap.java:217)
      at org.apache.hudi.common.util.collection.DiskBasedMap.get(DiskBasedMap.java:211)
      at org.apache.hudi.common.util.collection.DiskBasedMap.get(DiskBasedMap.java:207)
      at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:168)
      at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:55)
      at org.apache.hudi.HoodieMergeOnReadRDD$$anon$1.hasNext(HoodieMergeOnReadRDD.scala:128)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown Source)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
      at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:624)
      at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
      at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
      at org.apache.spark.scheduler.Task.run(Task.scala:121)
      at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
      at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.ClassNotFoundException: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload$$Lambda$45/62103784
      at java.lang.Class.forName0(Native Method)
      at java.lang.Class.forName(Class.java:348)
      at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
      ... 31 more
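
The 2 GB limit from the description can be illustrated with a minimal Java sketch. This is not Hudi's code: the class `LogSizeOverflow` and the helpers `narrowToInt` and `fitsInInt` are hypothetical names introduced only for illustration. It assumes a byte count is computed as a `long` and then narrowed to an `int`, which is one way an Integer-typed size field could end up holding a wrong value.

```java
// Illustrative only; none of these names come from the Hudi code base.
class LogSizeOverflow {

    // Hypothetical helper: narrows a byte count to int, the way an
    // Integer-typed size field would store it. Values above
    // Integer.MAX_VALUE wrap around silently.
    static int narrowToInt(long sizeInBytes) {
        return (int) sizeInBytes;
    }

    // Hypothetical guard: true only if the size is representable as a
    // non-negative int.
    static boolean fitsInInt(long sizeInBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024; // 3,221,225,472 bytes

        System.out.println(Integer.MAX_VALUE);     // 2147483647
        System.out.println(narrowToInt(threeGiB)); // -1073741824
        System.out.println(fitsInInt(threeGiB));   // false

        // Math.toIntExact fails loudly instead of wrapping.
        try {
            Math.toIntExact(threeGiB);
        } catch (ArithmeticException e) {
            System.out.println("size does not fit in an int");
        }
    }
}
```

Keeping sizes and offsets in a `long`, or checking with `Math.toIntExact` before narrowing, turns a silent wrap-around into an explicit error. Whether either approach matches the actual fix for this issue is not stated here.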

            People

              Assignee: Unassigned
              Reporter: Yanjia Gary Li (garyli1019)
              Votes: 2
              Watchers: 6
