Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
When scanning the log file, if the log file(or log file group) is larger than 2GB, serialization will fail because Hudi uses Integer to store size in byte for the log file. The maximum integer representing bytes is 2GB.
Caused by: com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload$$Lambda$45/62103784
Serialization trace:
orderingVal (org.apache.hudi.common.model.OverwriteWithLatestAvroPayload)
data (org.apache.hudi.common.model.HoodieRecord)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:107)
at org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:81)
at org.apache.hudi.common.util.collection.DiskBasedMap.get(DiskBasedMap.java:217)
at org.apache.hudi.common.util.collection.DiskBasedMap.get(DiskBasedMap.java:211)
at org.apache.hudi.common.util.collection.DiskBasedMap.get(DiskBasedMap.java:207)
at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:168)
at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:55)
at org.apache.hudi.HoodieMergeOnReadRDD$$anon$1.hasNext(HoodieMergeOnReadRDD.scala:128)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:624)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload$$Lambda$45/62103784
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
... 31 more
Attachments
Issue Links
- relates to
-
HUDI-1603 DefaultHoodieRecordPayload Serialization Failed
- Closed
- links to