  1. Apache Hudi
  2. HUDI-4959

Serializing objects using Kryo fails to deserialize data back w/o prior registration


    Description

      Originally reported in:

      https://github.com/apache/hudi/issues/6621

       

      Kryo (used in SerializationUtils) by default allows objects to be serialized without their classes being registered with Kryo beforehand: in that case Kryo encodes the first occurrence of an object of a particular class with the full class name, while subsequent occurrences use a class id assigned to it on the fly.

      This poses a problem for durable serialization (when we persist the serialized bytes): in this case we're trying to deserialize a file that doesn't have the class name encoded, and since the user is running a different Spark job to read it, the class-id association isn't preserved in memory either.

      NOTE: For durable persistence we should be using explicit, hand-written serialization sequences for every object we serialize, and avoid relying on frameworks like Kryo for that.
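
      As a rough illustration of what such a custom serialization sequence could look like, here is a minimal sketch using only the JDK. The class DurableRecordCodec, its Record type, and the FORMAT_VERSION constant are hypothetical names made up for this example; they are not part of Hudi. The point is only that the byte layout is spelled out by hand and carries a version marker, so it does not depend on the field layout of any library class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example, not Hudi code: an explicit, versioned binary layout.
public class DurableRecordCodec {
    // Bumped whenever the layout below changes, so readers can detect old files.
    static final int FORMAT_VERSION = 1;

    static final class Record {
        final String key;
        final String orderingValue;

        Record(String key, String orderingValue) {
            this.key = key;
            this.orderingValue = orderingValue;
        }
    }

    // Layout: [int version][int keyLen][key bytes][int valueLen][value bytes]
    static byte[] encode(Record record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(FORMAT_VERSION);
            writeString(out, record.key);
            writeString(out, record.orderingValue);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Record decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            if (version != FORMAT_VERSION) {
                throw new IllegalStateException("Unsupported format version: " + version);
            }
            String key = readString(in);
            String orderingValue = readString(in);
            return new Record(key, orderingValue);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeString(DataOutputStream out, String value) throws IOException {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length);
        out.write(utf8);
    }

    private static String readString(DataInputStream in) throws IOException {
        byte[] utf8 = new byte[in.readInt()];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Record back = decode(encode(new Record("key-1", "2022-09-01")));
        System.out.println(back.key + " / " + back.orderingValue);
    }
}
```

      With a layout like this, a reader running against a different library version either decodes the same values or fails fast on the version check, instead of depending on whatever serializer a framework generates at runtime.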

       


      EDIT

      I'm taking back my hypothesis that the issue is in the class encoding: after writing a small test to validate it, I confirmed that Kryo actually writes out the full class name for all classes registered implicitly (as it should).

      It seems the problem is indeed a misalignment of the Avro versions, as reported by @KnightChess: from a quick check, I see that between Avro 1.8.2 and 1.10.2 Utf8 actually had one more field added:

        // 1.8.2 
        private byte[] bytes = EMPTY;
        private int length;
        private String string;
      
        // 1.10.2
        private byte[] bytes;
        private int hash;
        private int length;
        private String string; 

       
      Given that we're relying on Kryo to generate the serializer for orderingVal, which could be a Utf8 (the generated serializer is based on FieldSerializer), this would explain why it couldn't deserialize it back: the writer and the reader end up with different serializers for the same class.
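
      To make the failure mode concrete, here is a small JDK-only sketch. It does not use Kryo; it is a toy stand-in that, like a field-based serializer, writes field values in field-name order with no field names in the stream (my understanding of how FieldSerializer behaves, so treat that as an assumption). Utf8V1 and Utf8V2 are hypothetical classes that only mirror the two field layouts quoted above, and FieldLayoutMismatchDemo is a made-up name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy illustration only: NOT Kryo, just a field-order-based serializer.
public class FieldLayoutMismatchDemo {
    // Hypothetical stand-ins for the two Utf8 layouts (1.8.2 vs 1.10.2).
    static class Utf8V1 { byte[] bytes = new byte[0]; int length; String string = ""; }
    static class Utf8V2 { byte[] bytes = new byte[0]; int hash; int length; String string = ""; }

    // Instance fields sorted by name; the stream carries values only, no names.
    static List<Field> serializableFields(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            f.setAccessible(true);
            fields.add(f);
        }
        fields.sort(Comparator.comparing(Field::getName));
        return fields;
    }

    static byte[] write(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Field f : serializableFields(value.getClass())) {
                Object v = f.get(value);
                if (f.getType() == int.class) {
                    out.writeInt((Integer) v);
                } else if (f.getType() == String.class) {
                    out.writeUTF((String) v);
                } else { // byte[]: length prefix followed by the raw bytes
                    byte[] b = (byte[]) v;
                    out.writeInt(b.length);
                    out.write(b);
                }
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    static <T> T read(byte[] data, Class<T> type) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T value = ctor.newInstance();
            for (Field f : serializableFields(type)) {
                if (f.getType() == int.class) {
                    f.setInt(value, in.readInt());
                } else if (f.getType() == String.class) {
                    f.set(value, in.readUTF());
                } else {
                    byte[] b = new byte[in.readInt()];
                    in.readFully(b);
                    f.set(value, b);
                }
            }
            return value;
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Utf8V2 written = new Utf8V2();
        written.bytes = "abc".getBytes(StandardCharsets.UTF_8);
        written.hash = 7;
        written.length = 3;
        written.string = "abc";
        byte[] data = write(written);

        Utf8V2 sameLayout = read(data, Utf8V2.class);   // writer and reader agree
        Utf8V1 otherLayout = read(data, Utf8V1.class);  // reader lacks the hash field
        System.out.println("same layout:  length=" + sameLayout.length + " string=" + sameLayout.string);
        System.out.println("other layout: length=" + otherLayout.length + " string=" + otherLayout.string);
    }
}
```

      Reading with the matching layout round-trips cleanly, while the reader without the hash field consumes the hash bytes as length and misreads everything after it. Depending on the data, a real serializer in this situation could either throw or silently return garbage.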


            People

              alexey.kudinkin Alexey Kudinkin
              Shiyan Xu, sivabalan narayanan