Apache Hudi / HUDI-3348

HoodieRealtimeFileSplit losing info when serialized/deserialized

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • None
    • 0.11.0
    • None
    • 0.5

    Description

      https://github.com/apache/hudi/pull/3865 added a new field, deltaLogFiles, to HoodieRealtimeFileSplit.

      However, it did not make the corresponding changes to the readFields and write routines that Hive uses to ser/de splits, so this info is essentially lost whenever a split is passed between executors.
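      The failure mode described above can be sketched roughly as follows. This is only an illustration, not the actual Hudi code: LossySplit, LossySplitDemo, basePath and roundTrip are hypothetical names, and the class merely mimics the write/readFields contract using plain java.io streams. Only deltaLogFiles is taken from the issue.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a split; NOT the real HoodieRealtimeFileSplit.
class LossySplit {
  String basePath = "";
  List<String> deltaLogFiles = new ArrayList<>(); // the newly added field

  // Only what is written here survives serialization.
  void write(DataOutput out) throws IOException {
    out.writeUTF(basePath);
    // Bug: deltaLogFiles is never written.
  }

  void readFields(DataInput in) throws IOException {
    basePath = in.readUTF();
    // Bug: deltaLogFiles is never read, so it keeps its empty default.
  }
}

class LossySplitDemo {
  // Serializes the split to bytes and reads it back into a fresh instance.
  static LossySplit roundTrip(LossySplit split) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      split.write(new DataOutputStream(buffer));
      LossySplit clone = new LossySplit();
      clone.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
      return clone;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    LossySplit split = new LossySplit();
    split.basePath = "/tmp/base.parquet";
    split.deltaLogFiles.add("/tmp/.log.1");
    LossySplit clone = roundTrip(split);
    // prints "/tmp/base.parquet []": the log file list did not survive
    System.out.println(clone.basePath + " " + clone.deltaLogFiles);
  }
}
```

      The point of the sketch is that nothing fails loudly: the deserialized split is a valid object whose new field silently falls back to its default.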

          Activity

            alexey.kudinkin Alexey Kudinkin added a comment -

            This is likely impacting HUDI-2762.
            alexey.kudinkin Alexey Kudinkin added a comment - edited

            It would also be great to add a test to make sure that we don't regress on this next time.

            Something along the lines of:

            split = new Split(...)
            bytes = write(split)
            clone = read(bytes)
            assertEquals(split, clone)
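
            A slightly fuller version of this round-trip check might look like the sketch below. It is an assumption-laden illustration rather than the test that was actually added: SketchSplit and SplitRoundTrip are hypothetical names, the class uses plain java.io streams instead of the Hadoop types, and a real test would construct a HoodieRealtimeFileSplit and use the project's assertion library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical split that serializes every field, including deltaLogFiles.
class SketchSplit {
  String basePath = "";
  List<String> deltaLogFiles = new ArrayList<>();

  void write(DataOutput out) throws IOException {
    out.writeUTF(basePath);
    out.writeInt(deltaLogFiles.size()); // length prefix, then each entry
    for (String file : deltaLogFiles) {
      out.writeUTF(file);
    }
  }

  void readFields(DataInput in) throws IOException {
    basePath = in.readUTF();
    int count = in.readInt();
    deltaLogFiles = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      deltaLogFiles.add(in.readUTF());
    }
  }

  // Field-by-field equality so the round-trip comparison is meaningful.
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SketchSplit)) return false;
    SketchSplit other = (SketchSplit) o;
    return basePath.equals(other.basePath) && deltaLogFiles.equals(other.deltaLogFiles);
  }

  @Override
  public int hashCode() {
    return Objects.hash(basePath, deltaLogFiles);
  }
}

class SplitRoundTrip {
  static byte[] write(SketchSplit split) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      split.write(new DataOutputStream(buffer));
      return buffer.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static SketchSplit read(byte[] bytes) {
    try {
      SketchSplit clone = new SketchSplit();
      clone.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
      return clone;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    SketchSplit split = new SketchSplit();
    split.basePath = "/tmp/base.parquet";
    split.deltaLogFiles.add("/tmp/.log.1");
    SketchSplit clone = read(write(split));
    if (!split.equals(clone)) {
      throw new AssertionError("split changed across ser/de");
    }
  }
}
```

            The check only catches a dropped field if equals compares every field, so any field added later would need to be included in equals as well as in write and readFields.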
            xushiyan Shiyan Xu added a comment -

            The actual fix was done in HUDI-3280. Only the unit test still needs to be added.


            People

              xushiyan Shiyan Xu
              alexey.kudinkin Alexey Kudinkin
              Alexey Kudinkin