  1. Apache Hudi
  2. HUDI-898

Need to add Schema parameter to HoodieRecordPayload::preCombine

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Common Core

    Description

      We are working on Mongo Oplog integration with Hudi, to stream Mongo updates to Hudi tables.

      There are four Mongo OpLog operations we need to handle: CRUD (create, read, update, delete).

      Hudi currently handles create/read and delete, but the existing preCombine API in the HoodieRecordPayload class does not handle update well. In particular, an update operation contains a "patch" field, which is extended JSON describing the update in terms of dot-separated field paths.
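
      To make the "patch" idea concrete, here is a minimal sketch of applying dot-separated field paths to a nested document. The issue does not show the actual patch format, so this assumes a flat map from path to new value (similar in spirit to a Mongo $set document). The class and method names (DotPathPatch, applySet) are hypothetical and are not part of Hudi or any Mongo driver.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Hudi or MongoDB API.
final class DotPathPatch {

  private DotPathPatch() {}

  // Applies every "a.b.c" -> value entry of the patch to the nested document,
  // creating intermediate maps when a path segment is missing or not a map.
  @SuppressWarnings("unchecked")
  static Map<String, Object> applySet(Map<String, Object> doc, Map<String, Object> patch) {
    for (Map.Entry<String, Object> entry : patch.entrySet()) {
      String[] parts = entry.getKey().split("\\.");
      Map<String, Object> cursor = doc;
      // Walk down to the parent of the leaf field.
      for (int i = 0; i < parts.length - 1; i++) {
        Object child = cursor.get(parts[i]);
        if (!(child instanceof Map)) {
          child = new HashMap<String, Object>();
          cursor.put(parts[i], child);
        }
        cursor = (Map<String, Object>) child;
      }
      // Overwrite (or create) the leaf field.
      cursor.put(parts[parts.length - 1], entry.getValue());
    }
    return doc;
  }
}
```

      With a patch of {"address.city": "Bergen"}, only the nested city field changes and the rest of the document is left alone. Doing the same against an Avro record instead of a map requires knowing the record's schema, which is the motivation for the request in this issue.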

      We need to pass the Avro schema to the preCombine API for this to work.

      Although the BaseAvroPayload constructor accepts a GenericRecord, which carries a reference to its Avro schema, it materializes the GenericRecord to bytes to support serialization/deserialization by ExternalSpillableMap, so the schema reference is not retained for use in preCombine.
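
      The following toy model illustrates the problem and the proposed fix. It is not Hudi or Avro code: BytesPayload is a hypothetical stand-in for a bytes-backed payload, the "schema" is just an ordered list of field names, and the encoding is a simple delimiter-joined string. An empty value is assumed to mean "field not touched by this update". The point is only that a payload holding raw bytes cannot merge field by field unless the caller hands the schema back in.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a bytes-backed payload; NOT Hudi's BaseAvroPayload.
final class BytesPayload {
  private static final String SEP = "\u0001";

  final byte[] recordBytes; // values only; field names live in the schema
  final long orderingVal;

  BytesPayload(byte[] recordBytes, long orderingVal) {
    this.recordBytes = recordBytes;
    this.orderingVal = orderingVal;
  }

  // "Materialize to bytes": keep the values in schema order and drop the schema.
  static BytesPayload of(List<String> schema, Map<String, String> record, long orderingVal) {
    List<String> values = new ArrayList<>();
    for (String field : schema) {
      values.add(record.getOrDefault(field, ""));
    }
    byte[] bytes = String.join(SEP, values).getBytes(StandardCharsets.UTF_8);
    return new BytesPayload(bytes, orderingVal);
  }

  // Decoding is only possible when the caller supplies the schema again.
  Map<String, String> decode(List<String> schema) {
    String[] values = new String(recordBytes, StandardCharsets.UTF_8).split(SEP, -1);
    Map<String, String> out = new LinkedHashMap<>();
    for (int i = 0; i < schema.size(); i++) {
      out.put(schema.get(i), values[i]);
    }
    return out;
  }

  // Hypothetical schema-aware preCombine: start from the older record and
  // overlay the non-empty fields of the newer one.
  BytesPayload preCombine(BytesPayload another, List<String> schema) {
    BytesPayload older = this.orderingVal <= another.orderingVal ? this : another;
    BytesPayload newer = (older == this) ? another : this;
    Map<String, String> merged = older.decode(schema);
    for (Map.Entry<String, String> e : newer.decode(schema).entrySet()) {
      if (!e.getValue().isEmpty()) {
        merged.put(e.getKey(), e.getValue());
      }
    }
    return of(schema, merged, newer.orderingVal);
  }
}
```

      Without the schema argument, preCombine in this model could only pick one payload over the other by ordering value; it could not combine a partial update with the earlier full record.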

       

      Is there any concern or objection to this? In other words, have I overlooked something?


            People

              Assignee: vbalaji Balaji Varadarajan
              Reporter: yx3zhu@gmail.com Yixue Zhu