  Apache Hudi / HUDI-727

Copy default values of fields if not present when rewriting incoming record with new schema


Details

    Description

      Currently we recommend that users evolve their schema in a backwards-compatible way. When doing so, one of the most important steps is to define a default value for every newly added column, so that records published with the previous schema can still be consumed properly.
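To make the backwards-compatibility requirement concrete, here is a minimal, stdlib-only Java sketch that flags newly added fields which declare no default. It assumes a hypothetical, simplified `Field` model (a name plus an optional default, with a `hasDefault` flag mirroring Avro's distinction between a null default and no default at all); this is not Avro's `Schema.Field` API, and `addedFieldsWithoutDefault` is an illustrative helper, not part of Hudi.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CompatCheck {
    // Hypothetical simplified field model (NOT the Avro Schema.Field API).
    // hasDefault distinguishes "default is null" from "no default declared".
    record Field(String name, Object defaultValue, boolean hasDefault) {}

    // Returns the names of fields present in newFields but absent from
    // oldFieldNames that declare no default. Records written with the old
    // schema have no way to supply a value for these fields.
    static List<String> addedFieldsWithoutDefault(Set<String> oldFieldNames, List<Field> newFields) {
        List<String> missing = new ArrayList<>();
        for (Field f : newFields) {
            if (!oldFieldNames.contains(f.name()) && !f.hasDefault()) {
                missing.add(f.name());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> oldFields = Set.of("id", "name");
        List<Field> newFields = List.of(
                new Field("id", null, false),
                new Field("name", null, false),
                new Field("country", "unknown", true), // added, with a default: OK
                new Field("age", null, false));        // added, no default: flagged
        System.out.println(addedFieldsWithoutDefault(oldFields, newFields)); // [age]
    }
}
```

An empty result suggests the new schema only adds defaulted fields, which is the property the paragraph above asks users to maintain.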
       
      However, just before actually writing a record to the Hudi dataset, we rewrite the record with a new Avro schema that adds the Hudi metadata columns [1]. This function only copies the values present in the record and ignores each field's default value. As a result, schema validation fails for records that lack the newly added fields.
      IMO, this code should also take the default value into account when the field's actual value is null.
       
      [1] https://github.com/apache/incubator-hudi/blob/078d4825d909b2c469398f31c97d2290687321a8/hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java#L205.
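The proposed behaviour can be sketched as follows: when a record is rewritten into the new schema, a field whose value is missing or null falls back to its declared default. This is a minimal sketch over a hypothetical map-based record and a simplified `Field` model; it is not the actual `HoodieAvroUtils` code or Avro's `GenericRecord` API, and `rewrite` is an illustrative name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RewriteWithDefaults {
    // Hypothetical simplified field model (NOT the Avro Schema.Field API).
    record Field(String name, Object defaultValue, boolean hasDefault) {}

    // Builds a new record laid out by newSchema from the values in oldRecord.
    // If oldRecord has no non-null value for a field, use the field's
    // declared default when one exists; otherwise the value stays null.
    static Map<String, Object> rewrite(Map<String, Object> oldRecord, List<Field> newSchema) {
        Map<String, Object> out = new LinkedHashMap<>(); // allows null values, keeps field order
        for (Field f : newSchema) {
            Object value = oldRecord.get(f.name());
            if (value == null && f.hasDefault()) {
                value = f.defaultValue();
            }
            out.put(f.name(), value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Field> newSchema = List.of(
                new Field("id", null, false),
                new Field("country", "unknown", true)); // newly added, with a default
        Map<String, Object> oldRecord = Map.of("id", 42); // written with the old schema
        System.out.println(rewrite(oldRecord, newSchema)); // {id=42, country=unknown}
    }
}
```

Fields without a default are left null here; in a real Avro write such a field would presumably still fail validation unless its type is nullable.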


          People

            Pratyaksh Sharma
            Pratyaksh Sharma
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 20m
