Apache Hudi / HUDI-5392

Fix Bootstrap files reader to configure arrays to be read in the new format


Details

    Description

      When writing a Bootstrap file, we use the Spark writer, which writes arrays in the new format, while Hudi reads them in the old (Avro-compatible) format:

       // Old
       optional group tip_history (LIST) {
         repeated group array {
           optional double amount;
           optional binary currency (UTF8);
         }
       }

       // New
       optional group tip_history (LIST) {
         repeated group list {
           optional group element {
             optional double amount;
             optional binary currency (UTF8);
           }
         }
       }

      To fix this, we need to make sure that Bootstrap files are always read in the new format (the Spark default), unlike Hudi's own Parquet files.
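
      To illustrate the difference between the two layouts, here is a minimal sketch in Python. It is not Hudi or Parquet code: the tuple/dict model of a LIST group and the helpers `classify_list_layout` and `element_fields` are hypothetical, and the detection rule is a simplified heuristic rather than the full set of Parquet backward-compatibility rules. It shows that both layouts describe the same logical element once the reader knows which one it is looking at.

```python
# Simplified, hypothetical model of the repeated group inside a LIST-annotated
# Parquet group: (repeated_group_name, children), where children maps a field
# name to a primitive type name or to a nested dict of fields.

LEGACY_2_LEVEL = "legacy 2-level (Avro-compatible)"
STANDARD_3_LEVEL = "standard 3-level (Spark default)"


def classify_list_layout(repeated_name, children):
    """Guess the list layout from the repeated group (heuristic, assumption)."""
    # New layout: a repeated group named `list` wrapping a single `element` field.
    if repeated_name == "list" and list(children) == ["element"]:
        return STANDARD_3_LEVEL
    # Otherwise treat the repeated group itself as the element (e.g. `array`).
    return LEGACY_2_LEVEL


def element_fields(repeated_name, children):
    """Return the logical element fields regardless of the physical layout."""
    if classify_list_layout(repeated_name, children) == STANDARD_3_LEVEL:
        return children["element"]
    return children


# The two `tip_history` schemas from the description, in the simplified model.
old_layout = ("array", {"amount": "double", "currency": "binary (UTF8)"})
new_layout = ("list", {"element": {"amount": "double", "currency": "binary (UTF8)"}})

print(classify_list_layout(*old_layout))
print(classify_list_layout(*new_layout))
print(element_fields(*old_layout) == element_fields(*new_layout))
```

      A reader that assumes the old layout for a file written in the new one would treat `element` as a field of the record instead of as the wrapper around it, which is the mismatch described above.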

      We also need to fix TestDataSourceForBootstrap, as it currently doesn't actually assert that the records are written correctly.
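
      As a rough idea of the kind of assertion that is missing, the sketch below compares the source records with the records read back, ignoring row order. It is written in Python for brevity and is not the actual TestDataSourceForBootstrap code; `canonical`, `assert_same_records`, and the sample rows are hypothetical.

```python
import json


def canonical(record):
    """Serialize a record deterministically so nested structs and lists compare by value."""
    return json.dumps(record, sort_keys=True)


def assert_same_records(expected, actual):
    """Fail unless both collections hold the same records, in any row order."""
    exp = sorted(canonical(r) for r in expected)
    act = sorted(canonical(r) for r in actual)
    assert exp == act, f"records differ: expected {exp}, got {act}"


# Hypothetical rows with an array-of-struct column like `tip_history`.
source_rows = [
    {"id": 1, "tip_history": [{"amount": 1.5, "currency": "USD"}]},
    {"id": 2, "tip_history": []},
]
read_back_rows = [
    {"id": 2, "tip_history": []},
    {"id": 1, "tip_history": [{"currency": "USD", "amount": 1.5}]},
]

assert_same_records(source_rows, read_back_rows)
```

      Comparing full record contents, rather than only row counts, is what would surface an array column that was read back with the wrong layout.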


          People

            alexey.kudinkin Alexey Kudinkin
            alexey.kudinkin Alexey Kudinkin
            Shiyan Xu

              Agile

                Completed Sprints:
                2022/12/12 ended 20/Dec/22
                0.13.0 Final Sprint ended 10/Jan/23
                0.13.0 Final Sprint 2 ended 18/Jan/23
                0.13.0 Final Sprint 3 ended 01/Feb/23