Apache Hudi / HUDI-5392

Fix Bootstrap files reader to configure arrays to be read in the new format


Details

    Description

      When writing a Bootstrap file, we use the Spark writer, which writes arrays in the new format, while Hudi reads them in the old (Avro-compatible) format:

       // Old
       optional group tip_history (LIST) {
         repeated group array {
           optional double amount;
           optional binary currency (UTF8);
         }
       }

       // New
       optional group tip_history (LIST) {
         repeated group list {
           optional group element {
             optional double amount;
             optional binary currency (UTF8);
           }
         }
       }

       

      To fix that, we need to make sure that Bootstrap files are always read in the new format (Spark's default), unlike Hudi's own Parquet files, which use the old format.

      We also need to fix TestDataSourceForBootstrap, which currently does not actually assert that the records are written correctly.
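
      To make the difference between the two array layouts concrete, here is a minimal, self-contained Python sketch that classifies a LIST-annotated group as either the old 2-level layout or the new 3-level layout. It is not Hudi or Parquet code: the `Field` class and the `list_layout` function are hypothetical names introduced only for illustration, and the classification rules are a simplified reading of the backward-compatibility rules in the Parquet format specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical, minimal model of a Parquet schema node (illustration only)."""
    name: str
    repetition: str                            # "required", "optional" or "repeated"
    children: Optional[List["Field"]] = None   # None means a primitive field


def list_layout(list_group: Field) -> str:
    """Classify a LIST-annotated group as 'legacy-2-level' or 'standard-3-level'.

    Simplified rules (an assumption of this sketch):
      * the LIST group must contain exactly one repeated field;
      * if that field is primitive, has several children, or is named
        'array' or '<list name>_tuple', it is itself the element (2-level);
      * otherwise it is a wrapper around a single element field (3-level).
    """
    if not list_group.children or len(list_group.children) != 1:
        raise ValueError("a LIST group must contain exactly one field")
    repeated = list_group.children[0]
    if repeated.repetition != "repeated":
        raise ValueError("the field inside a LIST group must be repeated")

    if repeated.children is None:
        return "legacy-2-level"
    if len(repeated.children) > 1:
        return "legacy-2-level"
    if repeated.name in ("array", list_group.name + "_tuple"):
        return "legacy-2-level"
    return "standard-3-level"


def element_fields() -> List[Field]:
    # The struct fields from the schemas in the description.
    return [Field("amount", "optional"), Field("currency", "optional")]


# Old (Avro-compatible) layout: the repeated group is the element itself.
old_schema = Field("tip_history", "optional", [
    Field("array", "repeated", element_fields()),
])

# New (Spark default) layout: repeated 'list' wraps an 'element' group.
new_schema = Field("tip_history", "optional", [
    Field("list", "repeated", [
        Field("element", "optional", element_fields()),
    ]),
])

print(list_layout(old_schema))
print(list_layout(new_schema))
```

      A reader configured for one layout that receives a file in the other layout interprets the intermediate group incorrectly, which is the mismatch this issue describes for Bootstrap files.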


            People

              alexey.kudinkin Alexey Kudinkin
              alexey.kudinkin Alexey Kudinkin
              Raymond Xu