[HUDI-5392] Fix Bootstrap files reader to configure arrays to be read in the new format - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.13.0
Component/s: bootstrap
Labels:
- pull-request-available

Story Points:
1
Epic Link:
Improve Bootstrap

Description

When writing a Bootstrap file, we use the Spark writer, which writes arrays in the new (three-level) format, while Hudi reads them in the old (Avro-compatible, two-level) format:

 // Old
 optional group tip_history (LIST) {
   repeated group array {
     optional double amount;
     optional binary currency (UTF8);
   }
 }

 // New
 optional group tip_history (LIST) {
   repeated group list {
     optional group element {
       optional double amount;
       optional binary currency (UTF8);
     }
   }
 }

To fix this, we need to make sure that Bootstrap files are always read in the new format (the Spark default), unlike Hudi's own Parquet files, which use the old format.
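
To make the difference between the two layouts concrete, here is a minimal Python sketch that classifies a LIST-annotated group as legacy two-level or three-level. It follows a simplified reading of the backward-compatibility rules in the Parquet format specification. It is not Hudi's or Spark's actual code: the `Node` class and the `list_layout` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, minimal stand-in for a Parquet schema node."""
    name: str
    repetition: str                          # "required", "optional" or "repeated"
    children: Optional[List["Node"]] = None  # None means a primitive field


def list_layout(list_group: Node) -> str:
    """Classify a LIST-annotated group as 'legacy' or 'three-level'.

    Simplified version of the Parquet LIST backward-compatibility rules.
    """
    if not list_group.children or len(list_group.children) != 1:
        raise ValueError("a LIST group must contain exactly one field")
    repeated = list_group.children[0]
    if repeated.repetition != "repeated":
        raise ValueError("the single field of a LIST group must be repeated")

    # A repeated primitive is itself the element type.
    if repeated.children is None:
        return "legacy"
    # A repeated group with several fields is itself the element (a struct).
    if len(repeated.children) > 1:
        return "legacy"
    # A single-field repeated group named "array" or "<list name>_tuple"
    # is also treated as the element type.
    if repeated.name in ("array", list_group.name + "_tuple"):
        return "legacy"
    # Otherwise the repeated group is only a wrapper around the element.
    return "three-level"


def tip_fields() -> List[Node]:
    # The struct fields shared by both schemas in this issue.
    return [Node("amount", "optional"), Node("currency", "optional")]


# The "Old" schema: the repeated group holds the struct fields directly.
OLD = Node("tip_history", "optional",
           [Node("array", "repeated", tip_fields())])

# The "New" schema: the repeated group wraps a single "element" group.
NEW = Node("tip_history", "optional",
           [Node("list", "repeated",
                 [Node("element", "optional", tip_fields())])])

print(list_layout(OLD))
print(list_layout(NEW))
```

A reader that assumes the legacy layout would treat the `list` wrapper in the new schema as the element itself, which is the kind of mismatch this issue describes.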

We also need to fix TestDataSourceForBootstrap, as it currently doesn't actually assert that the records are written correctly.

Issue Links

links to

GitHub Pull Request #7461

People

Assignee: Alexey Kudinkin

Reporter: Alexey Kudinkin

Reviewers: Raymond Xu

Votes: 0

Watchers: 1

Dates

Created: 14/Dec/22 22:37

Updated: 24/Jan/23 20:42

Resolved: 24/Jan/23 20:42