[HUDI-7111] Performance regression of spark job which written into simple bucket index table - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.14.1
Component/s: spark
Labels:
- pull-request-available

Description

After upgrading to 0.14.0, Spark jobs that write to a simple bucket index table show a performance regression.

The cause is PR#4480: the bucket index refactor introduced two unnecessary stages in tag for the simple bucket index. The relevant line is:

    List<String> partitions = records.map(HoodieRecord::getPartitionPath).distinct().collectAsList();
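
One way to avoid an eager distinct-and-collect pass like the one above is to look up each partition's bucket mapping lazily, the first time a record from that partition is seen, inside the same pass that tags the records. The sketch below illustrates only that idea in plain Java, without Spark or Hudi. It is not taken from the linked pull request: `Rec`, `loadBucketToFileId`, `bucketId`, the file-id naming, and the hash function are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LazyBucketTagging {

    // Hypothetical stand-in for a record: only a key and a partition path.
    static final class Rec {
        final String key;
        final String partitionPath;

        Rec(String key, String partitionPath) {
            this.key = key;
            this.partitionPath = partitionPath;
        }
    }

    // Counts how many times the (assumed expensive) per-partition lookup runs.
    static int loadCalls = 0;

    // Hypothetical loader: maps bucket id -> file id for one partition.
    static Map<Integer, String> loadBucketToFileId(String partition, int numBuckets) {
        loadCalls++;
        Map<Integer, String> mapping = new HashMap<>();
        for (int b = 0; b < numBuckets; b++) {
            mapping.put(b, partition + "-file-" + b);
        }
        return mapping;
    }

    // Illustrative bucket function; not necessarily the hash Hudi uses.
    static int bucketId(String key, int numBuckets) {
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    // Tags every record in a single pass. A partition's mapping is loaded
    // on first sight and cached, so no separate distinct/collect pass is needed.
    static List<String> tag(List<Rec> records, int numBuckets) {
        Map<String, Map<Integer, String>> cache = new HashMap<>();
        List<String> fileIds = new ArrayList<>();
        for (Rec r : records) {
            Map<Integer, String> mapping = cache.computeIfAbsent(
                r.partitionPath, p -> loadBucketToFileId(p, numBuckets));
            fileIds.add(mapping.get(bucketId(r.key, numBuckets)));
        }
        return fileIds;
    }

    public static void main(String[] args) {
        List<Rec> records = Arrays.asList(
            new Rec("k1", "2023/11/16"),
            new Rec("k2", "2023/11/16"),
            new Rec("k3", "2023/11/17"));
        List<String> tagged = tag(records, 4);
        System.out.println(tagged);
        System.out.println("partition lookups: " + loadCalls);
    }
}
```

In a distributed job the cache would presumably live inside the per-task mapping function, so each task loads mappings only for the partitions it actually encounters.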

Attachments

- image-2023-11-16-23-41-32-729.png (2.08 MB), attached by Jing Zhang on 16/Nov/23 15:41

Issue Links

links to GitHub Pull Request #10130

People

Assignee: Unassigned

Reporter: Jing Zhang

Votes: 0

Watchers: 2

Dates

Created: 16/Nov/23 15:46

Updated: 21/Nov/23 01:56

Resolved: 21/Nov/23 01:56