[HUDI-6120] fetchAllLogsMergedFileSlice will read basefile which it does not expect - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.14.0, 1.0.0-beta1
Component/s: None
Labels:
- pull-request-available

Description

Check the code snippet of org.apache.hudi.common.table.view.AbstractTableFileSystemView#fetchAllLogsMergedFileSlice:

private Option<FileSlice> fetchAllLogsMergedFileSlice(HoodieFileGroup fileGroup, String maxInstantTime) {
  List<FileSlice> fileSlices = fileGroup.getAllFileSlicesBeforeOn(maxInstantTime).collect(Collectors.toList());
  if (fileSlices.size() == 0) {
    return Option.empty();
  }
  if (fileSlices.size() == 1) {
    return Option.of(fileSlices.get(0));
  }
  final FileSlice latestSlice = fileSlices.get(0);
  FileSlice merged = new FileSlice(latestSlice.getPartitionPath(), latestSlice.getBaseInstantTime(),
      latestSlice.getFileId());

  // add log files from the latest slice to the earliest
  fileSlices.forEach(slice -> slice.getLogFiles().forEach(merged::addLogFile));
  return Option.of(merged);
}

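If only one file slice is fetched, the method returns that slice unchanged, base file included. hudi-flink then creates a SkipMergeIterator or MergeIterator for the split, and both of these read the base file as well as the log files, which the caller of fetchAllLogsMergedFileSlice does not expect.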
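The behavior can be reproduced outside Hudi with a minimal sketch. The `Slice` class and the `mergeLogsOnly` method below are hypothetical stand-ins, not Hudi's `FileSlice` API, and the approach shown (always building a fresh slice with no base file, even when only one slice is found) is only one possible way to avoid the problem; it is not necessarily what the linked pull request does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class LogsOnlyMergeSketch {

  /** Hypothetical, simplified stand-in for a file slice: optional base file plus log files. */
  static final class Slice {
    final String baseInstantTime;
    final String baseFile; // null when the slice carries no base file
    final List<String> logFiles = new ArrayList<>();

    Slice(String baseInstantTime, String baseFile) {
      this.baseInstantTime = baseInstantTime;
      this.baseFile = baseFile;
    }
  }

  /**
   * Merges the log files of all slices (ordered latest first) into a new slice
   * that never has a base file. There is no shortcut for the single-slice case,
   * so a lone slice's base file cannot leak into the result.
   */
  static Optional<Slice> mergeLogsOnly(List<Slice> slices) {
    if (slices.isEmpty()) {
      return Optional.empty();
    }
    Slice latest = slices.get(0);
    Slice merged = new Slice(latest.baseInstantTime, null);
    // add log files from the latest slice to the earliest
    for (Slice slice : slices) {
      merged.logFiles.addAll(slice.logFiles);
    }
    return Optional.of(merged);
  }

  public static void main(String[] args) {
    // A single slice that has both a base file and one log file.
    Slice only = new Slice("001", "base-001.parquet");
    only.logFiles.add("log-001.1");

    Slice merged = mergeLogsOnly(List.of(only)).get();
    System.out.println("base file: " + merged.baseFile);
    System.out.println("log files: " + merged.logFiles);
  }
}
```

With the single-slice shortcut from the quoted method, the returned slice would still reference `base-001.parquet`; in this sketch the result carries only the log file, so a downstream reader has no base file to open.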

Issue Links

links to

GitHub Pull Request #8529

People

Assignee: Unassigned
Reporter: Jianhui Dong
Votes: 0
Watchers: 2

Dates

Created: 21/Apr/23 03:59
Updated: 07/May/23 05:26
Resolved: 07/May/23 05:26