Details
-
Bug
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
None
-
None
-
2
Description
While debugging a flaky test, realized that we are generating more than 1 split for one log file itself. Root caused it to isSpllitable() that returns true for HoodieRealTimePath.
I made a quick fix locally and verified that only one split is generated per log file.
git diff hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java diff --git a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java index bba44d5c66..d09dfdf753 100644 --- a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java +++ b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java @@ -89,7 +89,7 @@ public class HoodieRealtimePath extends Path { } public boolean isSplitable() { - return !toString().isEmpty() && !includeBootstrapFilePath(); + return !toString().contains(".log") && !includeBootstrapFilePath(); } public PathWithBootstrapFileStatus getPathWithBootstrapFileStatus() {