Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version: 1.8
Description
After upgrading from Samza 1.6 to 1.8, our processors were unable to run because the following exception is thrown at startup:
Caused by: java.io.FileNotFoundException: /mnt/hdfs/hdfs01/ramdisk1/yarn/usercache/admin/appcache/application_1693863343588_1281/container_e84_1693863343588_1281_01_000017/state/session-store/Partition_3-1694473445147-703249/OFFSET-v2.tmp (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
    at org.apache.samza.util.FileUtil.writeWithChecksum(FileUtil.scala:45)
    at org.apache.samza.storage.StorageManagerUtil.writeOffsetFile(StorageManagerUtil.java:222)
    at org.apache.samza.storage.TaskStorageCommitManager.writeChangelogOffsetFile(TaskStorageCommitManager.java:364)
    at org.apache.samza.storage.TaskStorageCommitManager.lambda$writeChangelogOffsetFiles$10(TaskStorageCommitManager.java:340)
This happens because the parent directory of the OFFSET-v2 file is never created before FileOutputStream is used to create the file, and FileOutputStream does not create missing parent directories itself.
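The failure mode can be reproduced outside Samza. The following is a minimal sketch, not Samza code: the class name MissingParentDemo, the helper openFails, and the directory name are hypothetical and chosen only for illustration. It shows that opening a FileOutputStream under a directory that does not exist raises FileNotFoundException, and that the same call succeeds once the parent directory has been created.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical demo class; it does not exist in Samza.
final class MissingParentDemo {

    // Returns true if opening a FileOutputStream on `target` fails with
    // FileNotFoundException, false if the stream could be opened.
    static boolean openFails(File target) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(target)) {
            return false;
        } catch (FileNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("offset-demo").toFile();
        // Stand-in for a checkpoint directory that has not been created yet.
        File tmpFile = new File(new File(base, "Partition_3-checkpoint"), "OFFSET-v2.tmp");

        System.out.println("open fails before mkdirs: " + openFails(tmpFile));

        // Create the missing parent directory, then retry.
        tmpFile.getParentFile().mkdirs();
        System.out.println("open fails after mkdirs: " + openFails(tmpFile));
    }
}
```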
The parent directory is defined by:
public String getStoreCheckpointDir(File taskStoreDir, CheckpointId checkpointId) {
  return taskStoreDir.getPath() + "-" + checkpointId.serialize();
}
A simple fix is to add the following line to the writeWithChecksum method in FileUtil.scala, before the FileOutputStream is opened:
tmpFile.getParentFile.mkdirs()
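For illustration, the shape of such a fix might look like the Java sketch below. This is not the actual FileUtil.scala implementation: the class SafeOffsetWriter, the method writeWithParentDirs, the CRC32 checksum, and the on-disk layout (a long checksum followed by a UTF string) are all assumptions made for the example. The only part taken from this report is the idea of creating the parent directory of the .tmp file before opening the output stream.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32;

// Hypothetical helper; names and file layout are assumptions, not Samza's API.
final class SafeOffsetWriter {

    static void writeWithParentDirs(File file, String data) throws IOException {
        File tmpFile = new File(file.getAbsolutePath() + ".tmp");

        // The proposed fix: make sure the parent directory exists before
        // FileOutputStream tries to create the temporary file.
        File parent = tmpFile.getParentFile();
        if (parent != null && !parent.isDirectory() && !parent.mkdirs() && !parent.isDirectory()) {
            throw new IOException("Could not create directory " + parent);
        }

        // Assumed checksum scheme for the sketch: CRC32 over the UTF-8 bytes.
        CRC32 crc = new CRC32();
        crc.update(data.getBytes(StandardCharsets.UTF_8));

        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(tmpFile))) {
            out.writeLong(crc.getValue());
            out.writeUTF(data);
        }

        // Swap the temporary file into place once it is fully written.
        Files.move(tmpFile.toPath(), file.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The second isDirectory() check after mkdirs() covers the case where another thread creates the directory concurrently, in which case mkdirs() returns false even though the directory now exists.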
This fix is implemented in a forked feature branch: https://github.com/jamesfotheringham/samza/tree/1.9.0_fileWriteFix