Details
Sub-task
Status: Open
Major
Resolution: Unresolved
3.3.1
None
None
Description
Seen in production: calling mkdirs in FileOutputCommitter.setupJob is triggering a FileNotFoundException (FNFE):
java.io.FileNotFoundException: Operation failed: "The specified path does not exist.", 404, PUT, https://bcket.dfs.core.windows.net/table1/_temporary/0?resource=directory&timeout=90, PathNotFound, "The specified path does not exist."
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1131)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.mkdirs(AzureBlobFileSystem.java:445)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2347)
I suspect that while this job is setting up, a previous job is doing cleanup/abort on the same path.
Assuming that the abfs mkdirs is, like the POSIX one, non-atomic: as it goes up/down the chain of parent dirs, something else gets in the way.
If so, this is something which can be handled in the client: when we get an FNFE, we could warn and retry.
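A minimal sketch of that warn-and-retry idea, written against plain Java rather than the actual abfs client code. The class MkdirsRetry, the interface MkdirsCall, and the maxAttempts/sleepMillis parameters are all hypothetical names for illustration; in a real client the lambda would wrap something like fs.mkdirs(path), and the warning would go through the client's logger instead of stderr.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical helper: retry a mkdirs call when it fails with an FNFE. */
final class MkdirsRetry {

    /** Hypothetical stand-in for a call such as FileSystem.mkdirs(path). */
    @FunctionalInterface
    public interface MkdirsCall {
        boolean mkdirs() throws IOException;
    }

    private MkdirsRetry() {
    }

    /**
     * Invoke the call, warning and retrying if it raises FileNotFoundException.
     * Other IOExceptions are not retried. The last FNFE is rethrown once
     * maxAttempts attempts have failed.
     */
    public static boolean mkdirsWithRetry(MkdirsCall call, int maxAttempts, long sleepMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        FileNotFoundException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.mkdirs();
            } catch (FileNotFoundException e) {
                // Possibly a concurrent cleanup/abort removed a parent dir mid-operation.
                last = e;
                System.err.println("WARN: mkdirs raised FileNotFoundException on attempt "
                        + attempt + " of " + maxAttempts + ": " + e.getMessage());
                if (attempt < maxAttempts) {
                    pause(sleepMillis);
                }
            }
        }
        throw last;
    }

    /** Sleep between attempts, converting an interrupt into an IOException. */
    private static void pause(long millis) throws InterruptedIOException {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            InterruptedIOException iioe =
                    new InterruptedIOException("interrupted while waiting to retry mkdirs");
            iioe.initCause(ie);
            throw iioe;
        }
    }
}
```

A caller might use it as MkdirsRetry.mkdirsWithRetry(() -> fs.mkdirs(jobAttemptPath), 3, 500L), where fs and jobAttemptPath are whatever filesystem and path the committer already holds. Retrying only on FNFE keeps other failures (permissions, throttling) visible to the caller unchanged.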
In the manifest committer, each job will have a unique ID under _temporary, and there will be the option to skip deleting the temp dir entirely, for better coexistence of active jobs.