Details
Type: Improvement
Status: Open
Priority: P3
Resolution: Unresolved
Description
The Hadoop copy operation is inefficient because it must stream the entire resource through the machine performing the copy. Hadoop file system implementations, however, do support an efficient rename.
Apache Beam sinks rely on being able to rename files atomically. This is currently implemented as a FileSystem copy followed by a delete, so every rename pays the full cost of the copy.
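Hadoop's own `FileSystem.rename(Path, Path)` is the efficient operation referred to above. The sketch below does not use Hadoop. It is a dependency-free local analogy that relies only on `java.nio.file`, and it contrasts the two strategies. The first strategy is copy + delete, where every byte is read and rewritten by the calling process. The second is a native move, which is a metadata operation on the same file system. The class name `RenameSketch`, its methods, and the file names are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration class; local-file analogy, not the Hadoop or Beam API.
public class RenameSketch {

    // Copy + delete: the whole file is streamed through this process, so the
    // cost grows with file size, and the two steps are not atomic together.
    static void renameByCopyAndDelete(Path src, Path dst) throws IOException {
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        Files.delete(src);
    }

    // Native rename: no file data is copied. With ATOMIC_MOVE the move either
    // happens as one atomic step or fails with AtomicMoveNotSupportedException.
    static void renameNatively(Path src, Path dst) throws IOException {
        Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-sketch");
        Path temp = dir.resolve("temp-shard-0");
        Files.write(temp, "hello".getBytes(StandardCharsets.UTF_8));

        // Strategy currently described in this issue: copy, then delete.
        Path copied = dir.resolve("copied-shard-0");
        renameByCopyAndDelete(temp, copied);

        // Strategy proposed in this issue: a native rename.
        Path fin = dir.resolve("final-shard-0");
        renameNatively(copied, fin);

        String content = new String(Files.readAllBytes(fin), StandardCharsets.UTF_8);
        System.out.println(Files.exists(temp) + " " + Files.exists(copied) + " " + content);
    }
}
```

Running `main` prints `false false hello`: both intermediate paths are gone and the content survives the final move. Only the first step had to read and rewrite the file's bytes.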