Description
I've been playing with the S3 integration. My first use of it is as a drop-in replacement for a backup job's offsite target, streaming data offsite by piping the backup job's output to "hadoop dfs -put - targetfile".
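For illustration, the setup is roughly the following (the backup command and the target path here are placeholders, not the actual job, and it assumes the destination is addressed with an s3:// URI or that fs.default.name points at the S3 filesystem):

    my-backup-command | hadoop dfs -put - s3://mybucket/backups/backup.dump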
If enough errors occur while posting to S3 (this happened easily last Thursday, during an S3 growth issue), the write can eventually fail. At that point, both blocks and a partial INode have been written into S3. A "hadoop dfs -ls filename" shows the file with a non-zero size, etc. However, trying to remove the partially written file with "hadoop dfs -rm filename" results in the response "rm: No such file or directory."
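To illustrate the state after a failed write (using the same placeholder path as above):

    hadoop dfs -ls s3://mybucket/backups/backup.dump   (lists the file, non-zero size)
    hadoop dfs -rm s3://mybucket/backups/backup.dump
    rm: No such file or directory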