Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-16574

StreamingFileSink should rename files or fail if destination file already exists

    XMLWordPrintableJSON

Details

    Description

      I switched from BucketingSink to StreamingFileSink so my state could not be restored after starting from a savepoint.

      Upon start of the job there were already part-0-0 and part-0-1 files in the HDFS destination folder. The StreamingFileSink then creates a file like .part-0-0.inprogress.d1849354-39d4-4634-8fb3-dfb8e2083857. When the file is rolled Flink tries to rename it to part-0-0, but that file already exists. NameNode logs "WARN hdfs.StateChange (FSDirRenameOp.java:unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo: failed to rename XXXX to XXXXbeca
      use destination exists".

      Flink does not care and creates a new file like .part-0-1.inprogress.d for the next bucket and the game continues until the part index counter is so high the file can be renamed. But now I'm left with a lot of .part-xxx.inprogress.xxx that I need to rename by hand if I don't want to lose the data.

       

      I would expect Flink to either fail if the file cannot be renamed, or auto-rename it to filename that does not exists yet.

      The same happens when not starting from a savepoint. IIRC the BucketingFileSink did not have this problem.

      Attachments

        Activity

          People

            Unassigned Unassigned
            melmoth static-max
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: