Camel / CAMEL-6867

camel-hdfs - HdfsProducer filename collisions when Producer instance recreated


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.11.3, 2.12.2, 2.13.0
    • Component/s: camel-hdfs
    • Labels: None
    • Estimated Complexity: Unknown

    Description

      The HdfsProducer uses an instance variable (long splitNum) that is incremented to generate unique output filenames within a given directory (seg0, seg1, etc.).

      If the Producer instance is recreated (producer cache limit exceeded, server restart, etc.), splitNum is reset to 0. Files are then overwritten when using overwrite=true, or the write fails with "The file already exists" when using overwrite=false.
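A minimal Java sketch of this failure mode, assuming a simplified producer that builds names as <directory>/seg<N>. The class names SegmentCounterNamer and CounterCollisionDemo and the directory /data/out are hypothetical; this is not the actual HdfsProducer code and it does not touch HDFS. It only shows why a counter held in instance state repeats earlier names once a new instance is created.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the naming scheme described in this issue: each
// instance keeps its own counter, so a fresh instance starts again at 0.
// SegmentCounterNamer is illustrative only, not Camel's HdfsProducer.
class SegmentCounterNamer {
    private long splitNum = 0; // instance state: lost when the producer is recreated

    String nextName(String directory) {
        // Uses the current value, then increments it for the next call.
        return directory + "/seg" + splitNum++;
    }
}

class CounterCollisionDemo {
    public static void main(String[] args) {
        String dir = "/data/out"; // hypothetical HDFS directory
        Set<String> written = new HashSet<>();

        SegmentCounterNamer first = new SegmentCounterNamer();
        written.add(first.nextName(dir)); // /data/out/seg0
        written.add(first.nextName(dir)); // /data/out/seg1

        // Simulate the producer being evicted from the cache or the server restarting.
        SegmentCounterNamer recreated = new SegmentCounterNamer();
        String name = recreated.nextName(dir); // /data/out/seg0 again
        boolean collision = !written.add(name); // add() returns false if already present
        System.out.println(name + " collides: " + collision);
    }
}
```

Running CounterCollisionDemo prints "/data/out/seg0 collides: true": the recreated instance reissues a name that the first instance already used.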

      We should switch to using a timestamp or some other unique generator, so that filenames in the same HDFS directory URL cannot collide regardless of the Producer instance lifecycle.
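One possible shape for the proposed fix, sketched in Java under the assumption that a name combining a timestamp with a random UUID is acceptable. UniqueSegmentNamer, UniqueNameDemo and the seg-<millis>-<uuid> format are hypothetical and not necessarily what Camel adopted; the point is only that the name is derived without any per-instance counter.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical segment-name generator in the spirit of the proposed fix:
// names do not depend on per-instance state, so recreating the generator
// (or the producer holding it) does not restart a sequence.
// UniqueSegmentNamer and the "seg-<millis>-<uuid>" format are assumptions.
class UniqueSegmentNamer {
    String nextName(String directory) {
        // The timestamp keeps names roughly ordered by creation time;
        // the random UUID disambiguates names created in the same millisecond.
        return directory + "/seg-" + System.currentTimeMillis() + "-" + UUID.randomUUID();
    }
}

class UniqueNameDemo {
    public static void main(String[] args) {
        String dir = "/data/out"; // hypothetical HDFS directory
        Set<String> written = new HashSet<>();

        UniqueSegmentNamer first = new UniqueSegmentNamer();
        for (int i = 0; i < 1000; i++) {
            written.add(first.nextName(dir));
        }

        // A recreated instance starts with no state, yet still produces fresh names.
        UniqueSegmentNamer recreated = new UniqueSegmentNamer();
        boolean collision = !written.add(recreated.nextName(dir));
        System.out.println("names written: " + written.size() + ", collision after recreation: " + collision);
    }
}
```

Random (version 4) UUIDs make a repeated name extremely unlikely rather than impossible. A timestamp on its own would not be enough: two producers, or two writes from one producer, landing in the same millisecond would still share a name.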


          People

            Assignee: Benjamin P. O'Day (boday)
            Reporter: Benjamin P. O'Day (boday)
            Votes: 0
            Watchers: 2
