Flume / FLUME-1486

Ability to configure a staging directory for data


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels: None

    Description

      It would be nice to be able to configure a staging directory for files being written to HDFS. Once the file stream is complete, the file would then be moved to the configured "final" directory.

      One example use case where this helps is with log files that are being analyzed by Hive. We could have a Hive table that points to an HDFS folder containing a bunch of log files. As it stands, if Flume is writing a tmp file into that directory and you fire up a MapReduce job, and that file then finishes being written (thus changing the filename), the job will fail because it can't find that file.

      The current workaround is to use virtual columns to filter out tmp files, but this is tedious to do for every query. It would be nice to have a directory Flume can write the files into; once it finishes streaming data and closes a file for writing, it would move that file to the final directory.
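      As a sketch of the workaround described above: it can be written in HiveQL using the INPUT__FILE__NAME virtual column (the `.tmp` suffix matches the Flume HDFS sink's default in-use suffix; the table name is hypothetical):

      ```sql
      -- INPUT__FILE__NAME is a Hive virtual column holding each row's source
      -- file path, so this predicate skips files Flume is still writing.
      SELECT *
      FROM web_logs
      WHERE INPUT__FILE__NAME NOT LIKE '%.tmp';
      ```

      Alternatively, the HDFS sink's `hdfs.inUsePrefix` property can hide in-progress files from MapReduce entirely, since FileInputFormat skips dot-prefixed (hidden) files. A minimal configuration sketch, assuming an agent named `a1` with a sink `k1`:

      ```properties
      # While being written, files appear as ".events.<timestamp>.tmp" and are
      # invisible to MapReduce/Hive; on close they are renamed to "events.<timestamp>".
      a1.sinks.k1.type = hdfs
      a1.sinks.k1.hdfs.path = /user/hive/warehouse/web_logs
      a1.sinks.k1.hdfs.filePrefix = events
      a1.sinks.k1.hdfs.inUsePrefix = .
      ```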


            People

              Assignee: Unassigned
              Reporter: Ricky Saltzer
              Votes: 0
              Watchers: 2
