Hadoop Common / HADOOP-5889

Allow writing to output directories that exist, as long as they are empty

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.18.3
    • Fix Version/s: None
    • Component/s: fs
    • Labels: None

      Description

      The current behavior of FileOutputFormat.checkOutputSpecs is to fail if the path specified by mapred.output.dir exists at the start of the job. This protects against accidentally overwriting existing data. There seems to be no harm, then, in slightly relaxing this check so that the output path may already exist as long as it is an empty directory.

      At a minimum, this would allow writing output to the root of an S3N bucket, which is currently impossible (https://issues.apache.org/jira/browse/HADOOP-5805).
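
      The relaxed check proposed above could look roughly like the sketch below. This is not the attached patch and not Hadoop's code: it uses java.nio.file instead of Hadoop's FileSystem API so that it runs without Hadoop on the classpath, and the class name OutputDirCheck and method name checkOutputDir are hypothetical. It assumes that a missing path and an empty directory are both acceptable, and that anything else should be rejected.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the relaxed output check; names are illustrative only.
public class OutputDirCheck {

    // Throws if outDir exists and is either a non-directory or a non-empty directory.
    public static void checkOutputDir(Path outDir) throws IOException {
        if (!Files.exists(outDir)) {
            return; // nothing there yet, so nothing can be overwritten
        }
        if (!Files.isDirectory(outDir)) {
            throw new FileAlreadyExistsException(
                outDir.toString(), null, "output path exists and is not a directory");
        }
        // Opening a directory stream lets us stop at the first entry
        // instead of listing the whole directory.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(outDir)) {
            if (entries.iterator().hasNext()) {
                throw new FileAlreadyExistsException(
                    outDir.toString(), null, "output directory exists and is not empty");
            }
        }
        // Exists and is empty: this is the case the issue proposes to allow.
    }
}
```

      In the Hadoop version, the same three-way decision would presumably be made against the job's FileSystem for the configured output path, in both the old and new API classes.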

        Activity

        Ian Nowland added a comment -

        Simple patch with an additional check so that it does not throw if the existing directory is empty.

        Tom White added a comment -

        This looks good to me. Would it be possible to have a unit test?

        Ian Nowland added a comment -

        Can do. One question: there is an org.apache.hadoop.mapred.TestFileOutputFormat but no corresponding org.apache.hadoop.mapreduce.lib.output.TestFileOutputFormat. Should I copy the existing test over to the new location? Also, should I make my source change in both the old mapred and the newer mapreduce files?

        Tom White added a comment -

        I don't see why we wouldn't make this change to both old and new APIs.

        There is a precedent for having the same test for the old and new APIs (e.g. the one for LazyOutput), so yes, I would create a new org.apache.hadoop.mapreduce.lib.output.TestFileOutputFormat.


          People

          • Assignee: Unassigned
          • Reporter: Ian Nowland
          • Votes: 1
          • Watchers: 5
