Hadoop Map/Reduce
MAPREDUCE-5247

FileInputFormat should filter files with '._COPYING_' suffix

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      FsShell copy/put creates staging files with the '._COPYING_' suffix. These files should be considered hidden by FileInputFormat. A simple fix is to add the following conjunct to the existing hiddenFilter:

      !name.endsWith("._COPYING_")
      

      We encountered this bug after upgrading to CDH 4.2.0. We have a legacy data loader which uses 'hadoop fs -put' to load data into hourly partitions. We also have intra-hourly jobs which are scheduled to execute several times per hour using the same hourly partition as input. Because new data is loaded continuously, a job can pick up a '._COPYING_' staging file as input and then fail, since the staging file is moved once copy/put completes.

      As a workaround, we've defined a custom input path filter and registered it through the "mapred.input.pathFilter.class" property.
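
      The name test behind such a filter might look roughly like the sketch below. It is kept as a plain static method so that it runs without Hadoop on the classpath. The class name CopyingAwareFilter is hypothetical, and the first two conjuncts are an assumption about what the existing hidden-file filter already checks (names starting with '_' or '.'); only the third conjunct comes from this issue.

```java
// Sketch only: CopyingAwareFilter is a hypothetical name, not a Hadoop class.
final class CopyingAwareFilter {
    // Suffix of FsShell copy/put staging files, as given in the description.
    static final String COPYING_SUFFIX = "._COPYING_";

    // Returns true when a file with this name should be read as job input.
    // The first two conjuncts are assumed to mirror the existing hidden-file
    // filter; the third is the conjunct proposed in this issue.
    static boolean accept(String name) {
        return !name.startsWith("_")
            && !name.startsWith(".")
            && !name.endsWith(COPYING_SUFFIX);
    }

    public static void main(String[] args) {
        String[] names = {"part-00000", "part-00001._COPYING_", "_SUCCESS", ".hidden"};
        for (String name : names) {
            System.out.println(name + " -> " + (accept(name) ? "input" : "skipped"));
        }
    }
}
```

      In an actual job, the same test would presumably sit inside a class implementing org.apache.hadoop.fs.PathFilter, with accept(Path path) delegating to the check on path.getName(), and that class would be the one named in the property above.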


            People

            • Assignee:
              Unassigned
              Reporter:
              Stan Rosenberg
            • Votes:
              0
              Watchers:
              5
