Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
- Labels: None
Description
FsShell copy/put creates staging files with a '._COPYING_' suffix. FileInputFormat should treat these files as hidden. A simple fix is to add the following conjunct to the existing hidden-file filter (hiddenFileFilter):
!name.endsWith("._COPYING_")
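To make the proposed conjunct concrete, here is a minimal sketch of the name check in plain Java, without any Hadoop dependency. The class name `HiddenFilterSketch` and the `accept(String)` helper are hypothetical and exist only for illustration. The two leading-character checks are assumed to match what FileInputFormat's hidden-file filter already does (skipping names that start with `_` or `.`); only the last conjunct is the change suggested here.

```java
public class HiddenFilterSketch {
    // Suffix that FsShell copy/put appends to in-flight staging files.
    private static final String STAGING_SUFFIX = "._COPYING_";

    /** Returns true if a file with this name should be read as job input. */
    public static boolean accept(String name) {
        return !name.startsWith("_")              // assumed existing check
            && !name.startsWith(".")              // assumed existing check
            && !name.endsWith(STAGING_SUFFIX);    // proposed extra conjunct
    }

    public static void main(String[] args) {
        System.out.println(accept("part-00000"));            // true
        System.out.println(accept("events.log._COPYING_"));  // false
        System.out.println(accept("_SUCCESS"));              // false
    }
}
```

Because the staging suffix is appended to the original file name, a prefix check alone cannot catch these files, which is why a suffix check is needed.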
We encountered this bug after upgrading to CDH 4.2.0. We have a legacy data loader that uses 'hadoop fs -put' to load data into hourly partitions. We also have intra-hourly jobs that are scheduled to run several times per hour against the same hourly partition. Because new data is loaded continuously, these staging files (i.e., those ending in ._COPYING_) break our jobs: when a copy/put completes, the staging file is renamed to its final name, so a job that already listed the staging file as input can no longer find it.
As a workaround, we've defined a custom input path filter and loaded it with "mapred.input.pathFilter.class".
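A workaround along these lines might look like the sketch below. It is not the reporter's actual filter, which is not shown in this ticket; the class name `CopyingSuffixFilter` and the `register` helper are hypothetical. The sketch assumes the old `org.apache.hadoop.mapred` API and needs the Hadoop client libraries on the classpath.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapred.JobConf;

/** Hypothetical filter that skips FsShell staging files. */
public class CopyingSuffixFilter implements PathFilter {
    private static final String STAGING_SUFFIX = "._COPYING_";

    // Public no-arg constructor (implicit) so the framework can
    // instantiate the filter reflectively from the configured class name.

    @Override
    public boolean accept(Path path) {
        // Reject only in-flight copies; every other path passes through.
        return !path.getName().endsWith(STAGING_SUFFIX);
    }

    /** Registers this filter using the property named in the workaround. */
    public static void register(JobConf conf) {
        conf.setClass("mapred.input.pathFilter.class",
                      CopyingSuffixFilter.class, PathFilter.class);
    }
}
```

Calling `CopyingSuffixFilter.register(conf)` before submitting the job should be enough. As far as I understand, the configured filter is applied in addition to the built-in hidden-file filter, so it does not need to repeat the `_` and `.` prefix checks.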
Issue Links
- relates to HADOOP-9750 '._COPYING_' sufix temp file could prevent from running MapReduce job (Resolved)