Details
Improvement
Status: Resolved
Major
Resolution: Fixed
2.8.1
Patch Available
Moderate
Description
A common pattern in HDFS is to store a dataset as multiple segment files under a single directory, each file holding a fragment of the data. Many tools know to merge these segment files automatically (e.g. hadoop fs -getmerge, Pig script loaders). This patch does the same for the HDFS consumer, using a temporary local directory for the merging.
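A minimal sketch of the merge step, assuming a local filesystem as a stand-in for HDFS (a real HDFS consumer would go through the HDFS client API instead). The helper name merge_segments is hypothetical. It also assumes that files whose names start with "_" or "." (such as _SUCCESS or checksum files) are not data, and that lexicographic name order (part-00000, part-00001, ...) is the intended concatenation order.

```python
import os
import shutil
import tempfile


def merge_segments(segment_dir: str) -> str:
    """Concatenate the segment files in segment_dir into a single file inside
    a new temporary local directory and return the merged file's path.

    Hypothetical helper: a local-filesystem stand-in for an HDFS consumer.
    """
    # Assumption: names starting with "_" or "." are markers or checksums,
    # not data segments, so they are left out of the merge.
    # Assumption: sorted (lexicographic) order matches the segment order,
    # e.g. part-00000 before part-00001.
    segments = sorted(
        name
        for name in os.listdir(segment_dir)
        if not name.startswith(("_", "."))
        and os.path.isfile(os.path.join(segment_dir, name))
    )

    # The merged output goes into a fresh temporary directory, not into the
    # source directory itself.
    merge_dir = tempfile.mkdtemp(prefix="segment-merge-")
    merged_path = os.path.join(merge_dir, "merged")
    with open(merged_path, "wb") as out:
        for name in segments:
            with open(os.path.join(segment_dir, name), "rb") as src:
                # Stream each segment so large files are not loaded into memory.
                shutil.copyfileobj(src, out)
    return merged_path
```

For a directory holding part-00000 ("a\n"), part-00001 ("b\n") and an empty _SUCCESS file, the merged file contains "a\nb\n". Writing into a separate temporary directory keeps the source segments untouched and leaves the caller free to delete the merged copy once it has been consumed.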
Additionally, tools such as Pig and Oozie look for a _SUCCESS file in a directory of segments; its presence indicates that all segments have been completely written. This patch also skips any such directory that does not contain a _SUCCESS file.
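A minimal sketch of the completeness check, again assuming a local filesystem as a stand-in for HDFS. The helpers is_complete and ready_directories are hypothetical names; the only rule they encode is the one described above: a directory of segments is consumed only if it contains a _SUCCESS file, and is otherwise skipped.

```python
import os
from typing import List

# Marker file whose presence means every segment has been fully written.
SUCCESS_MARKER = "_SUCCESS"


def is_complete(segment_dir: str) -> bool:
    """Return True if segment_dir contains the _SUCCESS marker file."""
    return os.path.isfile(os.path.join(segment_dir, SUCCESS_MARKER))


def ready_directories(parent_dir: str) -> List[str]:
    """Return the subdirectories of parent_dir that are safe to consume,
    skipping any that lack a _SUCCESS marker (possibly still being written)."""
    ready = []
    for name in sorted(os.listdir(parent_dir)):
        path = os.path.join(parent_dir, name)
        if os.path.isdir(path) and is_complete(path):
            ready.append(path)
    return ready
```

A skipped directory is not an error: it is simply not picked up yet, so a later pass can consume it once the writer has created the marker.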