Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Background
The staging directory temporarily holds the final output data. The data are moved to the final output directory only if the query finishes successfully. It is important to keep the output directory consistent even if the query fails.
Problem
Currently, the staging directory is located under /tmp/tajo-${user.name}/ on the HDFS file system that ${tajo.root} uses. The final output directory and the staging directory can therefore be on different file systems. In that case, the move cannot be a simple rename and incurs unnecessary copy overhead. On S3 such a move may be even more problematic, because S3 has no native rename and a move is typically carried out as a copy followed by a delete.
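
To make the cross-file-system case concrete, here is a minimal sketch that compares the scheme and authority of two locations. When they differ, moving data between them cannot be a cheap rename. The function name and the example URIs are hypothetical and are not taken from the Tajo code base.

```python
from urllib.parse import urlparse


def on_same_filesystem(a: str, b: str) -> bool:
    """Return True when both URIs share the same scheme and authority."""
    ua, ub = urlparse(a), urlparse(b)
    return (ua.scheme, ua.netloc) == (ub.scheme, ub.netloc)


# Hypothetical locations: staging under /tmp on HDFS, table output on S3.
staging = "hdfs://namenode:8020/tmp/tajo-alice/staging/q1"
output = "s3://bucket/table1"

# A False result means the final move would have to copy the data.
print(on_same_filesystem(staging, output))
```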
Solution
CTAS and INSERT (OVERWRITE) INTO should place the staging directory in a hidden subdirectory of the final output directory. For example, if the output directory is /table1, the corresponding staging directory should be /table1/.staging.
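
The proposed layout can be sketched as follows. This is only an illustration on the local file system using Python's standard library, not Tajo's implementation, which would go through the Hadoop file system API. The helper names (staging_dir_for, commit, abort) are hypothetical, and the sketch does not model the removal of old data that INSERT OVERWRITE implies.

```python
import os
import shutil
import tempfile
from pathlib import Path

STAGING_NAME = ".staging"  # hidden subdirectory name from the proposal


def staging_dir_for(output_dir: Path) -> Path:
    """Derive the staging dir as a hidden child of the output dir."""
    return output_dir / STAGING_NAME


def commit(output_dir: Path) -> None:
    """On success, move staged files into the output dir and drop staging."""
    staging = staging_dir_for(output_dir)
    for f in sorted(staging.iterdir()):
        # Source and target share a file system, so this is a rename.
        os.replace(f, output_dir / f.name)
    staging.rmdir()


def abort(output_dir: Path) -> None:
    """On failure, discard staged data; the output dir stays untouched."""
    shutil.rmtree(staging_dir_for(output_dir), ignore_errors=True)


with tempfile.TemporaryDirectory() as root:
    table = Path(root) / "table1"
    staging_dir_for(table).mkdir(parents=True)
    (staging_dir_for(table) / "part-0").write_text("row\n")
    commit(table)
    print(sorted(p.name for p in table.iterdir()))
```

Because the staging directory lives inside the output directory, the commit step never crosses a file system boundary, which is the point of the proposal.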