  Tajo / TAJO-1211

Staging directory for CTAS and INSERT should be in the output dir.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: QueryMaster
    • Labels:
      None

      Description

      Background

      The staging directory temporarily holds the final output data. The data are moved to the final output dir only if the query finishes successfully. It is important that the output directory stays consistent even if the query fails.

      Problem

      Currently, the staging directory is located under /tmp/tajo-${user.name}/ on the HDFS instance that ${tajo.root} uses. The final output directory and the staging directory can therefore be on different file systems. In that case, the move causes unnecessary copy overhead. On S3, which has no native rename operation, such a move may be even more problematic.
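
To make the cross-file-system case concrete, here is a minimal sketch (not Tajo code) that decides whether two fully qualified paths live on the same file system by comparing URI scheme and authority. The function name `same_filesystem` and the example host and bucket names are assumptions made for illustration only.

```python
from urllib.parse import urlparse


def same_filesystem(path_a, path_b):
    """Return True if both URIs share scheme and authority.

    Hypothetical helper: assumes both paths are fully qualified URIs.
    """
    a, b = urlparse(path_a), urlparse(path_b)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)


# Hypothetical locations: staging under /tmp on HDFS, output table on S3.
staging = "hdfs://namenode:8020/tmp/tajo-alice/staging"
output = "s3://bucket/table1"

# A staging dir inside the output dir is on the output's file system by construction.
nested_staging = output + "/.staging"
```

When `same_filesystem(staging, output)` is false, moving the results cannot be a plain rename and the data has to be copied, which is the overhead described above.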

      Solution

      CTAS and INSERT (OVERWRITE) INTO should use a hidden subdirectory of the final output dir as the staging dir. For example, if the output dir is /table1, the corresponding staging dir should be /table1/.staging.
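
The proposed layout can be sketched on a local file system as follows. This is an illustration under stated assumptions, not the actual Tajo fix: the helper `write_with_staging`, its `fail` flag, and the file names are hypothetical, and removing old data for INSERT OVERWRITE is left out. Results are written to `<output dir>/.staging`, moved into the output dir by rename on success, and discarded on failure.

```python
import os
import shutil
import tempfile
from pathlib import Path

STAGING_NAME = ".staging"  # hidden subdirectory name taken from the issue text


def write_with_staging(output_dir, files, fail=False):
    """Write `files` (name -> text) into output_dir via output_dir/.staging.

    Hypothetical helper for illustration; `fail` simulates a query failure.
    """
    output_dir = Path(output_dir)
    staging_dir = output_dir / STAGING_NAME
    staging_dir.mkdir(parents=True, exist_ok=True)
    try:
        for name, text in files.items():
            (staging_dir / name).write_text(text)
        if fail:
            raise RuntimeError("simulated query failure")
        # Commit: staging and output share a file system, so each move is
        # a rename rather than a copy.
        for staged in list(staging_dir.iterdir()):
            os.replace(staged, output_dir / staged.name)
    finally:
        # Drop the staging dir on success and on failure, so a failed
        # query leaves no partial results in the output dir.
        shutil.rmtree(staging_dir, ignore_errors=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "table1"
        write_with_staging(table, {"part-0": "a", "part-1": "b"})
        print(sorted(p.name for p in table.iterdir()))
```

Because the staging dir is a child of the output dir, the commit step never crosses a file system boundary, and a failed run leaves the previous contents of the output dir untouched.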

            People

            • Assignee:
              Hyunsik Choi
              Reporter:
              Hyunsik Choi
            • Votes:
              0
              Watchers:
              2