HIVE-15114 (Hive project; sub-task of HIVE-14269 Performance optimizations for data on S3)

Remove extra MoveTask operators from the ConditionalTask



    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.3.0
    • Component/s: Hive
    • Labels: None


      When running simple insert queries (e.g. INSERT INTO TABLE ... VALUES ...), an extraneous {{MoveTask}} is created.

      This is problematic when the scratch directory is on S3, because S3 has no native rename: a rename requires copying the entire dataset.

      For simple queries (like the one above), there are two MoveTasks. The first one moves the output data from one file in the scratch directory to another file in the scratch directory. The second MoveTask moves the data from the scratch directory to its final table location.
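      A back-of-the-envelope sketch of why the extra move matters, assuming that every move on S3 is implemented as a full copy of the data (the cost of the follow-up delete is ignored). The `Move` record, the paths, and the 10 GiB dataset size are illustrative placeholders, not Hive classes or measured values.

```java
import java.util.List;

public class MovePlanCost {
    // Hypothetical model of one move step: source and destination paths.
    record Move(String src, String dst) {}

    // Assumption: on S3 a rename is a full copy, so each move in the plan
    // copies the whole dataset once.
    static long bytesCopiedOnS3(List<Move> plan, long datasetBytes) {
        return plan.size() * datasetBytes;
    }

    public static void main(String[] args) {
        long dataset = 10L * 1024 * 1024 * 1024; // 10 GiB, illustrative only

        // Current plan: scratch -> scratch, then scratch -> table location.
        List<Move> current = List.of(
            new Move("scratch/tmp-a", "scratch/tmp-b"),
            new Move("scratch/tmp-b", "warehouse/t"));

        // Proposed plan: a single move from scratch to the table location.
        List<Move> proposed = List.of(
            new Move("scratch/tmp-a", "warehouse/t"));

        System.out.println(bytesCopiedOnS3(current, dataset));  // 21474836480
        System.out.println(bytesCopiedOnS3(proposed, dataset)); // 10737418240
    }
}
```

      Under this cost model, dropping the intermediate scratch-to-scratch move halves the bytes copied for a single-stage insert.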

      The first MoveTask should not be necessary. The goal of this JIRA is to remove it, which should improve performance when running on S3.

      The first MoveTask may be caused by a dependency resolution problem in the optimizer: a dependent task is not properly removed when the task it depends on is filtered out by a condition resolver.
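      The suspected problem can be illustrated with a toy task graph. This is a minimal sketch, not Hive's optimizer code: the `Task` record, the `prune` method, and the task names are hypothetical, and the rule shown (drop a task once every parent it depends on has been dropped) is an assumption about what correct pruning should look like.

```java
import java.util.*;

public class PruneDependents {
    // Hypothetical task node: a name plus the names of the tasks it depends on.
    record Task(String name, List<String> parents) {}

    // Walks the tasks in topological order (assumed) and keeps a task only if
    // the resolver did not filter it out and it is either a root or has at
    // least one parent that was kept.
    static List<String> prune(List<Task> tasksInOrder, Set<String> filteredOut) {
        Set<String> kept = new LinkedHashSet<>();
        for (Task t : tasksInOrder) {
            if (filteredOut.contains(t.name())) {
                continue; // removed by the condition resolver
            }
            boolean isRoot = t.parents().isEmpty();
            boolean hasLiveParent = t.parents().stream().anyMatch(kept::contains);
            if (isRoot || hasLiveParent) {
                kept.add(t.name());
            }
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        List<Task> plan = List.of(
            new Task("mapred", List.of()),
            new Task("cond", List.of("mapred")),
            new Task("moveOnly", List.of("cond")),
            new Task("merge", List.of("cond")),
            new Task("mergeThenMove", List.of("merge")));

        // The resolver picks the move-only branch, so "merge" is filtered out.
        System.out.println(prune(plan, Set.of("merge"))); // [mapred, cond, moveOnly]
    }
}
```

      Here `mergeThenMove` depends only on `merge`, so it is pruned along with it. The behaviour described above corresponds to that dependent move surviving in the plan even though its parent was filtered out.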

      A dummy MoveTask is added in the GenMapRedUtils.createMRWorkForMergingFiles method. This method creates a conditional task that launches a job to merge the output files at the end of the job. A MoveTask is placed at the end of that conditional job.

      Even though Hive decides that the conditional merge job is not needed, the MoveTask still appears to be added to the plan.

      This extra MoveTask may have been added intentionally; the reason is not yet clear. ConditionalResolverMergeFiles says that one of three task lists will be returned: a move task only, a merge task only, or a merge task followed by a move task.
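      The three outcomes listed for ConditionalResolverMergeFiles can be modeled as below. The enum, the task names, and the decision rule are hypothetical simplifications, not Hive's implementation: the average-size threshold stands in for a setting such as `hive.merge.smallfiles.avgsize`, and the `mergeWritesToFinalLocation` flag is an invented stand-in for whatever chooses between the merge-only and merge-then-move cases.

```java
import java.util.List;

public class MergeResolverSketch {
    // The three outcomes the issue attributes to ConditionalResolverMergeFiles.
    enum Outcome { MOVE_ONLY, MERGE_ONLY, MERGE_THEN_MOVE }

    // Hypothetical rule: merge only when there are several files whose
    // average size is below the threshold; skip the trailing move when the
    // merge can write straight to the final location (assumed flag).
    static Outcome resolve(long totalBytes, int numFiles, long avgSizeThreshold,
                           boolean mergeWritesToFinalLocation) {
        boolean needsMerge = numFiles > 1 && totalBytes / numFiles < avgSizeThreshold;
        if (!needsMerge) {
            return Outcome.MOVE_ONLY;
        }
        return mergeWritesToFinalLocation ? Outcome.MERGE_ONLY : Outcome.MERGE_THEN_MOVE;
    }

    // Tasks each outcome should contribute to the plan, and nothing more.
    static List<String> tasksFor(Outcome o) {
        return switch (o) {
            case MOVE_ONLY -> List.of("move");
            case MERGE_ONLY -> List.of("merge");
            case MERGE_THEN_MOVE -> List.of("merge", "move");
        };
    }

    public static void main(String[] args) {
        // A single 64 MiB output file: no merge is needed.
        Outcome o = resolve(64L * 1024 * 1024, 1, 16L * 1024 * 1024, false);
        System.out.println(o + " -> " + tasksFor(o)); // MOVE_ONLY -> [move]
    }
}
```

      In the move-only case the plan should contain exactly one move; the concern in this issue is that the MoveTask added at the end of the conditional job still shows up alongside it.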


      Attachments:
        1. HIVE-15114.3.patch (27 kB, Sergio Peña)
        2. HIVE-15114.4.patch (40 kB, Sergio Peña)
        3. HIVE-15114.5.patch (63 kB, Sergio Peña)
        4. HIVE-15114.6.patch (108 kB, Sergio Peña)
        5. HIVE-15114.WIP.1.patch (15 kB, Sergio Peña)
        6. HIVE-15114.WIP.2.patch (27 kB, Sergio Peña)




            Assignee: Sergio Peña
            Reporter: Sahil Takiar
            Votes: 0
            Watchers: 8



