HIVE-7158: Use Tez auto-parallelism in Hive


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

    Description

      Tez can optionally sample data from a fraction of the tasks of a vertex and use that information to choose the number of downstream tasks for any given scatter-gather edge.

      Hive estimates the number of reducers by looking at stats and estimates for each operator in the operator pipeline leading up to the reducer. However, if this estimate turns out to be too large, Tez can rein in the resources used for the reduce stage.

      It does so by combining partitions of the upstream vertex. It cannot, however, add reducers at this stage.
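
      The combining step can be pictured with a small sketch. This is an illustration only, not Tez's actual implementation: the function name, the `target_bytes_per_reducer` parameter, and the grouping rule (contiguous, equally sized ranges of partitions) are assumptions made for the example. It shows why the reducer count can shrink but never grow: each reducer receives one or more existing partitions, so the partition count is an upper bound.

```python
import math


def combine_partitions(partition_sizes, target_bytes_per_reducer, min_reducers=1):
    """Group upstream partitions into contiguous ranges, one range per reducer.

    Hypothetical helper for illustration; not part of Hive or Tez.
    partition_sizes: sampled (or extrapolated) byte counts per partition.
    Returns a list of lists of partition indexes.
    """
    if target_bytes_per_reducer <= 0:
        raise ValueError("target_bytes_per_reducer must be positive")

    num_partitions = len(partition_sizes)
    if num_partitions == 0:
        return []

    total_bytes = sum(partition_sizes)
    # How many reducers the observed data volume appears to need.
    desired = math.ceil(total_bytes / target_bytes_per_reducer) if total_bytes else 1
    # Never go below the configured minimum...
    desired = max(desired, min_reducers, 1)
    # ...and never above the partition count: reducers cannot be added here.
    desired = min(desired, num_partitions)

    # Integer division keeps the resulting group count at or above `desired`.
    partitions_per_reducer = num_partitions // desired
    return [
        list(range(start, min(start + partitions_per_reducer, num_partitions)))
        for start in range(0, num_partitions, partitions_per_reducer)
    ]


if __name__ == "__main__":
    # Eight partitions of 10 bytes each, aiming for about 40 bytes per reducer.
    print(combine_partitions([10] * 8, target_bytes_per_reducer=40))
```

      With eight equal partitions and a target of 40 bytes per reducer, the sketch yields two reducers covering partitions 0 to 3 and 4 to 7. An arbitrarily small target still yields at most eight reducers.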

      I'm proposing to let users specify whether they want to use auto-parallelism or not. If they do, there will be scaling factors that determine the maximum and minimum number of reducers Tez can choose from. We will then partition by the maximum number of reducers, letting Tez sample and rein in the count down to the specified minimum.
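
      As a rough sketch of how the proposed scaling factors might turn Hive's estimate into a range, the following computes the bounds. The function name, parameter names, default factor values, and rounding choices are assumptions for illustration; the description above does not specify them.

```python
import math


def reducer_bounds(estimated_reducers, min_factor=0.25, max_factor=2.0):
    """Derive (min_reducers, max_reducers) from Hive's reducer estimate.

    Hypothetical helper; the factor defaults are illustrative placeholders.
    The data would be partitioned max_reducers ways, and Tez could then
    combine partitions down to no fewer than min_reducers.
    """
    if estimated_reducers < 1:
        raise ValueError("estimated_reducers must be at least 1")
    if not 0 < min_factor <= max_factor:
        raise ValueError("factors must satisfy 0 < min_factor <= max_factor")

    # Round the upper bound up and the lower bound down, keeping both >= 1.
    max_reducers = max(1, math.ceil(estimated_reducers * max_factor))
    min_reducers = max(1, math.floor(estimated_reducers * min_factor))
    return min_reducers, max_reducers


if __name__ == "__main__":
    low, high = reducer_bounds(100)
    print(f"partition by {high} reducers; Tez may shrink to as few as {low}")
```

      Over-partitioning by the maximum matters because the count can only be reduced at runtime: the upper bound has to be fixed before the upstream vertex writes its output.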

      Attachments

        1. HIVE-7158.5.patch
          26 kB
          Gopal Vijayaraghavan
        2. HIVE-7158.4.patch
          25 kB
          Gunther Hagleitner
        3. HIVE-7158.3.patch
          25 kB
          Gunther Hagleitner
        4. HIVE-7158.2.patch
          39 kB
          Gunther Hagleitner
        5. HIVE-7158.1.patch
          40 kB
          Gunther Hagleitner


          People

            Assignee: Gunther Hagleitner (hagleitn)
            Reporter: Gunther Hagleitner (hagleitn)