  1. Apache Tez
  2. TEZ-2104 A CrossProductEdge which produces synthetic cross-product parallelism
  3. TEZ-3708

Improve parallelism and auto grouping of unpartitioned cartesian product


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels: None

    Description

      The current unpartitioned cartesian product has a few limitations:
      1. Parallelism can be too low when splits are large and the number of source tasks is small.
      2. Parallelism can be too high when the number of source tasks is large.
      3. Workload is not evenly distributed across workers. Even with auto grouping, grouping by size may be inaccurate, because inputs of the same size can contain different numbers of records and therefore require different numbers of cartesian product operations.
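
      The three limitations can be illustrated with some rough arithmetic. The sketch below assumes that, without grouping, one downstream task is created per combination of source tasks, so parallelism is the product of the source task counts. The function names, task counts, and record sizes are hypothetical and chosen only for illustration; this is not Tez code.

      ```python
      from math import prod

      def cross_product_parallelism(num_src_tasks):
          # Assumption: one downstream task per combination of source tasks.
          return prod(num_src_tasks)

      def pairs_to_process(input_bytes, bytes_per_record):
          # Record pairs produced when crossing inputs of the given sizes.
          records = [size // rec for size, rec in zip(input_bytes, bytes_per_record)]
          return prod(records)

      MB = 10**6  # decimal megabyte, to keep the numbers round

      # Limitation 1: few source tasks with large splits give little parallelism.
      few_tasks = cross_product_parallelism([2, 3])

      # Limitation 2: many source tasks make the task count explode.
      many_tasks = cross_product_parallelism([1000, 1000])

      # Limitation 3: two groups with identical input sizes (100 MB x 100 MB)
      # but different record sizes do very different amounts of work.
      pairs_small_records = pairs_to_process([100 * MB, 100 * MB], [100, 100])
      pairs_large_records = pairs_to_process([100 * MB, 100 * MB], [10_000, 10_000])

      print(few_tasks, many_tasks)
      print(pairs_small_records, pairs_large_records)
      ```

      Under these assumptions, grouping purely by bytes would treat the last two groups as equal even though one produces ten thousand times more record pairs than the other.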

      Attachments

        1. TEZ-3708.4.patch
          213 kB
          Zhiyuan Yang
        2. TEZ-3708.3.patch
          202 kB
          Zhiyuan Yang
        3. TEZ-3708.2.patch
          105 kB
          Zhiyuan Yang
        4. TEZ-3708.1.patch
          105 kB
          Zhiyuan Yang

          People

            Assignee: Zhiyuan Yang (zhiyuany)
            Reporter: Zhiyuan Yang (zhiyuany)