  1. Apache Tez
  2. TEZ-2104 A CrossProductEdge which produces synthetic cross-product parallelism
  3. TEZ-3708

Improve parallelism and auto grouping of unpartitioned cartesian product

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels: None

    Description

      The current unpartitioned cartesian product has a few limitations:
      1. Parallelism can be insufficient when splits are large and the number of source tasks is small.
      2. Parallelism can be excessive when the number of source tasks is large.
      3. The workload is not ideally distributed across workers. Even with auto grouping, grouping by size may not be accurate, because inputs of the same size can contain different numbers of records and therefore require different numbers of cartesian product operations.
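
      To illustrate limitation 3, here is a minimal sketch (not the code in the attached patches, and not a Tez API) of why grouping by record count can balance work better than grouping by size. The class name `CartesianProductSketch` and the methods `productOps` and `groupByRecords` are hypothetical names chosen for this example, and it assumes per-task record counts are available to whatever does the grouping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Tez.
class CartesianProductSketch {

  /** Number of record pairs a cartesian product of two inputs has to process. */
  static long productOps(long leftRecords, long rightRecords) {
    return leftRecords * rightRecords;
  }

  /**
   * Groups consecutive source tasks so that each group holds roughly
   * total / desiredGroups records. Returns the task indices of each group.
   * At most desiredGroups groups are produced.
   */
  static List<List<Integer>> groupByRecords(long[] recordsPerTask, int desiredGroups) {
    if (desiredGroups < 1) {
      throw new IllegalArgumentException("desiredGroups must be >= 1");
    }
    long total = 0;
    for (long r : recordsPerTask) {
      total += r;
    }
    // Per-group record target, rounded up; at least 1 to handle empty inputs.
    long target = Math.max(1, (total + desiredGroups - 1) / desiredGroups);

    List<List<Integer>> groups = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    long acc = 0;
    for (int i = 0; i < recordsPerTask.length; i++) {
      current.add(i);
      acc += recordsPerTask[i];
      // The last allowed group absorbs all remaining tasks.
      boolean lastGroup = groups.size() == desiredGroups - 1;
      if (acc >= target && !lastGroup) {
        groups.add(current);
        current = new ArrayList<>();
        acc = 0;
      }
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Two chunks of equal byte size may hold 1,000 or 10 records;
    // crossed with a 1,000-record input the work differs by 100x.
    System.out.println(productOps(1_000, 1_000));
    System.out.println(productOps(10, 1_000));

    // Five source tasks with uneven record counts, grouped into two groups.
    System.out.println(groupByRecords(new long[] {100, 100, 400, 200, 200}, 2));
  }
}
```

      In this sketch the five tasks are split into groups of 600 and 400 records, whereas a split by task count or byte size could just as well place the 400-record task with either neighbour and skew the number of cartesian product operations per worker.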

      Attachments

        1. TEZ-3708.4.patch
          213 kB
          Zhiyuan Yang
        2. TEZ-3708.3.patch
          202 kB
          Zhiyuan Yang
        3. TEZ-3708.2.patch
          105 kB
          Zhiyuan Yang
        4. TEZ-3708.1.patch
          105 kB
          Zhiyuan Yang

            People

              Assignee: zhiyuany Zhiyuan Yang
              Reporter: zhiyuany Zhiyuan Yang