  1. Apache Tez
  2. TEZ-2104 A CrossProductEdge which produces synthetic cross-product parallelism
  3. TEZ-3708

Improve parallelism and auto grouping of unpartitioned cartesian product

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels: None

      Description

      The current unpartitioned cartesian product has a few limitations:
      1. Parallelism can be too low when splits are large and the number of source tasks is small.
      2. Parallelism can be too high when the number of source tasks is large.
      3. Workload is not ideally distributed across workers. Even with auto grouping, grouping by size may be inaccurate, because inputs of the same size can contain different numbers of records and therefore require different numbers of cartesian product operations.
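
      The three limitations can be illustrated with a small sketch. It is not Tez code. It assumes that the unpartitioned cartesian product starts one downstream task per combination of source tasks, and the helper names (naive_parallelism, pair_ops) and all numbers are hypothetical, chosen only for illustration.

```python
from math import prod


def naive_parallelism(src_task_counts):
    # Assumption: one downstream task per combination of source tasks,
    # so parallelism is the product of the source task counts.
    return prod(src_task_counts)


def pair_ops(records_left, records_right):
    # Every record on the left is paired with every record on the right.
    return records_left * records_right


# Limitation 1: two sources with 2 tasks each yield only a handful of
# downstream tasks, no matter how large each split is.
print(naive_parallelism([2, 2]))

# Limitation 2: two sources with 1000 tasks each yield a very large
# number of downstream tasks.
print(naive_parallelism([1000, 1000]))

# Limitation 3: two chunks with the same byte size but different record
# widths hold different record counts, so the work differs even though
# size-based grouping would treat them as equal.
SIZE_BYTES = 100 * 1024 * 1024
wide_rows = SIZE_BYTES // 1024   # records of 1 KiB each
narrow_rows = SIZE_BYTES // 16   # records of 16 bytes each
other_side = 1000                # record count on the other source
print(pair_ops(wide_rows, other_side), pair_ops(narrow_rows, other_side))
```

      In this sketch the two equally sized chunks differ in cartesian product operations by the ratio of their record counts, which is why grouping by record count may balance work better than grouping by bytes.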

        Attachments

        1. TEZ-3708.1.patch
          105 kB
          Zhiyuan Yang
        2. TEZ-3708.2.patch
          105 kB
          Zhiyuan Yang
        3. TEZ-3708.3.patch
          202 kB
          Zhiyuan Yang
        4. TEZ-3708.4.patch
          213 kB
          Zhiyuan Yang

              People

              • Assignee: Zhiyuan Yang (zhiyuany)
              • Reporter: Zhiyuan Yang (zhiyuany)
              • Votes: 0
              • Watchers: 2
