Apache Tez / TEZ-2956

Handle auto-reduce parallelism when the totalNumBipartiteSourceTasks is 0

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.1, 0.8.2
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      In certain cases (e.g., M --> R --> R), if the parent vertex has 0 tasks, Tez currently does not adjust the parallelism of the downstream vertex.

      For example:

      SELECT ss_store_sk,
             ss_sold_date_sk,
             ss_quantity,
             ss_sales_price,
             LEAD(ss_sales_price, 1) OVER(PARTITION BY ss_store_sk
                                          ORDER BY ss_quantity)
      FROM store_sales
      WHERE ss_sold_date_sk IS NOT NULL
        AND ss_quantity IS NOT NULL
        AND ss_sales_price > 2857684
        AND ss_sales_price < 2857685
        AND ss_store_sk > 10234233423
        AND ss_store_sk < 20234234324
      ORDER BY ss_store_sk,
               ss_sold_date_sk;
      

      This launches the DAG "M1 (0) --> R2 (156) --> R3 (1)". However, R2 retains its parallelism of 156 even though M1 generates no output.
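
      The expected behavior can be sketched roughly as follows. This is a minimal illustration only, not the code from the attached patches: the class AutoReduceSketch, the method reducedParallelism, and all parameter names other than totalNumBipartiteSourceTasks are hypothetical, and the choice to collapse to a single task when there are no source tasks is an assumption about the intended outcome.

```java
// Hypothetical sketch: NOT the actual Tez vertex manager code or the TEZ-2956 patch.
final class AutoReduceSketch {

    /**
     * Picks a parallelism for a downstream vertex (e.g. R2).
     * All names here are illustrative assumptions.
     */
    static int reducedParallelism(int currentParallelism,
                                  int totalNumBipartiteSourceTasks,
                                  long expectedSourceOutputBytes,
                                  long desiredTaskInputBytes) {
        if (currentParallelism <= 1) {
            return currentParallelism; // nothing left to reduce
        }
        if (totalNumBipartiteSourceTasks == 0) {
            // Assumption: with no source tasks, no output can ever arrive,
            // so one downstream task is enough.
            return 1;
        }
        if (desiredTaskInputBytes <= 0) {
            return currentParallelism; // no usable target size, keep as-is
        }
        // Ceiling division: number of tasks needed to hold the expected input.
        long needed = (expectedSourceOutputBytes + desiredTaskInputBytes - 1)
                / desiredTaskInputBytes;
        needed = Math.max(1L, needed);
        // Auto-reduce only ever lowers parallelism, never raises it.
        return (int) Math.min((long) currentParallelism, needed);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // M1 (0) --> R2 (156): no source tasks, so R2 collapses.
        System.out.println(reducedParallelism(156, 0, 0L, 100 * mb));
        // 10 source tasks expected to emit 1000 MB, targeting 100 MB per task.
        System.out.println(reducedParallelism(156, 10, 1000 * mb, 100 * mb));
    }
}
```

      Under these assumptions, the DAG above would run as "M1 (0) --> R2 (1) --> R3 (1)" instead of scheduling 156 R2 tasks that receive no input.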

      Attachments

        1. TEZ-2956_DAG.png
          33 kB
          Rajesh Balamohan
        2. TEZ-2956.1.patch
          4 kB
          Rajesh Balamohan
        3. TEZ-2956.2.patch
          4 kB
          Rajesh Balamohan
        4. TEZ-2956.3.patch
          7 kB
          Bikas Saha
        5. With_Patch.png
          35 kB
          Rajesh Balamohan
        6. Without_Patch.png
          36 kB
          Rajesh Balamohan

          People

            Assignee: Rajesh Balamohan (rajesh.balamohan)
            Reporter: Rajesh Balamohan (rajesh.balamohan)
            Votes: 0
            Watchers: 4
