Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-748

Shuffle output numbers of join may be inconsistent.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0, 0.9.0
    • Fix Version/s: 0.8.0, 0.9.0
    • Component/s: Planner/Optimizer
    • Labels:
      None

      Description

      I found that inline view doesn't run expected at multiple join as follows:

      Environment

      • DataSet: TPC-DS
      • tajo.dist-query.join.broadcast.auto : false

      Case: 1

      SELECT COUNT(*)
      FROM (
        SELECT cs.cs_item_sk as cs_item_sk,
        cs.cs_ext_discount_amt as cs_ext_discount_amt
        FROM catalog_sales cs
        JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
        WHERE d.d_date between '2000-01-27' and '2000-04-27'
      ) cs1
      JOIN item i ON (i.i_item_sk = cs1.cs_item_sk);
      
      • actual result: 4163848
      • expected result: 4163848

      Case: 2

      select count(*)
      from item i
      JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
                                1.3 * avg(cs_ext_discount_amt) as avg_cs_ext_discount_amt
                 FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                              cs.cs_ext_discount_amt as cs_ext_discount_amt
                              FROM catalog_sales cs
                              JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                              WHERE d.d_date between '2000-01-27' and '2000-04-27') cs2
                              GROUP BY cs2.cs_item_sk) tmp1
      ON (i.i_item_sk = tmp1.cs_item_sk);
      
      • actual result: 102000
      • expected result: 102000

      Case: 3

      SELECT COUNT(*)
      FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                   cs.cs_ext_discount_amt as cs_ext_discount_amt
                   FROM catalog_sales cs
                   JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                   WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1
      JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
      JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
                                1.3 * avg(cs_ext_discount_amt) as avg_cs_ext_discount_amt
                 FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                              cs.cs_ext_discount_amt as cs_ext_discount_amt
                              FROM catalog_sales cs
                              JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                              WHERE d.d_date between '2000-01-27' and '2000-04-27') cs2
                              GROUP BY cs2.cs_item_sk) tmp1
      ON (i.i_item_sk = tmp1.cs_item_sk)
      WHERE i.i_manufact_id = 436;
      
      • actual result: 80
      • expected result: 4586

      Case: 4

      SELECT COUNT(*)
      FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                   cs.cs_ext_discount_amt as cs_ext_discount_amt
                   FROM catalog_sales cs
                   JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                   WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1
      JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
      JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
                                1.3 * avg(cs_ext_discount_amt) as avg_cs_ext_discount_amt
                 FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                              cs.cs_ext_discount_amt as cs_ext_discount_amt
                              FROM catalog_sales cs
                              JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                              WHERE d.d_date between '2000-01-27' and '2000-04-27') cs2
                              GROUP BY cs2.cs_item_sk) tmp1
      ON (i.i_item_sk = tmp1.cs_item_sk)
      
      • actual result: 71147
      • expected result: 4163848

      For reference, I made activated result using hive.

        Attachments

        1. TAJO-748.patch
          6 kB
          Jaehwa Jung
        2. TAJO-748_5.patch
          9 kB
          Jaehwa Jung
        3. TAJO-748_4.patch
          7 kB
          Jaehwa Jung
        4. TAJO-748_3.patch
          8 kB
          Jaehwa Jung
        5. TAJO-748_2.patch
          8 kB
          Jaehwa Jung

          Issue Links

            Activity

              People

              • Assignee:
                blrunner Jaehwa Jung
                Reporter:
                blrunner Jaehwa Jung
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: