Uploaded image for project: 'Tajo (Retired)'
  1. Tajo (Retired)
  2. TAJO-748

Shuffle output numbers of join may be inconsistent.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • 0.8.0, 0.9.0
    • 0.8.0, 0.9.0
    • Planner/Optimizer
    • None

    Description

      I found that inline view doesn't run expected at multiple join as follows:

      Environment

      • DataSet: TPC-DS
      • tajo.dist-query.join.broadcast.auto : false

      Case: 1

      SELECT COUNT(*)
      FROM (
        SELECT cs.cs_item_sk as cs_item_sk,
        cs.cs_ext_discount_amt as cs_ext_discount_amt
        FROM catalog_sales cs
        JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
        WHERE d.d_date between '2000-01-27' and '2000-04-27'
      ) cs1
      JOIN item i ON (i.i_item_sk = cs1.cs_item_sk);
      
      • actual result: 4163848
      • expected result: 4163848

      Case: 2

      select count(*)
      from item i
      JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
                                1.3 * avg(cs_ext_discount_amt) as avg_cs_ext_discount_amt
                 FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                              cs.cs_ext_discount_amt as cs_ext_discount_amt
                              FROM catalog_sales cs
                              JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                              WHERE d.d_date between '2000-01-27' and '2000-04-27') cs2
                              GROUP BY cs2.cs_item_sk) tmp1
      ON (i.i_item_sk = tmp1.cs_item_sk);
      
      • actual result: 102000
      • expected result: 102000

      Case: 3

      SELECT COUNT(*)
      FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                   cs.cs_ext_discount_amt as cs_ext_discount_amt
                   FROM catalog_sales cs
                   JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                   WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1
      JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
      JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
                                1.3 * avg(cs_ext_discount_amt) as avg_cs_ext_discount_amt
                 FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                              cs.cs_ext_discount_amt as cs_ext_discount_amt
                              FROM catalog_sales cs
                              JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                              WHERE d.d_date between '2000-01-27' and '2000-04-27') cs2
                              GROUP BY cs2.cs_item_sk) tmp1
      ON (i.i_item_sk = tmp1.cs_item_sk)
      WHERE i.i_manufact_id = 436;
      
      • actual result: 80
      • expected result: 4586

      Case: 4

      SELECT COUNT(*)
      FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                   cs.cs_ext_discount_amt as cs_ext_discount_amt
                   FROM catalog_sales cs
                   JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                   WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1
      JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
      JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
                                1.3 * avg(cs_ext_discount_amt) as avg_cs_ext_discount_amt
                 FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                              cs.cs_ext_discount_amt as cs_ext_discount_amt
                              FROM catalog_sales cs
                              JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                              WHERE d.d_date between '2000-01-27' and '2000-04-27') cs2
                              GROUP BY cs2.cs_item_sk) tmp1
      ON (i.i_item_sk = tmp1.cs_item_sk)
      
      • actual result: 71147
      • expected result: 4163848

      For reference, I made activated result using hive.

      Attachments

        1. TAJO-748.patch
          6 kB
          JaeHwa Jung
        2. TAJO-748_2.patch
          8 kB
          JaeHwa Jung
        3. TAJO-748_3.patch
          8 kB
          JaeHwa Jung
        4. TAJO-748_4.patch
          7 kB
          JaeHwa Jung
        5. TAJO-748_5.patch
          9 kB
          JaeHwa Jung

        Issue Links

          Activity

            People

              blrunner JaeHwa Jung
              blrunner JaeHwa Jung
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: