Uploaded image for project: 'Apache HAWQ'
  1. Apache HAWQ
  2. HAWQ-830

Wrong result in CTE query due to CTE is treated as init plan by planner and evaluated multiple times

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.0.0.0-incubating
    • backlog
    • Optimizer
    • None

    Description

      In CTE query, if the CTE itself is referenced multiple times, it should be evaluated only once and then be used multiple time. However, it is treated as init plan and evaluated multiple times in hawq 1.x and 2.0. This has two issues here:

      1. If the query in CTE is "volatile" (i.e., select volatile function) or has side effect (create/drop object in database), it may generate wrong result

      2. The performance of the query is not so efficient since the query in CTE is evaluated multiple times.

      Here is the steps to reproduce:
      1) in hawq, CTE is treated as init plan and evaluated 2 times. Thus, the result is incorrect

      WITH r AS (SELECT random())
      SELECT r1.*, r2.*
      FROM r AS r1, r AS r2;
            random       |      random
      -------------------+-------------------
       0.519145511090755 | 0.751198637764901
      (1 row)
      
      EXPLAIN
      WITH r AS (SELECT random())
      SELECT r1.*, r2.*
      FROM r AS r1, r AS r2;
                                QUERY PLAN
      --------------------------------------------------------------
       Nested Loop  (cost=0.04..0.77 rows=20 width=16)
         ->  Result  (cost=0.01..0.02 rows=1 width=0)
               InitPlan
                 ->  Result  (cost=0.00..0.01 rows=1 width=0)
         ->  Materialize  (cost=0.03..0.09 rows=6 width=8)
               ->  Result  (cost=0.01..0.02 rows=1 width=0)
                     InitPlan
                       ->  Result  (cost=0.00..0.01 rows=1 width=0)
       Settings:  default_hash_table_bucket_number=6
       Optimizer status: legacy query optimizer
      (10 rows)
      

      2) in postgres, CTE is treated as CTE scan and evaluated 1 time. Thus, the result is i

      WITH r AS (SELECT random())
      SELECT r1.*, r2.*
      FROM r AS r1, r AS r2;
            random       |      random
      -------------------+-------------------
       0.989214501809329 | 0.989214501809329
      (1 row)
      
      EXPLAIN
      WITH r AS (SELECT random())
      SELECT r1.*, r2.*
      FROM r AS r1, r AS r2;
                              QUERY PLAN
      ----------------------------------------------------------
       Nested Loop  (cost=0.01..0.06 rows=1 width=16)
         CTE r
           ->  Result  (cost=0.00..0.01 rows=1 width=0)
         ->  CTE Scan on r r1  (cost=0.00..0.02 rows=1 width=8)
         ->  CTE Scan on r r2  (cost=0.00..0.02 rows=1 width=8)
      (5 rows)

      Attachments

        Issue Links

          Activity

            People

              huor Ruilong Huo
              huor Ruilong Huo
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: