Uploaded image for project: 'Apache HAWQ'
  1. Apache HAWQ
  2. HAWQ-884

Subquery scan return no tuple in query with CTE

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.0.0.0-incubating
    • backlog
    • Query Execution
    • None

    Description

      Here is the CTE query that return no tuple, while it should return 1 tuple actually.

      WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ), 
            t2 AS ( SELECT 3 c3 )
      SELECT * FROM t1
      WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);                                                                                                                                                                                            
      
      -- Actual
      c1 | c2
      ----+----
      (0 rows)
      
      -- Expected
       c1 | c2
      ----+----
        3 |  4
      (1 row)
      

      The root cause is that during query planning, it is correct that t2 in the CTE (common table expression) clause is treated as subquery scan and then materialized.
      However, during query execution, it generate no tuple when t2 is evaluated. Thus, the join of t1 with t2 generate no tuple.
      We can see this in query execution statistics using explain analyze while running the query with optimizer = off.

         ->  Materialize  (cost=0.00..0.01 rows=1 width=0)
               Rows out:  0 rows with 0.167 ms to end of 3 scans, start offset by 0.265 ms.
               ->  Limit  (cost=0.00..0.00 rows=1 width=0)
                     Rows out:  0 rows with 0.003 ms to end, start offset by 0.257 ms.
                     ->  Subquery Scan t2  (cost=0.00..0.01 rows=1 width=0)
                           Rows out:  0 rows with 0.002 ms to end, start offset by 0.258 ms.
                           ->  Result  (cost=0.00..0.01 rows=1 width=0)
                                 One-Time Filter: 3 = $0
                                 Rows out:  0 rows with 0.001 ms to end, start offset by 0.258 ms.
      

      Here is details:
      1) hawq 2.0 with optimizer off (planner): subquery scan generate no tuple

      show optimizer;
       optimizer
      -----------
       off
      (1 row)
      
      WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
           t2 AS ( SELECT 3 c3 )
      SELECT * FROM t1
      WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
       c1 | c2
      ----+----
      (0 rows)
      
      EXPLAIN ANALYZE
      WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
           t2 AS ( SELECT 3 c3 )
      SELECT * FROM t1
      WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                                                                  QUERY PLAN
      -------------------------------------------------------------------------------------------------------------------------------------------------------------------
       Nested Loop  (cost=0.05..0.29 rows=72 width=8)
         Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.237/0.237 ms to end.
         ->  Limit  (cost=0.00..0.00 rows=1 width=0)
               Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.003/0.003 ms to end.
               ->  Subquery Scan t2  (cost=0.00..0.01 rows=6 width=0)
                     Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.002/0.002 ms to end.
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                           One-Time Filter: 3 = $0
                           Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0/0 ms to end.
         ->  Materialize  (cost=0.05..0.17 rows=12 width=8)
               Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.129/0.129 ms to end, start offset by 0.135/0.135 ms.
               ->  Append  (cost=0.00..0.04 rows=2 width=0)
                     Rows out:  Avg 2.0 rows x 1 workers.  Max/Last(/) 2/2 rows with 0.002/0.002 ms to first row, 0.004/0.004 ms to end, start offset by 0.255/0.255 ms.
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                           Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.002/0.002 ms to end, start offset by 0.255/0.255 ms.
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                           Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0/0 ms to end, start offset by 0.261/0.261 ms.
       Slice statistics:
         (slice0)    Executor memory: 61K bytes.
       Statement statistics:
         Memory used: 128000K bytes
       Settings:  default_hash_table_bucket_number=6; optimizer=off
       Optimizer status: legacy query optimizer
       Data locality statistics:
         no data locality information in this query
       Total runtime: 0.372 ms
      (26 rows)
      

      2) hawq 2.0 with optimizer on (orca): fallback to planner and thus subquery scan generate no tuple

      show optimizer;
       optimizer
      -----------
       on
      (1 row)
      
      WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
           t2 AS ( SELECT 3 c3 )
      SELECT * FROM t1
      WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
       c1 | c2
      ----+----
      (0 rows)
      
      EXPLAIN ANALYZE
      WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
           t2 AS ( SELECT 3 c3 )
      SELECT * FROM t1
      WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                                                                  QUERY PLAN
      -------------------------------------------------------------------------------------------------------------------------------------------------------------------
       Nested Loop  (cost=0.05..0.29 rows=72 width=8)
         Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.273/0.273 ms to end.
         ->  Limit  (cost=0.00..0.00 rows=1 width=0)
               Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.003/0.003 ms to end.
               ->  Subquery Scan t2  (cost=0.00..0.01 rows=6 width=0)
                     Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.002/0.002 ms to end.
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                           One-Time Filter: 3 = $0
                           Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.001/0.001 ms to end.
         ->  Materialize  (cost=0.05..0.17 rows=12 width=8)
               Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.151/0.151 ms to end, start offset by 0.189/0.189 ms.
               ->  Append  (cost=0.00..0.04 rows=2 width=0)
                     Rows out:  Avg 2.0 rows x 1 workers.  Max/Last(/) 2/2 rows with 0.003/0.003 ms to first row, 0.004/0.004 ms to end, start offset by 0.327/0.327 ms.
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                           Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.002/0.002 ms to end, start offset by 0.327/0.327 ms.
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                           Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0/0 ms to end, start offset by 0.337/0.337 ms.
       Slice statistics:
         (slice0)    Executor memory: 61K bytes.
       Statement statistics:
         Memory used: 128000K bytes
       Settings:  default_hash_table_bucket_number=6
       Optimizer status: legacy query optimizer
       Data locality statistics:
         no data locality information in this query
       Total runtime: 0.468 ms
      (26 rows)
      

      3) hawq 1.x with optimizer off (planner): subquery scan generate no tuple

      show optimizer;
       optimizer
      -----------
       off
      (1 row)
      
      WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
           t2 AS ( SELECT 3 c3 )
      SELECT * FROM t1
      WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
       c1 | c2
      ----+----
      (0 rows)
      
      EXPLAIN ANALYZE
      WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
           t2 AS ( SELECT 3 c3 )
      SELECT * FROM t1
      WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                                  QUERY PLAN
      --------------------------------------------------------------------------------------------------
       Nested Loop  (cost=0.00..0.08 rows=4 width=8)
         Rows out:  0 rows with 0.220 ms to end, start offset by 0.093 ms.
         ->  Append  (cost=0.00..0.04 rows=2 width=0)
               Rows out:  2 rows with 0.001 ms to first row, 0.003 ms to end, start offset by 0.262 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  1 rows with 0.001 ms to end, start offset by 0.262 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  1 rows with 0.001 ms to end, start offset by 0.264 ms.
         ->  Materialize  (cost=0.00..0.01 rows=1 width=0)
               Rows out:  0 rows with 0.167 ms to end of 3 scans, start offset by 0.265 ms.
               ->  Limit  (cost=0.00..0.00 rows=1 width=0)
                     Rows out:  0 rows with 0.003 ms to end, start offset by 0.257 ms.
                     ->  Subquery Scan t2  (cost=0.00..0.01 rows=1 width=0)
                           Rows out:  0 rows with 0.002 ms to end, start offset by 0.258 ms.
                           ->  Result  (cost=0.00..0.01 rows=1 width=0)
                                 One-Time Filter: 3 = $0
                                 Rows out:  0 rows with 0.001 ms to end, start offset by 0.258 ms.
       Slice statistics:
         (slice0)    Executor memory: 61K bytes.
       Statement statistics:
         Memory used: 128000K bytes
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
       Total runtime: 0.315 ms
      (24 rows)
      

      4) hawq 1.x with optimizer on (orca): work as expected

      show optimizer;
       optimizer
      -----------
       on
      (1 row)
      
      WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
           t2 AS ( SELECT 3 c3 )
      SELECT * FROM t1
      WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
       c1 | c2
      ----+----
        3 |  4
      (1 row)
      
      EXPLAIN ANALYZE
      WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
           t2 AS ( SELECT 3 c3 )
      SELECT * FROM t1
      WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                                     QUERY PLAN
      --------------------------------------------------------------------------------------------------------
       Hash EXISTS Join  (cost=0.00..0.00 rows=2 width=8)
         Hash Cond: "outer".c1 = "inner".c3
         Rows out:  1 rows with 0.835 ms to first row, 2.373 ms to end, start offset by 0.194 ms.
         Executor memory:  1K bytes.
         Work_mem used:  1K bytes. Workfile: (0 spilling, 0 reused)
         Hash chain length 1.0 avg, 1 max, using 1 of 524341 buckets.
         ->  Append  (cost=0.00..0.00 rows=2 width=8)
               Rows out:  2 rows with 0.002 ms to first row, 0.004 ms to end, start offset by 1.023 ms.
               ->  Result  (cost=0.00..0.00 rows=1 width=8)
                     Rows out:  1 rows with 0.001 ms to first row, 0.002 ms to end, start offset by 1.023 ms.
                     ->  Result  (cost=0.00..0.00 rows=1 width=1)
                           Rows out:  1 rows with 0 ms to end, start offset by 1.024 ms.
               ->  Result  (cost=0.00..0.00 rows=1 width=8)
                     Rows out:  1 rows with 0.001 ms to end, start offset by 1.026 ms.
                     ->  Result  (cost=0.00..0.00 rows=1 width=1)
                           Rows out:  1 rows with 0 ms to end, start offset by 1.026 ms.
         ->  Hash  (cost=0.00..0.00 rows=1 width=4)
               Rows in:  1 rows with 0.010 ms to end, start offset by 1.013 ms.
               ->  Result  (cost=0.00..0.00 rows=1 width=4)
                     Rows out:  1 rows with 0.006 ms to first row, 0.007 ms to end, start offset by 1.013 ms.
                     ->  Result  (cost=0.00..0.00 rows=1 width=1)
                           Rows out:  1 rows with 0.001 ms to end, start offset by 1.014 ms.
       Slice statistics:
         (slice0)    Executor memory: 8270K bytes.  Work_mem: 1K bytes max.
       Statement statistics:
         Memory used: 128000K bytes
       Optimizer status: PQO version 1.591
       Total runtime: 2.572 ms
      (28 rows)
      

      Attachments

        Issue Links

          Activity

            People

              huor Ruilong Huo
              huor Ruilong Huo
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated: