Uploaded image for project: 'Apache HAWQ'
  1. Apache HAWQ
  2. HAWQ-800

Less tuple is inserted due to data locality information is not refreshed and dispatched in prepared statement

    XMLWordPrintableJSON

    Details

      Description

      In either explicit (SQL) or implicit (UDF) prepared statement, there is less tuple inserted if we run a prepared "insert into t select * from t" plan multiple times in a transaction.

      Below is a simple case to reproduce this issue. For a more complicated example, you may refer to attached proba_execute and proba.
      1. There should be 8 tuples, however there is only 4 in hawq 2.0

      drop table if exists t;
      DROP TABLE
      
      create table t (id int);
      CREATE TABLE
      
      insert into t values (1);
      INSERT 0 1
      
      CREATE OR REPLACE FUNCTION f_load()
      RETURNS TEXT
      LANGUAGE plpgsql
      AS
      $body$
      DECLARE
          l_rec RECORD;
          l_itm RECORD;
      BEGIN
      
          FOR l_rec IN ( SELECT generate_series(1, 3) AS id )
          LOOP
              INSERT INTO t SELECT * FROM t;
          END LOOP;
      
          RETURN 'done';
      
      END;
      $body$
      ;
      CREATE FUNCTION
      
      SELECT f_load();
       f_load
      --------
       done
      (1 row)
      
      SELECT * FROM t;
       id
      ----
        1
        1
        1
        1
      (4 rows)
      

      2. There are 8 tuples as expected in hawq 1.x

      drop table if exists t;
      DROP TABLE
      
      create table t (id int);
      psql:temp.sql:3: NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
      HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
      CREATE TABLE
      
      insert into t values (1);
      INSERT 0 1
      
      CREATE OR REPLACE FUNCTION f_load()
      RETURNS TEXT
      LANGUAGE plpgsql
      AS
      $body$
      DECLARE
          l_rec RECORD;
          l_itm RECORD;
      
      BEGIN
      
          FOR l_rec IN ( SELECT generate_series(1, 3) AS id )
          LOOP
              INSERT INTO t SELECT * FROM t;
          END LOOP;
      
          RETURN 'done';
      
      END;
      $body$
      ;
      CREATE FUNCTION
      
      SELECT f_load();
       f_load
      --------
       done
      (1 row)
      
      SELECT * FROM t;
       id
      ----
        1
        1
        1
        1
        1
        1
        1
        1
      (8 rows)
      

        Attachments

        1. proba.sql
          2 kB
          Ruilong Huo
        2. proba.out
          3 kB
          Ruilong Huo
        3. proba_execute.sql
          2 kB
          Ruilong Huo
        4. proba_execute.out
          3 kB
          Ruilong Huo

          Issue Links

          There are no Sub-Tasks for this issue.

            Activity

              People

              • Assignee:
                huor Ruilong Huo
                Reporter:
                huor Ruilong Huo
              • Votes:
                2 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: