Uploaded image for project: 'Apache HAWQ'
  1. Apache HAWQ
  2. HAWQ-800

Less tuple is inserted due to data locality information is not refreshed and dispatched in prepared statement

    XMLWordPrintableJSON

Details

    Description

      In either explicit (SQL) or implicit (UDF) prepared statement, there is less tuple inserted if we run a prepared "insert into t select * from t" plan multiple times in a transaction.

      Below is a simple case to reproduce this issue. For a more complicated example, you may refer to attached proba_execute and proba.
      1. There should be 8 tuples, however there is only 4 in hawq 2.0

      drop table if exists t;
      DROP TABLE
      
      create table t (id int);
      CREATE TABLE
      
      insert into t values (1);
      INSERT 0 1
      
      CREATE OR REPLACE FUNCTION f_load()
      RETURNS TEXT
      LANGUAGE plpgsql
      AS
      $body$
      DECLARE
          l_rec RECORD;
          l_itm RECORD;
      BEGIN
      
          FOR l_rec IN ( SELECT generate_series(1, 3) AS id )
          LOOP
              INSERT INTO t SELECT * FROM t;
          END LOOP;
      
          RETURN 'done';
      
      END;
      $body$
      ;
      CREATE FUNCTION
      
      SELECT f_load();
       f_load
      --------
       done
      (1 row)
      
      SELECT * FROM t;
       id
      ----
        1
        1
        1
        1
      (4 rows)
      

      2. There are 8 tuples as expected in hawq 1.x

      drop table if exists t;
      DROP TABLE
      
      create table t (id int);
      psql:temp.sql:3: NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
      HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
      CREATE TABLE
      
      insert into t values (1);
      INSERT 0 1
      
      CREATE OR REPLACE FUNCTION f_load()
      RETURNS TEXT
      LANGUAGE plpgsql
      AS
      $body$
      DECLARE
          l_rec RECORD;
          l_itm RECORD;
      
      BEGIN
      
          FOR l_rec IN ( SELECT generate_series(1, 3) AS id )
          LOOP
              INSERT INTO t SELECT * FROM t;
          END LOOP;
      
          RETURN 'done';
      
      END;
      $body$
      ;
      CREATE FUNCTION
      
      SELECT f_load();
       f_load
      --------
       done
      (1 row)
      
      SELECT * FROM t;
       id
      ----
        1
        1
        1
        1
        1
        1
        1
        1
      (8 rows)
      

      Attachments

        1. proba.out
          3 kB
          Ruilong Huo
        2. proba.sql
          2 kB
          Ruilong Huo
        3. proba_execute.out
          3 kB
          Ruilong Huo
        4. proba_execute.sql
          2 kB
          Ruilong Huo

        Issue Links

          There are no Sub-Tasks for this issue.

          Activity

            People

              huor Ruilong Huo
              huor Ruilong Huo
              Votes:
              2 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: