Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4450

random query generator: prefer || for postgres string concatenation

    Details

      Description

      If Impala CONCAT() has an argument that evaluates to NULL, then Impala
      CONCAT() returns NULL.

      If PostgreSQL CONCAT() has an argument that evaluates to NULL, then
      PostgreSQL will ignore it, and treat it as an empty string. Even
      CONCAT(NULL) evaluates to the empty string.

      PostgreSQL has a || operator that behaves like Impala CONCAT(): if a
      NULL expression is on one side of ||, that || evals to NULL

      For the Postgres SQL writer, we need to write logical concatenation as the || operator, not the CONCAT() function.

        Activity

        Hide
        mikesbrown Michael Brown added a comment -

        Note the query generator found this when doing a GROUP BY on an expression containing a CONCAT. There were row count differences, because Impala had a group for NULL, whereas PostgreSQL grouped the NULL and empty string results in the same group.

        Show
        mikesbrown Michael Brown added a comment - Note the query generator found this when doing a GROUP BY on an expression containing a CONCAT . There were row count differences, because Impala had a group for NULL, whereas PostgreSQL grouped the NULL and empty string results in the same group.
        Hide
        mikesbrown Michael Brown added a comment -

        Testing a fix. I have to actually run the query generator to do that at this point.

        Show
        mikesbrown Michael Brown added a comment - Testing a fix. I have to actually run the query generator to do that at this point.
        Hide
        mikesbrown Michael Brown added a comment -
        commit 585ed5aaae32607218e8056e34b40de0e24c1724
        Author: Michael Brown <mikeb@cloudera.com>
        Date:   Wed Nov 9 16:07:37 2016 -0800
        
            IMPALA-4450: qgen: use string concatenation operator for postgres queries
        
            The random query generator writes a logical query Python object into
            Impala or PostgreSQL dialects. When the CONCAT() function is chosen,
            Impala's and PostgreSQL's CONCAT() implementations behave differently.
            However, PostgreSQL has a || operator that functions like Impala's
            CONCAT().
        
            The method added here overrides the default behavior for the
            PostgresqlSqlWriter. It prevents CONCAT(arg1, arg2, ..., argN) from
            being written and instead causes the SQL to be written as
            'arg1 || arg2 || ... || argN'.
        
            Testing:
        
            I made sure that we generate syntactically valid queries still on the
            PostgreSQL side. This includes queries that made use of string
            concatenation. I also re-ran some failed queries that previously
            produced different results. They now produce the same results. This is a
            very straightforward change, so unit or functional tests for this seem
            overkill.
        
            The full effects of using || instead of CONCAT() are hard to test. It's
            not clear if in my manual testing of || vs. CONCAT() that I missed some
            edge behavior, especially in some complicated query, nested expressions,
            GROUPing BY, and so on.
        
            Change-Id: I149b695889addfd7df4ca5f40dc991456da51687
            Reviewed-on: http://gerrit.cloudera.org:8080/5034
            Reviewed-by: Michael Brown <mikeb@cloudera.com>
            Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
            Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
            Tested-by: Sailesh Mukil <sailesh@cloudera.com>
        
        Show
        mikesbrown Michael Brown added a comment - commit 585ed5aaae32607218e8056e34b40de0e24c1724 Author: Michael Brown <mikeb@cloudera.com> Date: Wed Nov 9 16:07:37 2016 -0800 IMPALA-4450: qgen: use string concatenation operator for postgres queries The random query generator writes a logical query Python object into Impala or PostgreSQL dialects. When the CONCAT() function is chosen, Impala's and PostgreSQL's CONCAT() implementations behave differently. However, PostgreSQL has a || operator that functions like Impala's CONCAT(). The method added here overrides the default behavior for the PostgresqlSqlWriter. It prevents CONCAT(arg1, arg2, ..., argN) from being written and instead causes the SQL to be written as 'arg1 || arg2 || ... || argN'. Testing: I made sure that we generate syntactically valid queries still on the PostgreSQL side. This includes queries that made use of string concatenation. I also re-ran some failed queries that previously produced different results. They now produce the same results. This is a very straightforward change, so unit or functional tests for this seem overkill. The full effects of using || instead of CONCAT() are hard to test. It's not clear if in my manual testing of || vs. CONCAT() that I missed some edge behavior, especially in some complicated query, nested expressions, GROUPing BY, and so on. Change-Id: I149b695889addfd7df4ca5f40dc991456da51687 Reviewed-on: http://gerrit.cloudera.org:8080/5034 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Sailesh Mukil <sailesh@cloudera.com>

          People

          • Assignee:
            mikesbrown Michael Brown
            Reporter:
            mikesbrown Michael Brown
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development