Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4518

behavior with LEAD/LAG on string column/expression where offset goes beyond row bounds differs from postgres

    Details

      Description

      When a LEAD or LAG on an int, timestamp, or bool column or expression has an offset that exceeds the row index, the evaluated expression returns NULL. This matches what happens with Postgres.

      However, on a string column, whereas Postgres still returns NULL, Impala returns the empty string.

      [localhost:21000] > use functional;
      Query: use functional
      [localhost:21000] > select
                        > id,
                        > lead(id, 3) over (order by id) lead_id,
                        > string_col,
                        > lead(string_col, 3) over (order by id) lead_string_col
                        > from alltypestiny;
      Query: select id,
      lead(id, 3) over (order by id) lead_id,
      string_col,
      lead(string_col, 3) over (order by id) lead_string_col
      from alltypestiny
      +----+---------+------------+-----------------+
      | id | lead_id | string_col | lead_string_col |
      +----+---------+------------+-----------------+
      | 0  | 3       | 0          | 1               |
      | 1  | 4       | 1          | 0               |
      | 2  | 5       | 0          | 1               |
      | 3  | 6       | 1          | 0               |
      | 4  | 7       | 0          | 1               |
      | 5  | NULL    | 1          |                 |
      | 6  | NULL    | 0          |                 |
      | 7  | NULL    | 1          |                 |
      +----+---------+------------+-----------------+
      Fetched 8 row(s) in 0.12s
      [localhost:21000] >
      

      This happens even if the optional "default value" parameter is strictly set to NULL:

      [localhost:21000] > select lead(string_col, 99999999, null) over (order by id) col from alltypestiny;
      +-----+
      | col |
      +-----+
      |     |
      |     |
      |     |
      |     |
      |     |
      |     |
      |     |
      |     |
      +-----+
      Fetched 8 row(s) in 0.12s
      [localhost:21000] >
      

        Issue Links

          Activity

          Hide
          mjacobs Matthew Jacobs added a comment -

          Is this a regression?

          Show
          mjacobs Matthew Jacobs added a comment - Is this a regression?
          Hide
          mikesbrown Michael Brown added a comment -

          I'll check with Impala 2.7 to understand whether this is a regression.

          Show
          mikesbrown Michael Brown added a comment - I'll check with Impala 2.7 to understand whether this is a regression.
          Hide
          mikesbrown Michael Brown added a comment -

          I just checked 2.7 behavior:

          Query: select id,
          lead(id, 3) over (order by id) lead_id,
          string_col,
          lead(string_col, 3) over (order by id) lead_string_col
          from alltypestiny
          +----+---------+------------+-----------------+
          | id | lead_id | string_col | lead_string_col |
          +----+---------+------------+-----------------+
          | 0  | 3       | 0          | 1               |
          | 1  | 4       | 1          | 0               |
          | 2  | 5       | 0          | 1               |
          | 3  | 6       | 1          | 0               |
          | 4  | 7       | 0          | 1               |
          | 5  | NULL    | 1          | NULL            |
          | 6  | NULL    | 0          | NULL            |
          | 7  | NULL    | 1          | NULL            |
          +----+---------+------------+-----------------+
          Fetched 8 row(s) in 5.08s
          [localhost:21000] > select lead(string_col, 99999999, null) over (order by id) col from alltypestiny;
          +------+
          | col  |
          +------+
          | NULL |
          | NULL |
          | NULL |
          | NULL |
          | NULL |
          | NULL |
          | NULL |
          | NULL |
          +------+
          Fetched 8 row(s) in 0.18s
          [localhost:21000] >
          
          Show
          mikesbrown Michael Brown added a comment - I just checked 2.7 behavior: Query: select id, lead(id, 3) over (order by id) lead_id, string_col, lead(string_col, 3) over (order by id) lead_string_col from alltypestiny +----+---------+------------+-----------------+ | id | lead_id | string_col | lead_string_col | +----+---------+------------+-----------------+ | 0 | 3 | 0 | 1 | | 1 | 4 | 1 | 0 | | 2 | 5 | 0 | 1 | | 3 | 6 | 1 | 0 | | 4 | 7 | 0 | 1 | | 5 | NULL | 1 | NULL | | 6 | NULL | 0 | NULL | | 7 | NULL | 1 | NULL | +----+---------+------------+-----------------+ Fetched 8 row(s) in 5.08s [localhost:21000] > select lead(string_col, 99999999, null) over (order by id) col from alltypestiny; +------+ | col | +------+ | NULL | | NULL | | NULL | | NULL | | NULL | | NULL | | NULL | | NULL | +------+ Fetched 8 row(s) in 0.18s [localhost:21000] >
          Hide
          mjacobs Matthew Jacobs added a comment -

          Thanks Michael Brown. Michael Ho this is most likely related to 51268c053ffe41dc1aa9f1b250878113d4225258. Will you be able to look at it?

          Show
          mjacobs Matthew Jacobs added a comment - Thanks Michael Brown . Michael Ho this is most likely related to 51268c053ffe41dc1aa9f1b250878113d4225258. Will you be able to look at it?
          Hide
          kwho Michael Ho added a comment -

          Looking at it now.

          Show
          kwho Michael Ho added a comment - Looking at it now.
          Hide
          kwho Michael Ho added a comment -

          Root cause found. Testing a fix now.

          Show
          kwho Michael Ho added a comment - Root cause found. Testing a fix now.
          Hide
          kwho Michael Ho added a comment -

          https://github.com/apache/incubator-impala/commit/b82eed5ee02eccc21cc69d4af107c0acf31e08fa

          IMPALA-4518: CopyStringVal() doesn't copy null string
          Previously, CopyStringVal() mistakenly copies a null
          StringVal as an empty string (i.e. a non-null string
          with zero length). This change fixes the problem by
          distinguishing between these two cases in CopyStringVal()
          and handles them properly. Also added a test case for it.

          This problem only started showing up recently due to
          commit 51268c0 which
          calls CopyStringVal() in OffsetFnInit(). All other
          pre-existing callers of CopyStringVal() before that
          commit checks if 'src' is null before calling it so
          the problem never showed up. In that sense, this is
          a latent bug exposed by the aforementioned commit.

          Change-Id: I3a5b9349dd08556eba5cfedc8c0063cc59f5be03
          Reviewed-on: http://gerrit.cloudera.org:8080/5198
          Reviewed-by: Michael Ho <kwho@cloudera.com>
          Tested-by: Internal Jenkins

          Show
          kwho Michael Ho added a comment - https://github.com/apache/incubator-impala/commit/b82eed5ee02eccc21cc69d4af107c0acf31e08fa IMPALA-4518 : CopyStringVal() doesn't copy null string Previously, CopyStringVal() mistakenly copies a null StringVal as an empty string (i.e. a non-null string with zero length). This change fixes the problem by distinguishing between these two cases in CopyStringVal() and handles them properly. Also added a test case for it. This problem only started showing up recently due to commit 51268c0 which calls CopyStringVal() in OffsetFnInit(). All other pre-existing callers of CopyStringVal() before that commit checks if 'src' is null before calling it so the problem never showed up. In that sense, this is a latent bug exposed by the aforementioned commit. Change-Id: I3a5b9349dd08556eba5cfedc8c0063cc59f5be03 Reviewed-on: http://gerrit.cloudera.org:8080/5198 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins

            People

            • Assignee:
              kwho Michael Ho
              Reporter:
              mikesbrown Michael Brown
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development