Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1037

Column uniqueness is calculated incorrectly for 'Correlate' expression

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.11.0
    • Component/s: core
    • Labels:
      None

      Description

      Column uniqueness is calculated incorrectly for 'Correlate' expression – and in some cases this leads to java.lang.IndexOutOfBoundsException. Example of such code:

      select
       x.v
      from
       (
        select
         t1.v
        from
         (values (1,1),(1,2)) as t1(k,v) 
         join (values (1)) as t2(k) on t1.k=t2.k
       ) x,
       lateral
       (
        select 
         t.v
        from
         unnest(multiset[x.v]) as t(v)
       ) y
      group by x.v,y.v

      The problems seems to be related to the org.apache.calcite.rel.metadata.RelMdColumnUniqueness.areColumnsUnique(Correlate rel, ImmutableBitSet columns, boolean ignoreNulls) method – it just delegates uniqueness check to left input without changing columns list, which leads to Exception if this list references columns from right input.

      It seems, that right behavior should be following:

      • For Anti/Semi join type keep the current behavior (as resulting rows contains fields only from left input).
      • For Left/Inner join type columns set for correlate is unique only if it includes unique sets from both sides.

        Activity

        Hide
        julianhyde Julian Hyde added a comment -

        Yes, that sounds about right. I think behavior for left/inner correlate is identical to join, so we should be sharing code with RelColumnUniqueness.areColumnsUnique(Join, ImmutableBitSet, boolean) somehow. Correlate used to be a sub-class of Join, so I think this got broken when we separated it.

        Show
        julianhyde Julian Hyde added a comment - Yes, that sounds about right. I think behavior for left/inner correlate is identical to join, so we should be sharing code with RelColumnUniqueness.areColumnsUnique(Join, ImmutableBitSet, boolean) somehow. Correlate used to be a sub-class of Join, so I think this got broken when we separated it.
        Hide
        Lerm Alexey Makhmutov added a comment -

        Well, the first part looks almost identical, however it's unclear how we can handle equijoin case for areColumnsUnique(Join...) – in correlate case columns from the left are just consumed in the right part (in general case).

        I've used following fix to workaround the problem: https://github.com/Lerm/calcite/commit/f40bb3a23d1bc4bdd3d712dc1e8ba8d94733c58c

        Show
        Lerm Alexey Makhmutov added a comment - Well, the first part looks almost identical, however it's unclear how we can handle equijoin case for areColumnsUnique(Join...) – in correlate case columns from the left are just consumed in the right part (in general case). I've used following fix to workaround the problem: https://github.com/Lerm/calcite/commit/f40bb3a23d1bc4bdd3d712dc1e8ba8d94733c58c
        Hide
        julianhyde Julian Hyde added a comment -

        Yes, I suppose you have to treat Correlate the same way as a cartesian join. (Which is the same as a join where the left and right keys have 0 columns.)

        Your work is definitely an improvement over what we have currently. If you add one or two tests to RelMetadataTest and create a pull request I'd be happy to accept it.

        Show
        julianhyde Julian Hyde added a comment - Yes, I suppose you have to treat Correlate the same way as a cartesian join. (Which is the same as a join where the left and right keys have 0 columns.) Your work is definitely an improvement over what we have currently. If you add one or two tests to RelMetadataTest and create a pull request I'd be happy to accept it.
        Hide
        julianhyde Julian Hyde added a comment -
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/1ca15524 . Thanks for the contribution, Alexey Makhmutov !
        Hide
        julianhyde Julian Hyde added a comment -

        Resolved in release 1.11.0 (2017-01-11).

        Show
        julianhyde Julian Hyde added a comment - Resolved in release 1.11.0 (2017-01-11).

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            Lerm Alexey Makhmutov
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development