Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-569

ArrayIndexOutOfBoundsException when deducing collation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0-incubating
    • Fix Version/s: 1.1.0-incubating
    • Component/s: None
    • Labels:
      None

      Description

      If a subquery has an ORDER BY on a column that is not in the SELECT list and the outer query does another ORDER BY, Calcite encounters an ArrayIndexOutOfBoundException when deducing collation.

      In PlannerTest, I created a simple test by first adding the following traits:

      	    List<RelTraitDef> traitDefs = new ArrayList<RelTraitDef>();
      	    traitDefs.add(ConventionTraitDef.INSTANCE);
      	    traitDefs.add(RelCollationTraitDef.INSTANCE);
      

      And ran the following query:

      select t.psPartkey from (select ps.psPartkey from `tpch`.`partsupp` ps order by ps.psSupplyCost) t order by t.psPartkey"
      
      java.lang.ArrayIndexOutOfBoundsException: -1
      	at org.apache.calcite.rex.RexProgram.deduceCollations(RexProgram.java:589)
      	at org.apache.calcite.rex.RexProgram.getCollations(RexProgram.java:558)
      	at org.apache.calcite.plan.RelOptUtil.createProject(RelOptUtil.java:2685)
      	at org.apache.calcite.plan.RelOptUtil.createProject(RelOptUtil.java:2623)
      	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList(SqlToRelConverter.java:3571)
      	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:613)
      	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:568)
      	at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2929)
      	at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:526)
      	at org.apache.calcite.prepare.PlannerImpl.convert(PlannerImpl.java:189)
      

        Issue Links

          Activity

          Hide
          julianhyde Julian Hyde added a comment -

          Closing now that 1.1.0-incubating has been released.

          Show
          julianhyde Julian Hyde added a comment - Closing now that 1.1.0-incubating has been released.
          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/2709896e . Test case in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/c0120dd8 .
          Hide
          amansinha100 Aman Sinha added a comment -

          Vladimir Sitnikov do you think your patch will go in soon after the 1.0 release is out ? I am working on a Drill bug which is dependent on this issue.

          Show
          amansinha100 Aman Sinha added a comment - Vladimir Sitnikov do you think your patch will go in soon after the 1.0 release is out ? I am working on a Drill bug which is dependent on this issue.
          Hide
          amansinha100 Aman Sinha added a comment - - edited

          +1 on the last patch - at least it resolves the index-out-of-bound. It's not clear to me why the SortRemoveRule does not remove the extra sort in Sort(y, Project(y, Sort(y) ) ), but I assume that Julian's changes for CALCITE-526 will likely address that issue.

          Show
          amansinha100 Aman Sinha added a comment - - edited +1 on the last patch - at least it resolves the index-out-of-bound. It's not clear to me why the SortRemoveRule does not remove the extra sort in Sort(y, Project(y, Sort(y) ) ) , but I assume that Julian's changes for CALCITE-526 will likely address that issue.
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          Looks like I figured out the simplest yet meaningful patch for current problem: 0001-CALCITE-569-ArrayIndexOutOfBoundsException-from-subq.patch.

          It does work and it looks like to be not very intrusive.

          However this patch does not pass sort(rexover(sort)) kind of plan in https://github.com/apache/incubator-calcite/pull/46/files#diff-fcafae074e093e34d065f7f5a49f5124R393
          I believe it could be a separate issue.

          However the question of Project with non-trivial collation trait is still open.

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - Looks like I figured out the simplest yet meaningful patch for current problem: 0001-CALCITE-569-ArrayIndexOutOfBoundsException-from-subq.patch . It does work and it looks like to be not very intrusive. However this patch does not pass sort(rexover(sort)) kind of plan in https://github.com/apache/incubator-calcite/pull/46/files#diff-fcafae074e093e34d065f7f5a49f5124R393 I believe it could be a separate issue. However the question of Project with non-trivial collation trait is still open.
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment - - edited

          to ban collations on logical RelNodes

          I do not suggest to ban collations on logical RelNodes.
          My point is as follows:
          1) Current deducer of project collations is broken
          2) Thanks to that broken logic it almost never identified a "non-required sort"
          3) Users got correct results

          If we heal collation deducer, we might unexpectedly get wrong data.
          For instance, consider JdbcCalcRule. It blindly adds "select+where" and that is it.
          Does it ensure collation trait via order by? No. It can result in arbitrary shuffle by the database engine.
          The same goes for Enumerable.

          Do you really mean to fix all the Calc/Project/etc converters?
          It would be a non-trivial patch

          That is why I suggest the following:
          1) Heal collation deducer (so method works as expected, without ArrayIndexOut...)
          2) Skip collation deducer when producing logical relations.
          3) We can preserve collation for well-known operations (e.g. EnumerableProject, etc).

          #2 will make Calcite a bit "dumb" (it might preform non-required sorts), however it would result in correct result. I do not like "wrong results" kind of issues from the database engines.

          Later we could revisit the approach and make Calcite smarter in terms of collation preserving and collation non-preserving Projects.

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - - edited to ban collations on logical RelNodes I do not suggest to ban collations on logical RelNodes. My point is as follows: 1) Current deducer of project collations is broken 2) Thanks to that broken logic it almost never identified a "non-required sort" 3) Users got correct results If we heal collation deducer, we might unexpectedly get wrong data . For instance, consider JdbcCalcRule. It blindly adds "select+where" and that is it. Does it ensure collation trait via order by? No. It can result in arbitrary shuffle by the database engine. The same goes for Enumerable. Do you really mean to fix all the Calc/Project/etc converters? It would be a non-trivial patch That is why I suggest the following: 1) Heal collation deducer (so method works as expected, without ArrayIndexOut...) 2) Skip collation deducer when producing logical relations. 3) We can preserve collation for well-known operations (e.g. EnumerableProject, etc). #2 will make Calcite a bit "dumb" (it might preform non-required sorts), however it would result in correct result. I do not like "wrong results" kind of issues from the database engines. Later we could revisit the approach and make Calcite smarter in terms of collation preserving and collation non-preserving Projects.
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          Moved "better names of projects" to CALCITE-573

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - Moved "better names of projects" to CALCITE-573
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment - - edited

          If the input has collations (a, b), (x, p), (x, y, z) and we're looking for (x, y) then we shouldn't stop at (x, p), should we?

          We should not.
          It collects all the collations. For this case it would be (x) and (x,y).
          Then at the usage side it picks collationList.get(0) (the first collation). I did not change that .get(0) part.

          . LogicalProject(x, y) on LogicalSort( y ) will indeed be sorted on y - but any code that creates a RelNode subclass can override.

          Note: it does not matter how you propagate the collation. The result/impact is the same no matter if you use PROPAGATE or you compute the columns (see below).

          That has a flip side:
          1) if you state that LogicalProject has "y" collation, then you basically disallow physical implementations that violate "y" collation.
          For instance Sort(y, Project(y, Sort(y))).
          If Project has collation, then SortRemoveRule transforms the plan to Project(y, Sort(y)). If the implementation picks non-collation-preserving strategy (like ParallelProject(y, Sort(y))) then we are in trouble since noone will reestablish the "required" sort order of the rows.

          Are you sure all the implementations really treat "collation of a project" as a "must"?
          Can you show how EnumerableCalc/Project honors the collation?

          2) We might miss some optimizations. When we propagate the collations we stick with just the subset of plans that keep the collation intact.
          If we allow shuffle-like plans, there is a possibility the plan can be more efficient (e.g. use hash-join instead of merge-join, etc).

          The "uniform" way to solve the problem would be to add both kinds of Projects into the equivalence set (e.g. some sort of rules that replace a collated Project with Sort(NonCollatedProject)).
          It is not very trivial patch, so I suppose to go with the first approach: "kill collations from LogicalProject while still allow to implement it in Physical nodes".

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - - edited If the input has collations (a, b), (x, p), (x, y, z) and we're looking for (x, y) then we shouldn't stop at (x, p), should we? We should not. It collects all the collations. For this case it would be (x) and (x,y). Then at the usage side it picks collationList.get(0) (the first collation). I did not change that .get(0) part. . LogicalProject(x, y) on LogicalSort( y ) will indeed be sorted on y - but any code that creates a RelNode subclass can override. Note: it does not matter how you propagate the collation. The result/impact is the same no matter if you use PROPAGATE or you compute the columns (see below). That has a flip side: 1) if you state that LogicalProject has "y" collation, then you basically disallow physical implementations that violate "y" collation. For instance Sort(y, Project(y, Sort(y))) . If Project has collation, then SortRemoveRule transforms the plan to Project(y, Sort(y)). If the implementation picks non-collation-preserving strategy (like ParallelProject(y, Sort(y))) then we are in trouble since noone will reestablish the "required" sort order of the rows. Are you sure all the implementations really treat "collation of a project" as a "must"? Can you show how EnumerableCalc/Project honors the collation? 2) We might miss some optimizations. When we propagate the collations we stick with just the subset of plans that keep the collation intact. If we allow shuffle-like plans, there is a possibility the plan can be more efficient (e.g. use hash-join instead of merge-join, etc). The "uniform" way to solve the problem would be to add both kinds of Projects into the equivalence set (e.g. some sort of rules that replace a collated Project with Sort(NonCollatedProject)). It is not very trivial patch, so I suppose to go with the first approach: "kill collations from LogicalProject while still allow to implement it in Physical nodes".
          Hide
          julianhyde Julian Hyde added a comment - - edited

          Your code change to RexProgram doesn't look quite right. If the input has collations (a, b), (x, p), (x, y, z) and we're looking for (x, y) then we shouldn't stop at (x, p), should we?

          There was a specific reason that I introduced PRESERVE (ordering by non-projected columns). I couldn't tell from the patch – is that case still handled?

          I am working on CALCITE-526 in branch https://github.com/julianhyde/incubator-calcite/tree/calcite-526. A lot of the work is to do with traits and collation; for instance, I allow a RelNode (but not a RelSubset) to have multiple traits of the same type, if the traitDef supports it. I think we need to solve both issues at the same time.

          So I propose that we split the patch in two – the column renames can be checked in first, and I'll fold the collation work into my branch. What do you think?

          I'm uncomfortable with the statement "LogicalProject is always created with empty collation". The LogicalXxx nodes are of logical convention but I'm not sure we should disallow them from having other traits. I'm going to put the logic to deduce the collations for core types (project, filter, sort, aggregate, join, union) into a new class RelMdCollation. By default, each RelNode subclass will have the traits you'd expect - e.g. LogicalProject(x, y) on LogicalSort( y ) will indeed be sorted on y - but any code that creates a RelNode subclass can override.

          I am well aware that not every implementation of Aggregate(x, y, sum(z)) produces output sorted on (x, y) but to ban collations on logical RelNodes would be going to far the other direction. I don't want to have to wait til we get into the physical domain (e.g. EnumerableXxx) before collation comes into play. That will make it more difficult to share rules.

          Show
          julianhyde Julian Hyde added a comment - - edited Your code change to RexProgram doesn't look quite right. If the input has collations (a, b), (x, p), (x, y, z) and we're looking for (x, y) then we shouldn't stop at (x, p), should we? There was a specific reason that I introduced PRESERVE (ordering by non-projected columns). I couldn't tell from the patch – is that case still handled? I am working on CALCITE-526 in branch https://github.com/julianhyde/incubator-calcite/tree/calcite-526 . A lot of the work is to do with traits and collation; for instance, I allow a RelNode (but not a RelSubset) to have multiple traits of the same type, if the traitDef supports it. I think we need to solve both issues at the same time. So I propose that we split the patch in two – the column renames can be checked in first, and I'll fold the collation work into my branch. What do you think? I'm uncomfortable with the statement "LogicalProject is always created with empty collation". The LogicalXxx nodes are of logical convention but I'm not sure we should disallow them from having other traits. I'm going to put the logic to deduce the collations for core types (project, filter, sort, aggregate, join, union) into a new class RelMdCollation. By default, each RelNode subclass will have the traits you'd expect - e.g. LogicalProject(x, y) on LogicalSort( y ) will indeed be sorted on y - but any code that creates a RelNode subclass can override. I am well aware that not every implementation of Aggregate(x, y, sum(z)) produces output sorted on (x, y) but to ban collations on logical RelNodes would be going to far the other direction. I don't want to have to wait til we get into the physical domain (e.g. EnumerableXxx) before collation comes into play. That will make it more difficult to share rules.
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          Please find the updated patch:
          1) LogicalProject is always created with empty collation
          2) EnumerableProject keeps track of the collation
          3) createProject keeps more column names (lots of tests updated with just column renames)

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - Please find the updated patch: 1) LogicalProject is always created with empty collation 2) EnumerableProject keeps track of the collation 3) createProject keeps more column names (lots of tests updated with just column renames)
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          Let's wait Julian for a while, then decide what to do with the patch.

          I'm inclined to removing final List<RelCollation> collationList = program.getCollations(child.getCollationList()); from org.apache.calcite.plan.RelOptUtil#createProject(org.apache.calcite.rel.RelNode, java.util.List<? extends org.apache.calcite.rex.RexNode>, java.util.List<java.lang.String>, boolean) to make sure Projects always come without a Collation

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - Let's wait Julian for a while, then decide what to do with the patch. I'm inclined to removing final List<RelCollation> collationList = program.getCollations(child.getCollationList()); from org.apache.calcite.plan.RelOptUtil#createProject(org.apache.calcite.rel.RelNode, java.util.List<? extends org.apache.calcite.rex.RexNode>, java.util.List<java.lang.String>, boolean) to make sure Projects always come without a Collation
          Hide
          amansinha100 Aman Sinha added a comment -

          I propose that we open a separate JIRA to correctly track collation trait for Project. The IOBE issue can be resolved using the current patch.

          Show
          amansinha100 Aman Sinha added a comment - I propose that we open a separate JIRA to correctly track collation trait for Project. The IOBE issue can be resolved using the current patch.
          Hide
          jni Jinfeng Ni added a comment -

          "1) RelOptUtil.create(RelNode child, List<Integer> cols) ignores field names. Is it intentional?
          If I add the names, then I get somewhat more readable plans (proper column names instead of $f0)
          Should we keep original names?"

          Second on this one. It's better to keep the original names, if there is no conflicts at all. Those internal fields $f0, $f1, .. always makes it hard to debug the internal plan representation.

          Show
          jni Jinfeng Ni added a comment - "1) RelOptUtil.create(RelNode child, List<Integer> cols) ignores field names. Is it intentional? If I add the names, then I get somewhat more readable plans (proper column names instead of $f0) Should we keep original names?" Second on this one. It's better to keep the original names, if there is no conflicts at all. Those internal fields $f0, $f1, .. always makes it hard to debug the internal plan representation.
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          In fact, a collation trait on a Project is tricky.
          I am not sure the user of Project is supposed to track collation trait and keep the sortedness all the way up to the execution root.

          Certain projections are just hard to implement with keeping the sort order (e.g. RexOver window functions).

          So, I believe we should know (or invent) how it supposed to work.
          If we allow LogicalProject to shuffle the rows, then we should always nullify collation trait of a project, and disable deduceCollation call from createProject.

          If LogicalProject is supposed to honor collation trait, then we should reset the collation trait at leadt for projects with window functions.

          Julian Hyde, we need your input.

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - In fact, a collation trait on a Project is tricky. I am not sure the user of Project is supposed to track collation trait and keep the sortedness all the way up to the execution root. Certain projections are just hard to implement with keeping the sort order (e.g. RexOver window functions). So, I believe we should know (or invent) how it supposed to work. If we allow LogicalProject to shuffle the rows, then we should always nullify collation trait of a project, and disable deduceCollation call from createProject. If LogicalProject is supposed to honor collation trait, then we should reset the collation trait at leadt for projects with window functions. Julian Hyde , we need your input.
          Hide
          amansinha100 Aman Sinha added a comment -

          Added expected plan for calcite-569.

          Show
          amansinha100 Aman Sinha added a comment - Added expected plan for calcite-569.
          Hide
          amansinha100 Aman Sinha added a comment -

          The patch looks ok to me based on the following observations:

          • Since RelOptUtil.createProject() creates a LogicalProject with empty collation trait (not preserve), it is consistent with my previous observation that the IOBE was due to preserve.
          • The refactoring to use the createProject() makes sense since it takes into account trivial projects.

          Even with these changes, the unnecessary Sort appears at the root of the plan - can you check if SortRemoveRule is getting applied or not ?
          I have updated the plan test to have the expected plan that contains exactly 1 Sort and will upload the diff file. This test will fail currently until the unnecessary Sort issue is resolved.

          Show
          amansinha100 Aman Sinha added a comment - The patch looks ok to me based on the following observations: Since RelOptUtil.createProject() creates a LogicalProject with empty collation trait (not preserve), it is consistent with my previous observation that the IOBE was due to preserve. The refactoring to use the createProject() makes sense since it takes into account trivial projects. Even with these changes, the unnecessary Sort appears at the root of the plan - can you check if SortRemoveRule is getting applied or not ? I have updated the plan test to have the expected plan that contains exactly 1 Sort and will upload the diff file. This test will fail currently until the unnecessary Sort issue is resolved.
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          I've scanned through the createProject and it looks like a little needs to be done to support proper collation traits.

          In fact, RelOptUtil.createProject almost had that feature, however it was not properly coded.

          Aman Sinha,
          Can you please check 0001-Properly-track-collation-trait-for-select-a-from-.-o.patch?

          Can you please add more specific assertion to the test? I am not sure if assertThat(plan, containsString("Sort")); tells much if the assert fails.
          Well, it is probably obvious to you, however it will become puzzling if the test breaks.

          Julian Hyde,
          1) RelOptUtil.create(RelNode child, List<Integer> cols) ignores field names. Is it intentional?
          If I add the names, then I get somewhat more readable plans (proper column names instead of $f0)

          Should we keep original names?

          2) What rule should remove "unnecessary Sort"? Basically with my patch I see (Sort(Project(..., collation=[0]), collation=[0]).

          3) As far as I understand, new LogicalProject should not be used as it almost always misses collation trait.

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - I've scanned through the createProject and it looks like a little needs to be done to support proper collation traits. In fact, RelOptUtil.createProject almost had that feature, however it was not properly coded. Aman Sinha , Can you please check 0001-Properly-track-collation-trait-for-select-a-from-.-o.patch ? Can you please add more specific assertion to the test? I am not sure if assertThat(plan, containsString("Sort")); tells much if the assert fails. Well, it is probably obvious to you, however it will become puzzling if the test breaks. Julian Hyde , 1) RelOptUtil.create(RelNode child, List<Integer> cols) ignores field names. Is it intentional? If I add the names, then I get somewhat more readable plans (proper column names instead of $f0 ) Should we keep original names? 2) What rule should remove "unnecessary Sort"? Basically with my patch I see (Sort(Project(..., collation=[0]), collation=[0]) . 3) As far as I understand, new LogicalProject should not be used as it almost always misses collation trait.
          Hide
          amansinha100 Aman Sinha added a comment -

          Hmm..for your query, with my change, I get the following plan ... which has an extra Sort at the top.

          select t.psPartkey from (select ps.psPartkey from `tpch`.`partsupp` ps order by ps.psPartkey, ps.psSupplyCost) t order by t.psPartkey"
          

          Plan:

          Sort(sort0=[$0], dir0=[ASC])
            LogicalProject(psPartkey=[$0])
              LogicalProject(psPartkey=[$0])
                Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
                  LogicalProject(psPartkey=[$0], psSupplyCost=[$1])
                    EnumerableTableScan(table=[[tpch, partsupp]])
          

          However, with the original code, this query fails with the same IOBE, so in that sense the patch is an improvement but not optimal plan yet. For the better plan I think we have to look closer at trait propagation.

          If you are ok with this, I will add your test.

          Show
          amansinha100 Aman Sinha added a comment - Hmm..for your query, with my change, I get the following plan ... which has an extra Sort at the top. select t.psPartkey from (select ps.psPartkey from `tpch`.`partsupp` ps order by ps.psPartkey, ps.psSupplyCost) t order by t.psPartkey" Plan: Sort(sort0=[$0], dir0=[ASC]) LogicalProject(psPartkey=[$0]) LogicalProject(psPartkey=[$0]) Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) LogicalProject(psPartkey=[$0], psSupplyCost=[$1]) EnumerableTableScan(table=[[tpch, partsupp]]) However, with the original code, this query fails with the same IOBE, so in that sense the patch is an improvement but not optimal plan yet. For the better plan I think we have to look closer at trait propagation. If you are ok with this, I will add your test.
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          What will be for select a from ... order by a,b?
          Will collation hold for the first column?
          Can you please add such test?

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - What will be for select a from ... order by a,b ? Will collation hold for the first column? Can you please add such test?
          Hide
          amansinha100 Aman Sinha added a comment -

          Uploaded a patch and a unit test case. The motivation for the fix is that if a Project (or any other plan node) has an input that is sorted by an expression that is not part of the list of expressions being projected by that node, then its collation trait should be empty. Further, it looks like converting to PRESERVE collation is not allowed (in canConvert() function of RelCollationTraitDef), so creating a Project with PRESERVE can cause problems.
          Pls review the patch. Thanks.

          Show
          amansinha100 Aman Sinha added a comment - Uploaded a patch and a unit test case. The motivation for the fix is that if a Project (or any other plan node) has an input that is sorted by an expression that is not part of the list of expressions being projected by that node, then its collation trait should be empty. Further, it looks like converting to PRESERVE collation is not allowed (in canConvert() function of RelCollationTraitDef), so creating a Project with PRESERVE can cause problems. Pls review the patch. Thanks.
          Hide
          amansinha100 Aman Sinha added a comment -

          I investigated a potential fix for this and it looks like the convertOrder() function creates a Project using the PRESERVE collation. My understanding is since we are still in the early stage of sql-to-rel conversion it does not seem necessary to do this. Instead, changing this to RelCollationImpl.EMPTY resolves the issue. I will do some more testing but let me know if there's a better alternative.

          Show
          amansinha100 Aman Sinha added a comment - I investigated a potential fix for this and it looks like the convertOrder() function creates a Project using the PRESERVE collation. My understanding is since we are still in the early stage of sql-to-rel conversion it does not seem necessary to do this. Instead, changing this to RelCollationImpl.EMPTY resolves the issue. I will do some more testing but let me know if there's a better alternative.

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              amansinha100 Aman Sinha
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development