  1. Apache AsterixDB
  2. ASTERIXDB-1246

Unnecessary decor variables of a group-by are not removed until PushProjectDownRule is fired.


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Unnecessary decor variables of a group-by are not removed until PushProjectDownRule is fired.

      Currently, a group-by operator for a subplan is introduced when IntroduceGroupByForSubplanRule is fired. At that time, decor variables for the new group-by operator are also added, based on which variables are used after the new group-by operator.

      After this rule, other optimizations might make decor variables unnecessary. One example is that an assign after group-by can be moved before the group-by operator so that a record variable (e.g., $$0) that is required for the given assign does not need to be passed through the group-by operator. These unnecessary decor variables will be removed only when PushProjectDownRule is fired.
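
The pruning described above can be sketched roughly as follows. This is a toy model, not AsterixDB's optimizer code: the `GroupBy` class, the `prune_decor_variables` helper, and the variable names `$$5` and `$$10` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GroupBy:
    """Toy stand-in for a group-by operator (hypothetical, not the real class)."""
    group_vars: list
    decor_vars: list = field(default_factory=list)


def prune_decor_variables(group_by, used_above):
    """Keep only the decor variables that an operator above the group-by still reads.

    Returns the list of decor variables that were dropped.
    """
    kept = [v for v in group_by.decor_vars if v in used_above]
    removed = [v for v in group_by.decor_vars if v not in used_above]
    group_by.decor_vars = kept
    return removed


# $$0 (the whole record) became a decor variable because an assign above the
# group-by read it. Once that assign is moved below the group-by, only its
# output ($$10, a made-up name) is read above, so $$0 no longer needs to pass through.
gb = GroupBy(group_vars=["$$5"], decor_vars=["$$0", "$$10"])
removed = prune_decor_variables(gb, used_above={"$$5", "$$10"})
print(removed, gb.decor_vars)
```

The point of the sketch is that the pruning itself only needs the set of variables read above the group-by; nothing in it depends on a project operator being present.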

      As the rule name suggests, PushProjectDownRule is fired only when there is a project operator in the plan. Currently in my branch (index-only plan branch), this affects IntroduceSelectAccessMethodRule, which transforms a plan into an index-utilizing plan. This rule checks whether the given plan is an index-only plan by checking the variables used after a SELECT operator. If only the secondary key and/or the primary key are used, then the given plan is an index-only plan and we can use a secondary-index search to return the SK and PK.
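
A minimal sketch of that index-only test, assuming it reduces to a subset check on variable sets. The function name `is_index_only` and the variable names `$$sk`, `$$pk`, and `$$0` are hypothetical; the actual rule in the branch may track more state than this.

```python
def is_index_only(used_after_select, secondary_keys, primary_keys):
    """Return True if every variable used after the SELECT is an SK or PK variable."""
    index_vars = set(secondary_keys) | set(primary_keys)
    return set(used_after_select) <= index_vars


# Only the secondary key and the primary key are read: index-only.
clean = is_index_only({"$$sk", "$$pk"}, secondary_keys={"$$sk"}, primary_keys={"$$pk"})

# A leftover record variable ($$0) is still counted as used: not index-only.
polluted = is_index_only({"$$sk", "$$pk", "$$0"}, secondary_keys={"$$sk"}, primary_keys={"$$pk"})

print(clean, polluted)
```

Under this model, a single stale variable that is still carried in the plan is enough to flip the result, even if no operator actually reads it.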

      The issue is that IntroduceSelectAccessMethodRule is fired before PushProjectDownRule, and generally no project operator has been introduced into the plan by that point. So, although these unnecessary decor variables are never used, they still sit in the plan, and the optimizer wrongly classifies the given plan as a non-index-only plan. The following is an example query. If we have a secondary index on count1 (PK: tweetid), then the outer branch should qualify as an index-only plan. In fact, it does not, because the unnecessary decor variables are still in the plan after some optimizations.

      for $t1 in dataset('TweetMessages')
      where $t1.countA > 0
      return {
        "tweetid1": $t1.tweetid,
        "count1": $t1.countA,
        "t2info": for $t2 in dataset('TweetMessages')
                  where $t1.countA /*+ indexnl */ = $t2.tweetid
                  return {"tweetid2": $t2.tweetid, "count2": $t2.countB}
      }

      We can separate PushProjectDownRule into two rules: one that pushes project operators down and one that cleans up unnecessary decor variables.
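
To illustrate why the split could help, here is a rough sketch of rule ordering over a toy plan. It assumes the standalone cleanup rule would be scheduled before the access-method rule, which the report implies but does not state. The dictionary-based plan, the function names, and the variable names are all hypothetical.

```python
def clean_decor_variables(plan):
    """Hypothetical standalone rule: drop decor variables that nothing above reads."""
    plan["decor_vars"] = [
        v for v in plan["decor_vars"] if v in plan["used_above_group_by"]
    ]


def choose_access_method(plan):
    """Hypothetical stand-in for the index-only decision.

    Every variable still carried in the plan must be a secondary or primary key.
    """
    carried = set(plan["used_above_group_by"]) | set(plan["decor_vars"])
    plan["index_only"] = carried <= set(plan["index_keys"])


def optimize(plan, rules):
    """Apply each rule once, in the given order."""
    for rule in rules:
        rule(plan)
    return plan


def make_plan():
    # $$0 is a stale decor variable; $$sk and $$pk are the index keys.
    return {
        "decor_vars": ["$$0", "$$sk"],
        "used_above_group_by": ["$$sk", "$$pk"],
        "index_keys": ["$$sk", "$$pk"],
    }


# Without an early cleanup, the stale $$0 blocks the index-only decision.
before = optimize(make_plan(), [choose_access_method])
# With the cleanup rule fired first, the same plan qualifies.
after = optimize(make_plan(), [clean_decor_variables, choose_access_method])
print(before["index_only"], after["index_only"])
```

Because the cleanup rule in this sketch does not look for a project operator, it can run at any point in the rule sequence, which is the property the proposed split is after.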

            People

            • Assignee:
              wangsaeu Taewoo Kim
              Reporter:
              wangsaeu Taewoo Kim
            • Votes:
              0
              Watchers:
              3
