Hive / HIVE-186

Refactor code to use a single graph, nodeprocessor, dispatcher and rule abstraction

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.3.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      HIVE-186. Refactor code to use a single graph, nodeprocessor, dispatcher and rule abstraction. (Ashish Thusoo via zshao)

      Description

      Currently, the query processor has two different tree and rule abstractions - one for ASTs and one for Operator Graphs. We should clean this up so that we have a single abstraction that can be reused at different stages in the query compiler.

      1. patch-186_2.txt
        226 kB
        Ashish Thusoo
      2. patch-186.txt
        223 kB
        Ashish Thusoo

          Activity

          Ashish Thusoo added a comment -

          This patch contains the cleanup and refactoring of the entire graph-walking and rules framework. The unified framework is in the package

          org.apache.hadoop.hive.ql.lib

          Node is the interface that a graph's nodes must implement in order to use the graph walkers and rule dispatchers available in this framework. There are currently two implementations of this interface:

          1. ASTNode, in ql.parse, which is a wrapper around the CommonTree class of the ANTLR runtime.
          2. Operator, in ql.exec, which implements the nodes of the operator tree.
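To make the idea concrete, here is a minimal Java sketch, not the code from the patch, of one node interface shared by two unrelated graph types. The interface shape (getChildren() and getName()) and the toy classes ToyAstNode and ToyOperator are assumptions for illustration; the real interface in org.apache.hadoop.hive.ql.lib may declare different methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a single node abstraction implemented by two kinds of graph.
class NodeSketch {

  /** Minimal graph node (assumed shape): a name plus an ordered list of children. */
  interface Node {
    List<? extends Node> getChildren();
    String getName();
  }

  /** Toy stand-in for a parse-tree node; not Hive's ASTNode. */
  static final class ToyAstNode implements Node {
    private final String token;
    private final List<ToyAstNode> children;

    ToyAstNode(String token, ToyAstNode... children) {
      this.token = token;
      this.children = Arrays.asList(children);
    }

    @Override public List<ToyAstNode> getChildren() { return children; }
    @Override public String getName() { return token; }
  }

  /** Toy stand-in for an operator in an operator graph; not Hive's Operator. */
  static final class ToyOperator implements Node {
    private final String opName;
    private final List<ToyOperator> children = new ArrayList<>();

    ToyOperator(String opName) { this.opName = opName; }

    /** Appends a child and returns this operator, so chains can be built inline. */
    ToyOperator addChild(ToyOperator child) {
      children.add(child);
      return this;
    }

    @Override public List<ToyOperator> getChildren() { return children; }
    @Override public String getName() { return opName; }
  }

  /** Generic code written once against Node works for both graph kinds (trees only here). */
  static List<String> preorderNames(Node root) {
    List<String> out = new ArrayList<>();
    collect(root, out);
    return out;
  }

  private static void collect(Node n, List<String> out) {
    out.add(n.getName());
    for (Node child : n.getChildren()) {
      collect(child, out);
    }
  }

  public static void main(String[] args) {
    Node ast = new ToyAstNode("TOK_QUERY", new ToyAstNode("TOK_FROM"), new ToyAstNode("TOK_INSERT"));
    Node ops = new ToyOperator("TS").addChild(new ToyOperator("FIL").addChild(new ToyOperator("SEL")));
    System.out.println(preorderNames(ast)); // [TOK_QUERY, TOK_FROM, TOK_INSERT]
    System.out.println(preorderNames(ops)); // [TS, FIL, SEL]
  }
}
```

Because preorderNames() depends only on Node, it runs unchanged on both graphs; that reuse is what a single abstraction buys. The token and operator names are illustrative only.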

          I have also removed the DefaultDispatcher implementation of Dispatcher, because the same functionality can be expressed with DefaultRuleDispatcher. Accordingly, I have cleaned up the GenMR* processors and ColumnPruner to reflect these changes. ColumnPruner is also split into two classes: ColumnPrunerProcFactory, which creates the processors for the various rules it needs, and ColumnPrunerProcCtx, an implementation of NodeProcessorCtx that carries context information between rules.
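The claim that DefaultDispatcher is subsumed by DefaultRuleDispatcher can be illustrated with a simplified sketch: a rule dispatcher with no rules always falls back to its default processor, which is all a default dispatcher does. Every name and signature below (RuleDispatcher, Rule.cost(), defaultOnly()) is a hypothetical simplification rather than Hive's API, and the "cheapest matching rule wins" policy is likewise an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Stack;

// Hypothetical, simplified types for illustration only; not Hive's org.apache.hadoop.hive.ql.lib API.
class RuleDispatchSketch {

  interface Node {
    String getName();
  }

  /** Marker for state shared between processors (the role NodeProcessorCtx plays). */
  interface NodeProcessorCtx {}

  interface NodeProcessor {
    Object process(Node nd, Stack<Node> stack, NodeProcessorCtx ctx);
  }

  /** Returns a non-negative cost if the rule matches the current stack, otherwise -1. */
  interface Rule {
    int cost(Stack<Node> stack);
  }

  static final class RuleDispatcher {
    private final NodeProcessor defaultProc;
    private final Map<Rule, NodeProcessor> rules;
    private final NodeProcessorCtx ctx;

    RuleDispatcher(NodeProcessor defaultProc, Map<Rule, NodeProcessor> rules, NodeProcessorCtx ctx) {
      this.defaultProc = defaultProc;
      this.rules = rules;
      this.ctx = ctx;
    }

    Object dispatch(Node nd, Stack<Node> stack) {
      // Pick the cheapest matching rule; on a tie the first rule in map order wins.
      Rule best = null;
      int bestCost = Integer.MAX_VALUE;
      for (Rule r : rules.keySet()) {
        int c = r.cost(stack);
        if (c >= 0 && c < bestCost) {
          bestCost = c;
          best = r;
        }
      }
      NodeProcessor proc = (best == null) ? defaultProc : rules.get(best);
      // A null processor means there is nothing to do for this node.
      return (proc == null) ? null : proc.process(nd, stack, ctx);
    }
  }

  /** The old "default dispatcher" behaviour: a rule dispatcher with no rules at all. */
  static RuleDispatcher defaultOnly(NodeProcessor proc) {
    return new RuleDispatcher(proc, new LinkedHashMap<>(), null);
  }

  static Node named(String name) {
    return () -> name;
  }

  public static void main(String[] args) {
    NodeProcessor echo = (nd, st, c) -> "default:" + nd.getName();
    Node fil = named("FIL");
    Stack<Node> stack = new Stack<>();
    stack.push(fil);
    System.out.println(defaultOnly(echo).dispatch(fil, stack)); // default:FIL

    Map<Rule, NodeProcessor> rules = new LinkedHashMap<>();
    rules.put(st -> st.peek().getName().equals("FIL") ? 1 : -1, (nd, st, c) -> "filter-rule");
    System.out.println(new RuleDispatcher(echo, rules, null).dispatch(fil, stack)); // filter-rule
  }
}
```

Keeping one dispatcher class means callers choose behaviour purely through the rule map and the default processor, instead of choosing between two dispatcher types.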

          I have removed all the AST-specific classes (ASTEvent, ASTDispatcher, ASTProcessor, ASTEventProcessor, etc.).

          Nodes are processed by implementations of NodeProcessor. I have removed the reflection-based invocation that the earlier DefaultDispatcher and DefaultRuleDispatcher used. Now only a single process function is called, and the user has to implement a separate processor for each rule (see ColumnPrunerProcFactory).
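A rough sketch of the "one process() method, one processor per rule" style follows, assuming a simplified NodeProcessor signature. ProcFactory and CollectCtx are hypothetical stand-ins for the roles that ColumnPrunerProcFactory and ColumnPrunerProcCtx play; they are not those classes, and the logging they do is purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

// Hypothetical sketch of per-rule processors with a shared context object.
class ProcFactorySketch {

  interface Node { String getName(); }

  interface NodeProcessorCtx {}

  /** Single entry point; there is no reflection-based method lookup. */
  interface NodeProcessor {
    Object process(Node nd, Stack<Node> stack, NodeProcessorCtx ctx);
  }

  /** Context object carrying state between processors (the ColumnPrunerProcCtx role). */
  static final class CollectCtx implements NodeProcessorCtx {
    final List<String> log = new ArrayList<>();
  }

  /** Factory handing out one processor per rule (the ColumnPrunerProcFactory role). */
  static final class ProcFactory {
    private ProcFactory() {}

    static NodeProcessor getFilterProc() {
      return (nd, stack, ctx) -> {
        // Each processor reads and writes the shared context it is handed.
        ((CollectCtx) ctx).log.add("filter:" + nd.getName());
        return null;
      };
    }

    static NodeProcessor getDefaultProc() {
      return (nd, stack, ctx) -> {
        ((CollectCtx) ctx).log.add("default:" + nd.getName());
        return null;
      };
    }
  }

  public static void main(String[] args) {
    CollectCtx ctx = new CollectCtx();
    Stack<Node> stack = new Stack<>();
    Node fil = () -> "FIL";
    Node sel = () -> "SEL";
    ProcFactory.getFilterProc().process(fil, stack, ctx);
    ProcFactory.getDefaultProc().process(sel, stack, ctx);
    System.out.println(ctx.log); // [filter:FIL, default:SEL]
  }
}
```

Compared with reflection-based dispatch, each rule's behaviour is an ordinary object that the compiler can type-check, and shared state travels explicitly through the context rather than through hidden dispatcher fields.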

          The walker interface has been renamed to GraphWalker, and its default implementation is now called DefaultGraphWalker. I have also eliminated TopoWalker. DefaultGraphWalker is no longer an abstract class, so clients can use it out of the box. ColumnPrunerWalker and GenMapRedWalker remain subclasses of DefaultGraphWalker.
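The following sketch shows what a concrete, directly usable walker could look like. It is a simplification under stated assumptions, not DefaultGraphWalker itself: it walks depth-first, dispatches a node only after all of its children, keeps the current path on a stack for rule matching, dispatches a node shared by two parents only once, and assumes the graph is acyclic. GraphWalker, Dispatcher and SimpleNode here are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Stack;

// Hypothetical sketch of a concrete (non-abstract) graph walker; assumes an acyclic graph.
class WalkerSketch {

  interface Node {
    List<? extends Node> getChildren();
    String getName();
  }

  interface Dispatcher {
    Object dispatch(Node nd, Stack<Node> stack);
  }

  /** Usable as-is; subclasses may override walk() to change the traversal. */
  static class GraphWalker {
    protected final Dispatcher dispatcher;
    protected final Stack<Node> stack = new Stack<>();
    protected final Map<Node, Object> outputs = new IdentityHashMap<>();

    GraphWalker(Dispatcher dispatcher) { this.dispatcher = dispatcher; }

    /** Walks from each start node and returns every dispatched node's output. */
    Map<Node, Object> startWalking(List<? extends Node> startNodes) {
      for (Node n : startNodes) {
        walk(n);
      }
      return outputs;
    }

    /** Depth-first, children before parent; a node already dispatched is skipped. */
    protected void walk(Node nd) {
      if (outputs.containsKey(nd)) {
        return;
      }
      stack.push(nd);
      for (Node child : nd.getChildren()) {
        walk(child);
      }
      // The node being dispatched is on top of the stack, below it are its ancestors.
      outputs.put(nd, dispatcher.dispatch(nd, stack));
      stack.pop();
    }
  }

  static final class SimpleNode implements Node {
    private final String name;
    private final List<SimpleNode> children;

    SimpleNode(String name, SimpleNode... children) {
      this.name = name;
      this.children = Arrays.asList(children);
    }

    @Override public List<SimpleNode> getChildren() { return children; }
    @Override public String getName() { return name; }
  }

  /** Returns node names in the order the walker dispatched them. */
  static List<String> dispatchOrder(Node root) {
    List<String> order = new ArrayList<>();
    new GraphWalker((nd, st) -> { order.add(nd.getName()); return null; })
        .startWalking(Arrays.asList(root));
    return order;
  }

  public static void main(String[] args) {
    SimpleNode sel = new SimpleNode("SEL");
    // SEL is shared by TS and FIL, so the structure is a DAG rather than a tree.
    SimpleNode root = new SimpleNode("TS", new SimpleNode("FIL", sel), sel);
    System.out.println(dispatchOrder(root)); // [SEL, FIL, TS]
  }
}
```

A specialised walker, in the spirit of ColumnPrunerWalker or GenMapRedWalker, would subclass GraphWalker and override walk() to alter the traversal while reusing the bookkeeping.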

          Ashish Thusoo added a comment -

          Submitting patch.

          Ashish Thusoo added a comment -

          Made the following changes to address the review comments:

          1. Remove new String() from ColumnPruner.java
          2. Remove the abstract keyword from the GraphWalker.java interface
          3. Fix tabs in LineageInfo.java and ColumnPruner.java
          4. Javadocs for anonymous blocks in ParseDriver.java
          5. Javadocs for moved functions in ColumnPrunerProcCtx.java
          6. Separator in RuleRegExp.java
          7. Fix operator comments in RuleRegExp.java
          8. Test for null in DefaultRuleDispatcher.java
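Item 6 concerns the separator used in RuleRegExp. As a hedged illustration of how a regular-expression rule can use one, the sketch below appends a separator after every node name on the walker's stack and matches the pattern against progressively longer suffixes of that path. The separator "%", the cost definition (length of the matched path, or -1 for no match) and the class name RegExpRule are assumptions, not the patch's code.

```java
import java.util.Stack;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a regular-expression rule evaluated against the walker's node stack.
class RegExpRuleSketch {

  interface Node { String getName(); }

  static final class RegExpRule {
    /** Assumed separator written after every node name when building the path string. */
    static final String SEP = "%";
    private final String name;
    private final Pattern pattern;

    RegExpRule(String name, String regExp) {
      this.name = name;
      this.pattern = Pattern.compile(regExp);
    }

    /**
     * Builds "NAME%" paths from the top of the stack downwards and returns the length of
     * the shortest suffix path that the pattern matches in full, or -1 if none does.
     */
    int cost(Stack<Node> stack) {
      String path = "";
      for (int pos = stack.size() - 1; pos >= 0; pos--) {
        path = stack.get(pos).getName() + SEP + path;
        Matcher m = pattern.matcher(path);
        if (m.matches()) {
          return path.length();
        }
      }
      return -1;
    }

    String getName() { return name; }
  }

  /** Builds a stack whose bottom element is the first name given. */
  static Stack<Node> stackOf(String... names) {
    Stack<Node> s = new Stack<>();
    for (String n : names) {
      s.push(() -> n);
    }
    return s;
  }

  public static void main(String[] args) {
    RegExpRule filterAfterScan = new RegExpRule("R1", "TS%FIL%");
    System.out.println(filterAfterScan.cost(stackOf("TS", "FIL")));        // 7
    System.out.println(filterAfterScan.cost(stackOf("TS", "SEL", "FIL"))); // -1
    System.out.println(filterAfterScan.cost(stackOf("TSFIL")));            // -1
  }
}
```

The separator marks where one node name ends, so the pattern TS%FIL% matches the two-node path TS then FIL but not a single node named TSFIL.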

          As agreed in the review, I will open JIRAs for cleanups of the old code in DefaultGraphWalker.java and LineageInfo.java.

          Namit Jain added a comment -

          +1


            People

            • Assignee: Ashish Thusoo
            • Reporter: Ashish Thusoo
            • Votes: 0
            • Watchers: 2
