Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.4.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      There may be cases where user-defined logic needs to be executed between the compilation and execution phases of a query. One simple example that we ran into is based on archival. If an external utility archives table partitions, for example, then plugin logic can check the partitions for that and take suitable action, such as:

      • Unarchiving the partition seamlessly
      • Throwing an error

      etc.

      This same framework can also be used for updating query compilation stats.

      1. patch-463_8.txt
        586 kB
        Ashish Thusoo
      2. patch-463_7.txt
        574 kB
        Ashish Thusoo
      3. patch-463_6.txt
        588 kB
        Ashish Thusoo
      4. patch-463_5.txt
        568 kB
        Ashish Thusoo
      5. patch-463_4.txt
        568 kB
        Ashish Thusoo
      6. patch-463_3.txt
        561 kB
        Ashish Thusoo
      7. patch-463_2.txt
        560 kB
        Ashish Thusoo
      8. patch-463.txt
        550 kB
        Ashish Thusoo

        Activity

        Ashish Thusoo added a comment -

        I am planning to add an API along the following lines:

        execute(String queryId, String queryString, List<ql.metadata.Partition> inputPartitions, List<ql.metadata.Partition> outputPartitions, org.apache.hadoop.security.UserGroupInformation ugi)
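        A minimal sketch of what a hook written against the proposed signature might look like. Everything here is an assumption made for illustration: PartitionStub stands in for ql.metadata.Partition, a plain user name replaces UserGroupInformation, and PreExecuteSketch and ArchiveCheckHook are hypothetical names, not classes from the patch. The hook takes the "throwing an error" action from the description when a query reads an archived partition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ql.metadata.Partition; only a name is modeled.
final class PartitionStub {
    final String name;
    PartitionStub(String name) { this.name = name; }
}

// Hypothetical hook contract that follows the proposed execute(...) shape.
// A plain String replaces UserGroupInformation to avoid a Hadoop dependency.
interface PreExecuteSketch {
    void execute(String queryId, String queryString,
                 List<PartitionStub> inputPartitions,
                 List<PartitionStub> outputPartitions,
                 String user);
}

// Example hook: rejects a query that reads an archived partition.
final class ArchiveCheckHook implements PreExecuteSketch {
    private final List<String> archived;

    ArchiveCheckHook(List<String> archived) {
        this.archived = new ArrayList<>(archived);
    }

    @Override
    public void execute(String queryId, String queryString,
                        List<PartitionStub> inputPartitions,
                        List<PartitionStub> outputPartitions,
                        String user) {
        for (PartitionStub p : inputPartitions) {
            if (archived.contains(p.name)) {
                throw new IllegalStateException(
                    "Query " + queryId + " reads archived partition " + p.name);
            }
        }
    }
}
```

        A hook that unarchives the partition instead would have the same shape and would restore the data in place of throwing.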

        Ashish Thusoo added a comment -

        Added support for pre-execution hooks.

        The interface is in

        org.apache.hadoop.hive.ql.hooks.PreExecute

        and an example test implementation is in

        org.apache.hadoop.hive.ql.hooks.PreExecutePrinter in the ql/test directory

        I have changed the data/conf/hive-site.xml to include the following parameter

        hive.exec.pre.hooks="org.apache.hadoop.hive.ql.hooks.PreExecutePrinter"

        so that this hook is called for all tests. I had to include q/tests in the test.classpath in build-common.xml to achieve this as well.

        As a result all the test outputs have changed with the information emitted by this hook. Just ignore the .out files in the review and do a sanity check later after you are satisfied with the code.

        hive.exec.pre.hooks is a comma-separated list of PreExecute hooks, all of which are called after compilation and before execution. The inputs and outputs are populated only for queries and DDLs.
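        The ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the Driver code from the patch: PreHook and DriverSketch are hypothetical names, the hook takes only the query string, and treating a hook exception as a veto that skips execution is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook contract standing in for PreExecute.
interface PreHook {
    void run(String query);
}

// Hypothetical driver that records each phase so the ordering is visible.
final class DriverSketch {
    final List<String> trace = new ArrayList<>();
    private final List<PreHook> hooks;

    DriverSketch(List<PreHook> hooks) {
        this.hooks = hooks;
    }

    // Returns true if the query was executed, false if a hook vetoed it.
    boolean run(String query) {
        trace.add("compile");
        try {
            for (PreHook h : hooks) {
                h.run(query);      // hooks run in their configured order
                trace.add("hook");
            }
        } catch (RuntimeException e) {
            trace.add("aborted: " + e.getMessage());
            return false;          // assumption: a failing hook skips execution
        }
        trace.add("execute");
        return true;
    }
}
```

        With two hooks that succeed, the trace reads compile, hook, hook, execute. If a hook throws, the trace ends with an "aborted" entry and never reaches execute.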

        Ashish Thusoo added a comment -

        submitting.

        Prasad Chakka added a comment -
        • can you include ql/test/classes through a build variable so that for metastore and cli, this will not get included?
        • add param hive.exec.pre.hooks to the hive-default.xml
        • i think it is better for getPreExecuteHooks() to throw ClassNotFoundException and put a proper error message in Driver.execute(); otherwise it would be baffling for users what this class is and why it is not found
        • in WriteEntity, is there a need to distinguish between local and dfs? As Joydeep pointed out, the file could be on some other file system (such as s3n). I think WriteEntity.d should be a URI or a string representation of a URI, and let the PreExecute script figure out what it is.
        • equals() of Read and Write entities should check for NULLs.
        • is there a specific reason for the input and output maps to be LinkedHashSet?
        • can you include a negative testcase for non-existent preexecutehook class
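        The error-handling suggestion above could look roughly like this. It is a sketch under assumptions: LoadableHook, NoOpHook and HookFactory are hypothetical names, and the message text mirrors the "Pre Exec Hook Class not found:" wording that appears in a test log later in this thread rather than the exact code in the patch.

```java
// Hypothetical hook contract standing in for PreExecute.
interface LoadableHook {
    void run(String query);
}

// Trivial hook used only to exercise the factory.
class NoOpHook implements LoadableHook {
    @Override
    public void run(String query) { }
}

final class HookFactory {
    // Instantiates a hook by class name. A missing class is reported with a
    // message that names the class, so the caller can show it to the user.
    static LoadableHook create(String className) throws ClassNotFoundException {
        try {
            Class<?> cls = Class.forName(className);
            return (LoadableHook) cls.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            throw new ClassNotFoundException(
                "Pre Exec Hook Class not found:" + className, e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Cannot instantiate hook " + className, e);
        }
    }
}
```

        A negative test then only needs to pass a class name that does not exist and check that the exception message names it.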
        Prasad Chakka added a comment -

        do you think it would be useful to pass the AST of the query?

        Ashish Thusoo added a comment -

        1. Not sure if we need hive.exec.pre.hooks in hive-default.xml. The HiveConf entry already defines an empty string as a default. Do you have any particular use case in mind which is not covered by that?

        2. Will add the error message and a negative test case
        3. I agree passing the URI is useful, will change this to do that
        4. Will add the null check to WriteEntity and ReadEntity
        5. Set because we want this to not have duplicates. LinkedHashSet to make the order deterministic. Did you have anything else in mind?
        6. Will fix the build stuff.
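        Points 4 and 5 can be illustrated together. The sketch below is an assumption-laden stand-in, not the ReadEntity or WriteEntity code: EntitySketch is a hypothetical class identified only by a name that may be null. It shows a null-safe equals() and how a LinkedHashSet both drops duplicates and keeps insertion order.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for ReadEntity/WriteEntity, identified by name only.
final class EntitySketch {
    private final String name; // may be null

    EntitySketch(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EntitySketch)) return false;       // also rejects null
        return Objects.equals(name, ((EntitySketch) o).name); // null-safe compare
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name); // 0 when name is null
    }

    @Override
    public String toString() {
        return String.valueOf(name);
    }
}

final class EntitySetDemo {
    // Collects entities in first-seen order, without duplicates.
    static Set<EntitySketch> collect(String... names) {
        Set<EntitySketch> out = new LinkedHashSet<>();
        for (String n : names) {
            out.add(new EntitySketch(n));
        }
        return out;
    }
}
```

        For example, EntitySetDemo.collect("t2", "t1", "t2") yields a set that prints as [t2, t1]. A plain HashSet would also remove the duplicate but gives no ordering guarantee, which matters when hook output is compared against expected test output.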

        Ashish Thusoo added a comment -

        This one fixes the error message and adds a negative test. It also fixes the null check.

        The next patch will fix the other two issues, but I thought I might upload this one anyway for review.

        Ashish Thusoo added a comment -

        Also the change for dfs and non-dfs files is much more involved. We make the dfs and local assumption throughout the ql layer and we really do not maintain those as URLs. The local and dfs flag is set depending upon whether we are doing

        INSERT OVERWRITE DIRECTORY

        or

        INSERT OVERWRITE LOCAL DIRECTORY.

        We will have to fix those semantics and language constructs before we can address this more holistically. I suggest we file a separate JIRA for that.

        Ashish Thusoo added a comment -

        Fixed build.xml. The class is only needed in ql and hwi tests.

        Ashish Thusoo added a comment -

        Fixed a bug with "" string as the hooks string.
        Added a test case for this.
        Also added the description to hive-default.xml as Prasad suggested.
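        The comment does not say what the bug was. One well-known pitfall with this kind of setting, shown here purely as an assumption, is that in Java "".split(",") returns a one-element array containing the empty string, so a naive loop tries to load a class with an empty name. A minimal sketch of a parser that avoids this (HookNames is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;

final class HookNames {
    // Splits a comma-separated setting into class names, skipping blanks,
    // so that "" (the default) yields an empty list rather than [""].
    static List<String> parse(String setting) {
        List<String> names = new ArrayList<>();
        if (setting == null) {
            return names;
        }
        for (String part : setting.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

        With this, an empty hive.exec.pre.hooks value simply means that no hooks are run.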

        Ashish Thusoo added a comment -

        Updated to latest tree.

        Ashish Thusoo added a comment -

        Fixed the new test failures.

        Prasad Chakka added a comment -

        some tests are failing because they don't have the latest output including the query

            [junit] Tests run: 233, Failures: 6, Errors: 0, Time elapsed: 2,112.817 sec
            [junit] Tests run: 47, Failures: 2, Errors: 0, Time elapsed: 80.344 sec
        
        Ashish Thusoo added a comment -

        Fixed new tests.

        Prasad Chakka added a comment -

        I am getting this error now

        test:
            [junit] Running org.apache.hadoop.hive.jdbc.TestJdbcDriver
            [junit] Hive history file=/data/users/pchakka/workspace/oshive3/jdbc/../build/ql/tmp/hive_job_log_pchakka_200905050933_38182557.txt
            [junit] Hive history file=/data/users/pchakka/workspace/oshive3/jdbc/../build/ql/tmp/hive_job_log_pchakka_200905050933_1469267334.txt
            [junit] Pre Exec Hook Class not found:org.apache.hadoop.hive.ql.hooks.PreExecutePrinter
            [junit] FAILED: Unknown exception : org.apache.hadoop.hive.ql.hooks.PreExecutePrinter
            [junit] Pre Exec Hook Class not found:org.apache.hadoop.hive.ql.hooks.PreExecutePrinter
            [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 4.225 sec
            [junit] FAILED: Unknown exception : org.apache.hadoop.hive.ql.hooks.PreExecutePrinter
            [junit] Test org.apache.hadoop.hive.jdbc.TestJdbcDriver FAILED
        
        Ashish Thusoo added a comment -

        Ok. one more time. I hope I have not screwed up this time.. sorry for all these patches..

        Prasad Chakka added a comment -

        Committed to Trunk. Thanks Ashish.


          People

          • Assignee:
            Ashish Thusoo
            Reporter:
            Ashish Thusoo
          • Votes:
            0
            Watchers:
            2
