Need a way to categorize queries in hooks for improved logging

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      We need a way to categorize queries in the hooks, such as whether or not they include a join clause, a group by clause, etc. This will allow for better performance logging.

      Currently the only way I can find is to go through the operators in the tasks, but which operators are used for the different types of queries may change over time.

      1. HIVE-2453.1.patch.txt
        14 kB
        Kevin Wilfong
      2. HIVE-2453.2.patch.txt
        21 kB
        Kevin Wilfong

        Activity

        Kevin Wilfong created issue -
        Kevin Wilfong made changes -
        Field Original Value New Value
        Attachment HIVE-2453.1.patch.txt [ 12494848 ]
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1933/
        -----------------------------------------------------------

        Review request for hive and Ning Zhang.

        Summary
        -------

        The information that would be useful for categorizing queries is clearest in the Semantic Analyzer, when the data from the Parser is interpreted. I added a new class which is designed to collect that data here, and place it ultimately in the QueryPlan where it will be available to hooks.

        The information I collect is whether or not the query has the following clauses:
        Join
        Group By
        Order By
        Sort By
        Group By after a Join clause

        Also, I store whether or not a script is used for mapping or reducing.
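
        As a rough illustration of the design described above, a flag holder of this kind could look like the sketch below. The class name QueryPropertiesSketch, its field and method names, and the describe() helper are assumptions made for this example; they are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical flag holder; all names here are illustrative, not the patch's API. */
class QueryPropertiesSketch {
    private boolean hasJoin;
    private boolean hasGroupBy;
    private boolean hasOrderBy;
    private boolean hasSortBy;
    private boolean hasJoinFollowedByGroupBy;
    private boolean usesScript;

    // Setters would be called while the query is analyzed.
    void setHasJoin(boolean v) { hasJoin = v; }
    void setHasGroupBy(boolean v) { hasGroupBy = v; }
    void setHasOrderBy(boolean v) { hasOrderBy = v; }
    void setHasSortBy(boolean v) { hasSortBy = v; }
    void setHasJoinFollowedByGroupBy(boolean v) { hasJoinFollowedByGroupBy = v; }
    void setUsesScript(boolean v) { usesScript = v; }

    // Getters would be read later, for example by a logging hook.
    boolean hasJoin() { return hasJoin; }
    boolean hasGroupBy() { return hasGroupBy; }
    boolean hasOrderBy() { return hasOrderBy; }
    boolean hasSortBy() { return hasSortBy; }
    boolean hasJoinFollowedByGroupBy() { return hasJoinFollowedByGroupBy; }
    boolean usesScript() { return usesScript; }

    /** Returns the names of the flags that are set, in a fixed order. */
    List<String> describe() {
        List<String> tags = new ArrayList<>();
        if (hasJoin) tags.add("join");
        if (hasGroupBy) tags.add("group by");
        if (hasOrderBy) tags.add("order by");
        if (hasSortBy) tags.add("sort by");
        if (hasJoinFollowedByGroupBy) tags.add("join followed by group by");
        if (usesScript) tags.add("script");
        return tags;
    }
}
```

        A hook that receives such an object could log describe() alongside its timing data, so that queries can be grouped by clause type without inspecting operators.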

        This addresses bug HIVE-2453.
        https://issues.apache.org/jira/browse/HIVE-2453

        Diffs
        -----

        trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java 1170719
        trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryProperties.java PRE-CREATION
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 1170719
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1170719
        trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/CheckQueryPropertiesHook.java PRE-CREATION
        trunk/ql/src/test/queries/clientpositive/query_properties.q PRE-CREATION
        trunk/ql/src/test/results/clientpositive/query_properties.q.out PRE-CREATION

        Diff: https://reviews.apache.org/r/1933/diff

        Testing
        -------

        I added a new test, which runs a variety of queries, such that each of the flags in QueryProperties is set by at least one query, and also some are set in combinations.
        I also added a hook which prints the contents of QueryProperties to error on the console.

        I checked the output in the results file and verified it matched what I expected.

        Thanks,

        Kevin

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1933/#review1946
        -----------------------------------------------------------

        trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryProperties.java
        <https://reviews.apache.org/r/1933/#comment4427>

        can you split it into 2 parts: useScriptInMapper and useScriptInReducer?

        • Ning

        (In reply to the review request as updated by Kevin Wilfong on 2011-09-16 19:04:32.)

        Kevin Wilfong added a comment -

        I'm abandoning this change.

        In order to determine if a script is used in the mapper or reducer, I believe I would need to go through all the Map Reduce tasks' operators looking for Transform operators. That's more work than I would like to perform for a feature which most users probably won't use, so I would add that to a hook. As long as I have to do that work in a hook in order to get that classification, I can get the other categorizations with minimal additional effort.
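
        To make the cost concrete, the walk described here amounts to a traversal like the one sketched below. PlanOp is a toy stand-in defined in the example, not Hive's operator class, and the "TRANSFORM" type string is a placeholder for whatever identifies a script operator in the real tree. The sketch also assumes the operators form a tree with no cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy operator node; a hypothetical stand-in for a real plan operator. */
class PlanOp {
    final String type;
    final List<PlanOp> children = new ArrayList<>();

    PlanOp(String type, PlanOp... kids) {
        this.type = type;
        this.children.addAll(Arrays.asList(kids));
    }
}

class TransformFinder {
    /** Returns true if any operator reachable from root has the given type. */
    static boolean contains(PlanOp root, String type) {
        Deque<PlanOp> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            PlanOp op = stack.pop();
            if (op.type.equals(type)) {
                return true;
            }
            // Depth-first: visit every child of the current operator.
            for (PlanOp child : op.children) {
                stack.push(child);
            }
        }
        return false;
    }
}
```

        Running contains() once on the map-side tree and once on the reduce-side tree of each task would give the mapper/reducer split, which is the per-task work being weighed in this comment.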

        Kevin Wilfong made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Not A Problem [ 8 ]
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1933/
        -----------------------------------------------------------

        (Updated 2011-09-17 00:14:50.529819)

        Review request for hive and Ning Zhang.

        Kevin Wilfong added a comment -

        I have changed my mind. On further examination, I have noticed that some of these flags still could not be set accurately from a hook without reparsing the query. For example, from the point of view of the task/operator tree, sort by and order by can become largely indistinguishable in complicated queries, particularly when the sort by requires only one reducer.

        Kevin Wilfong made changes -
        Resolution Not A Problem [ 8 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1933/#review1956
        -----------------------------------------------------------

        Ship it!

        • Ning

        jiraposter@reviews.apache.org added a comment -

        On 2011-09-16 21:27:59, Ning Zhang wrote:
        > trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryProperties.java, line 42
        > <https://reviews.apache.org/r/1933/diff/1/?file=41497#file41497line42>
        >
        > can you split it into 2 parts: useScriptInMapper and useScriptInReducer?

        Kevin Wilfong wrote:

        Determining whether a script is used in the mapper or the reducer will require going through the operator tree added to each Map Reduce job to determine if a Transform operator is there and then setting the appropriate flag. That is more work than I'd like to do here considering this feature will probably not be used by most users. I would like to keep the flag here, so that it can be decided if that work needs to be performed somewhere else.

        OK. My original thought in splitting this into mapper and reducer flags was that we could analyze the cost of the script operator based on its input size (mappers and reducers have different input size metrics). Let's see whether they are needed in the future and file a follow-up JIRA then.

        • Ning

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1933/#review1946
        -----------------------------------------------------------

        Ning Zhang added a comment -

        Kevin, I guess we cross-posted on the review board and here. As you noticed, there is really not much difference in the resulting operator tree between sort by and order by, except that the latter requires one reducer. However, they are different from the syntax point of view. Two queries may have different syntax but the same plan, or two queries may have very similar syntax but different execution plans (e.g., a CommonJoin can be converted to a MapJoin at execution time). So I think for this task we should focus on tagging the syntax tree rather than the physical execution plan tree. We should probably examine the operator tree and tag it in a pre-exec hook.

        BTW, we may also need to capture "distribute by", which just distributes the key-value pairs based on keys without sorting at the reducer. This is also an indicator for the analysis that the job needs a reduce phase.
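
        A minimal sketch of tagging from the syntax tree rather than the plan: the walker below records which clause tokens appear in a toy AST. The AstNode type, the Clause enum, and the token names ("SORT_BY", "ORDER_BY", "DISTRIBUTE_BY", and so on) are invented for illustration and do not correspond to Hive's parser tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Clauses of interest; the names double as hypothetical token names. */
enum Clause { JOIN, GROUP_BY, ORDER_BY, SORT_BY, DISTRIBUTE_BY }

/** Toy syntax-tree node; a hypothetical stand-in for a parser's AST node. */
class AstNode {
    final String token;
    final List<AstNode> children = new ArrayList<>();

    AstNode(String token, AstNode... kids) {
        this.token = token;
        this.children.addAll(Arrays.asList(kids));
    }
}

class SyntaxTagger {
    /** Returns the set of clauses whose tokens occur anywhere under root. */
    static Set<Clause> tag(AstNode root) {
        Set<Clause> found = EnumSet.noneOf(Clause.class);
        collect(root, found);
        return found;
    }

    private static void collect(AstNode node, Set<Clause> found) {
        for (Clause c : Clause.values()) {
            if (c.name().equals(node.token)) {
                found.add(c);
            }
        }
        for (AstNode child : node.children) {
            collect(child, found);
        }
    }
}
```

        Because the tags come from the tokens the user wrote, a sort by query and an order by query get different tags here even if their execution plans end up looking alike.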

        Kevin Wilfong added a comment -

        Ning, sorry for the confusion; my previous post here was older than the one on the review board. I totally agree with you that we need to examine the syntax tree rather than the execution plan for things like sort by, order by, etc. I still stand by what I said on the review board.

        Kevin Wilfong made changes -
        Attachment HIVE-2453.2.patch.txt [ 12495104 ]
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1933/
        -----------------------------------------------------------

        (Updated 2011-09-19 17:09:57.838587)

        Review request for hive and Ning Zhang.

        Changes
        -------

        QueryProperties now captures "distribute by" as Ning requested, and "cluster by" as it seemed like a logical addition.

        I added test cases for these as well.

        Summary
        -------

        The information that would be useful for categorizing queries is clearest in the Semantic Analyzer, when the data from the Parser is interpreted. I added a new class which is designed to collect that data here, and place it ultimately in the QueryPlan where it will be available to hooks.

        The information I collect is whether or not the query has the following clauses:
        Join
        Group By
        Order By
        Sort By
        Group By after a Join clause

        Also, I store whether or not a script is used for mapping or reducing.
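        The approach described in this summary can be sketched roughly as follows. This is only an illustration, not the code in the patch: the names ClauseSketch, QueryPropertiesSketch, QueryPlanSketch and QueryPropertiesDemo are hypothetical, and an EnumSet stands in for the individual flags to keep the example short. The analysis step records each clause it sees, and the plan carries the result so a hook can read it later.

```java
import java.util.EnumSet;

// Hypothetical list of the clauses mentioned in this review request.
enum ClauseSketch {
    JOIN, GROUP_BY, ORDER_BY, SORT_BY, JOIN_FOLLOWED_BY_GROUP_BY,
    DISTRIBUTE_BY, CLUSTER_BY, SCRIPT
}

// Hypothetical stand-in for QueryProperties: collects what analysis saw.
class QueryPropertiesSketch {
    private final EnumSet<ClauseSketch> seen = EnumSet.noneOf(ClauseSketch.class);

    // Called during (hypothetical) semantic analysis when a clause is met.
    void mark(ClauseSketch clause) {
        seen.add(clause);
    }

    // Called later, e.g. from a hook, to ask whether a clause was present.
    boolean has(ClauseSketch clause) {
        return seen.contains(clause);
    }
}

// Hypothetical stand-in for QueryPlan: carries the properties to hooks.
class QueryPlanSketch {
    private final QueryPropertiesSketch properties;

    QueryPlanSketch(QueryPropertiesSketch properties) {
        this.properties = properties;
    }

    QueryPropertiesSketch getQueryProperties() {
        return properties;
    }
}

class QueryPropertiesDemo {
    // Pretends to analyze "select ... from a join b ... order by ...".
    static QueryPlanSketch analyze() {
        QueryPropertiesSketch props = new QueryPropertiesSketch();
        props.mark(ClauseSketch.JOIN);
        props.mark(ClauseSketch.ORDER_BY);
        return new QueryPlanSketch(props);
    }

    public static void main(String[] args) {
        QueryPropertiesSketch props = analyze().getQueryProperties();
        for (ClauseSketch clause : ClauseSketch.values()) {
            System.out.println(clause + ": " + props.has(clause));
        }
    }
}
```

        Keeping the collection step separate from the plan means a hook only depends on what the query looked like syntactically, not on which operators the planner chose.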

        This addresses bug HIVE-2453.
        https://issues.apache.org/jira/browse/HIVE-2453

        Diffs (updated)


        trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java 1170719
        trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryProperties.java PRE-CREATION
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 1170719
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1170719
        trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/CheckQueryPropertiesHook.java PRE-CREATION
        trunk/ql/src/test/queries/clientpositive/query_properties.q PRE-CREATION
        trunk/ql/src/test/results/clientpositive/query_properties.q.out PRE-CREATION

        Diff: https://reviews.apache.org/r/1933/diff

        Testing
        -------

        I added a new test, which runs a variety of queries, such that each of the flags in QueryProperties is set by at least one query, and also some are set in combinations.
        I also added a hook which prints the contents of QueryProperties to standard error on the console.

        I checked the output in the results file and verified it matched what I expected.
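        A hook of the kind described in this Testing section might look roughly like the sketch below. It is not CheckQueryPropertiesHook itself: the PreExecHookSketch interface, the class name, and the property labels are all hypothetical, and a plain map of flag names to booleans stands in for QueryProperties so the example runs without Hive on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical hook interface; Hive's real hook API is not reproduced here.
interface PreExecHookSketch {
    void run(Map<String, Boolean> queryProperties);
}

// Prints one "name: value" line per flag so a test can diff the output.
class PrintPropertiesHookSketch implements PreExecHookSketch {

    // Builds the text separately from printing so it is easy to check.
    static String format(Map<String, Boolean> queryProperties) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Boolean> flag : queryProperties.entrySet()) {
            out.append(flag.getKey()).append(": ").append(flag.getValue()).append('\n');
        }
        return out.toString();
    }

    @Override
    public void run(Map<String, Boolean> queryProperties) {
        // The flags go to the error stream, as the hook in the test does.
        System.err.print(format(queryProperties));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is stable.
        Map<String, Boolean> props = new LinkedHashMap<>();
        props.put("Has Join", true);
        props.put("Has Group By", false);
        new PrintPropertiesHookSketch().run(props);
    }
}
```

        A stable print order matters here because the test compares the printed flags against a stored results file.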

        Thanks,

        Kevin

        He Yongqiang added a comment -

        I haven't looked at the change. Just a small question: given a query like "select key, count(1) from (select a.key as key, b.value as value from src a join src b on a.key=b.key) group by key", what tag will this query get?

        Kevin Wilfong added a comment -

        I have a test case very similar to this query in query_properties.q. It will get tagged as join and group by (not join followed by group by).

        He Yongqiang added a comment -

        What I mean is: should we tag the Hadoop job, the query, or both? The above example has 2 jobs: the first one is a join, and the second a group by.

        Kevin Wilfong added a comment -

        I see. The code here gives tags at the query level. Query level statistics will be tagged with both tags.

        This is certainly not ideal for statistics at the job level. Tagging jobs will require more thought.
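        The query-level behaviour discussed in this exchange can be illustrated with a small sketch. It rests on an assumption that is not confirmed by the thread: that the "join followed by group by" tag is set only when both clauses appear in the same query block, which would explain why the nested example gets the join and group by tags but not the combined one. TagSketch and QueryLevelTaggerSketch are hypothetical names.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;

// Hypothetical subset of the tags discussed in this thread.
enum TagSketch { JOIN, GROUP_BY, JOIN_FOLLOWED_BY_GROUP_BY }

class QueryLevelTaggerSketch {

    // Each list element holds the clauses found directly in one query
    // block (the outer query or a subquery). The result is one tag set
    // for the whole query, regardless of how many jobs it compiles to.
    static EnumSet<TagSketch> tagQuery(List<EnumSet<TagSketch>> blocks) {
        EnumSet<TagSketch> tags = EnumSet.noneOf(TagSketch.class);
        for (EnumSet<TagSketch> block : blocks) {
            tags.addAll(block);
            // ASSUMPTION: the combined tag needs both clauses in one block.
            if (block.contains(TagSketch.JOIN) && block.contains(TagSketch.GROUP_BY)) {
                tags.add(TagSketch.JOIN_FOLLOWED_BY_GROUP_BY);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // The example from the thread: join in the subquery, group by outside.
        List<EnumSet<TagSketch>> nested = Arrays.asList(
                EnumSet.of(TagSketch.JOIN),
                EnumSet.of(TagSketch.GROUP_BY));
        System.out.println(tagQuery(nested));
    }
}
```

        Tagging individual jobs would need a different input, namely a mapping from each job to the clauses it executes, which this sketch does not attempt.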

        Ning Zhang added a comment -

        +1. Will commit if tests pass.

        Ning Zhang added a comment -

        Committed. Thanks Kevin!

        Ning Zhang made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Fix Version/s 0.9.0 [ 12317742 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #967 (See https://builds.apache.org/job/Hive-trunk-h0.21/967/)
        HIVE-2453. Need a way to categorize queries in hooks for improved logging (Kevin Wilfong via Ning Zhang)

        nzhang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1173504
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryProperties.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
        • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/CheckQueryPropertiesHook.java
        • /hive/trunk/ql/src/test/queries/clientpositive/query_properties.q
        • /hive/trunk/ql/src/test/results/clientpositive/query_properties.q.out
        Carl Steinbach made changes -
        Fix Version/s 0.8.0 [ 12316178 ]
        Carl Steinbach made changes -
        Fix Version/s 0.9.0 [ 12317742 ]
        Carl Steinbach made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
        Open -> Resolved | 4h 51m | 1 | Kevin Wilfong | 16/Sep/11 23:45
        Resolved -> Reopened | 1h 32m | 1 | Kevin Wilfong | 17/Sep/11 01:17
        Reopened -> Resolved | 4d 5h 36m | 1 | Ning Zhang | 21/Sep/11 06:53
        Resolved -> Closed | 86d 18h 2m | 1 | Carl Steinbach | 16/Dec/11 23:56

          People

          • Assignee:
            Kevin Wilfong
            Reporter:
            Kevin Wilfong
          • Votes:
            0
            Watchers:
            0
