Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      From mailing list:
      --Amazon Elastic MapReduce version of Hive seems to have a nice feature called "Variables." Basically you can define a variable via command-line while invoking hive with -d DT=2009-12-09 and then refer to the variable via $

      {DT}

      within the hive queries. This could be extremely useful. I can't seem to find this feature even on trunk. Is this feature currently anywhere in the roadmap?--

      This could be implemented in many places.
      A simple place to put this is
      in Driver.compile or Driver.run we can do string substitutions at that level, and further downstream need not be effected.

      There could be some benefits to doing this further downstream, parser,plan. but based on the simple needs we may not need to overthink this.

      I will get started on implementing in compile unless someone wants to discuss this more.

      1. 1096-9.diff
        10 kB
        Edward Capriolo
      2. hive-1096.diff
        4 kB
        Edward Capriolo
      3. hive-1096-10-patch.txt
        12 kB
        Edward Capriolo
      4. hive-1096-11-patch.txt
        12 kB
        Edward Capriolo
      5. hive-1096-12.patch.txt
        15 kB
        Edward Capriolo
      6. hive-1096-15.patch.txt
        22 kB
        Edward Capriolo
      7. hive-1096-15.patch.txt
        17 kB
        Edward Capriolo
      8. hive-1096-2.diff
        5 kB
        Edward Capriolo
      9. hive-1096-20.patch.txt
        22 kB
        Edward Capriolo
      10. hive-1096-7.diff
        6 kB
        Edward Capriolo
      11. hive-1096-8.diff
        10 kB
        Edward Capriolo

        Issue Links

          Activity

          Edward Capriolo created issue -
          Hide
          John Sichi added a comment -

          What will happen if a variable is referenced by CREATE VIEW? If the substitution is done via simple string replacement, then the variable's value at time of CREATE VIEW will be statically bound. If done downstream, we could instead make it dynamic, but this might require making variables first-class objects in the metastore (as globals, potentially with overrides at other scopes such as per-session or per-user).

          Show
          John Sichi added a comment - What will happen if a variable is referenced by CREATE VIEW? If the substitution is done via simple string replacement, then the variable's value at time of CREATE VIEW will be statically bound. If done downstream, we could instead make it dynamic, but this might require making variables first-class objects in the metastore (as globals, potentially with overrides at other scopes such as per-session or per-user).
          Hide
          Edward Capriolo added a comment -

          John,

          I understand your comment.

          I have not looked at any of the view related patches, but I do see how views and variable replacement could pose a complex set of discussions, as would introducing variables into the grammer itself.

          Based on the user requested feature, metastore level changes is a fairly drastic and complex way to solve a simple problem. If i were a gambling man, I would bet that the 'Elastic MapReduce version of Hive' does a simple text based replacement high upstream or possibly outside of hive entirely.

          Thus, I proposed some simple string replacement before the query ever hits the compiler. In the implementation I am suggesting a VIEW could never have a variable, because it should be replaced before it ever becomes a view.

          I see people using this feature like so:

          hive -hiveconf DAY=5 -e "LOAD DATA INFILE '/tmp/${DAY}'  into logs partition=${DAY}"
          

          This is a common problem that many people (including myself) face. Do you think it is short-side thinking to not consider implementing further down in the process?

          Show
          Edward Capriolo added a comment - John, I understand your comment. I have not looked at any of the view related patches, but I do see how views and variable replacement could pose a complex set of discussions, as would introducing variables into the grammer itself. Based on the user requested feature, metastore level changes is a fairly drastic and complex way to solve a simple problem. If i were a gambling man, I would bet that the 'Elastic MapReduce version of Hive' does a simple text based replacement high upstream or possibly outside of hive entirely. Thus, I proposed some simple string replacement before the query ever hits the compiler. In the implementation I am suggesting a VIEW could never have a variable, because it should be replaced before it ever becomes a view. I see people using this feature like so: hive -hiveconf DAY=5 -e "LOAD DATA INFILE '/tmp/${DAY}' into logs partition=${DAY}" This is a common problem that many people (including myself) face. Do you think it is short-side thinking to not consider implementing further down in the process?
          Hide
          John Sichi added a comment -

          I'm fine with it if we start with the implementation you are proposing, document the limitation with respect to views, and decide if more is needed based on user feedback.

          I'm guessing that once people start using the two features together, they might end up wanting the variables to be expanded as part of evaluating a view, otherwise they may be forced to move ETL logic which belongs inside a reusable view out into the calling SQL statements instead where the expansion happens. This would limit the utility of views.

          I agree that treating variables as first-class objects is a heavyweight change.

          There is one possible middle ground approach, which is to avoid introducing variables as first-class objects, but still try to expand them in views by passing the substitution map down into SemanticAnalyzer and letting it apply the substitutions as part of reparsing the view's definition obtained from the catalog.

          This would imply

          (a) CREATE VIEW would need to be able to undo the substitutions so that the stored view definition in the catalog would contain variable references instead of replacements.

          (b) If a variable was defined when the view was created, but undefined when referenced later, we'd need to substitute NULL or raise an exception. Probably best to be consistent with whatever you're planning for the top-level substitutions in this case. I made the view reparse error handling verbose so that a user should have a chance of figuring out what happened in this case, although it will be about as pretty as a C++ template compilation failure.

          Show
          John Sichi added a comment - I'm fine with it if we start with the implementation you are proposing, document the limitation with respect to views, and decide if more is needed based on user feedback. I'm guessing that once people start using the two features together, they might end up wanting the variables to be expanded as part of evaluating a view, otherwise they may be forced to move ETL logic which belongs inside a reusable view out into the calling SQL statements instead where the expansion happens. This would limit the utility of views. I agree that treating variables as first-class objects is a heavyweight change. There is one possible middle ground approach, which is to avoid introducing variables as first-class objects, but still try to expand them in views by passing the substitution map down into SemanticAnalyzer and letting it apply the substitutions as part of reparsing the view's definition obtained from the catalog. This would imply (a) CREATE VIEW would need to be able to undo the substitutions so that the stored view definition in the catalog would contain variable references instead of replacements. (b) If a variable was defined when the view was created, but undefined when referenced later, we'd need to substitute NULL or raise an exception. Probably best to be consistent with whatever you're planning for the top-level substitutions in this case. I made the view reparse error handling verbose so that a user should have a chance of figuring out what happened in this case, although it will be about as pretty as a C++ template compilation failure.
          Hide
          Prasad Chakka added a comment -

          I think most useful global variables are in-built variables that do not really need metastore (eg. date based stuff, TODAY, THIS_WEEK, THIS_MONTH or user name or host name etc). Also session variables and user specific variables that can be loaded with a .hiverc. So it may be possible to implement most of the requirements without true global variables (and metastore).

          String substitution will create problems with comments and also with error reporting.

          Show
          Prasad Chakka added a comment - I think most useful global variables are in-built variables that do not really need metastore (eg. date based stuff, TODAY, THIS_WEEK, THIS_MONTH or user name or host name etc). Also session variables and user specific variables that can be loaded with a .hiverc. So it may be possible to implement most of the requirements without true global variables (and metastore). String substitution will create problems with comments and also with error reporting.
          Hide
          Edward Capriolo added a comment -

          @Prasad,

          I do not think we should do anything fancy here, just look for simple $

          {var}

          and replace with var from session state. It could happen that ${ } could appear in a comment, but not very likely.

          >>String substitution will create problems with comments and also with error reporting.
          Based on that we should make variables part of the grammar then?

          In my example...

          hive -hiveconf DAY=5 -e "LOAD DATA INFILE '/tmp/${DAY}'  into logs partition=${DAY}"
          

          Can we make the grammar look inside a string like '/tmp/$

          {DAY}

          ' ?

          Show
          Edward Capriolo added a comment - @Prasad, I do not think we should do anything fancy here, just look for simple $ {var} and replace with var from session state. It could happen that ${ } could appear in a comment, but not very likely. >>String substitution will create problems with comments and also with error reporting. Based on that we should make variables part of the grammar then? In my example... hive -hiveconf DAY=5 -e "LOAD DATA INFILE '/tmp/${DAY}' into logs partition=${DAY}" Can we make the grammar look inside a string like '/tmp/$ {DAY} ' ?
          Hide
          Prasad Chakka added a comment -

          Hive will be doing string replace on all string literals returned from parser.

          Show
          Prasad Chakka added a comment - Hive will be doing string replace on all string literals returned from parser.
          Hide
          Edward Capriolo added a comment -

          That sounds like a good way to handle it.
          Two questions
          *) What to support
          *) How to escape

          Suggestions:
          *) Simple replacements $

          {VAR}

          syntax ONLY no $VAR no $

          {VAR[3]}

          *) \$ ignore

          ?

          Show
          Edward Capriolo added a comment - That sounds like a good way to handle it. Two questions *) What to support *) How to escape Suggestions: *) Simple replacements $ {VAR} syntax ONLY no $VAR no $ {VAR[3]} *) \$ ignore ?
          Hide
          Carl Steinbach added a comment -

          Wanted to make sure that everyone is aware of HIVE-1063 which aims to
          implement simple variable interpolation for test qfiles. The patch I provided
          adds a system property variable interpolation mechanism to QTestUtil using
          some code I borrowed from Hadoop's Configuration class.

          The bug I mention in the JIRA relates to some details of how the test diffs
          are compared, and not to the interpolation mechanism itself which I think works
          OK.

          Show
          Carl Steinbach added a comment - Wanted to make sure that everyone is aware of HIVE-1063 which aims to implement simple variable interpolation for test qfiles. The patch I provided adds a system property variable interpolation mechanism to QTestUtil using some code I borrowed from Hadoop's Configuration class. The bug I mention in the JIRA relates to some details of how the test diffs are compared, and not to the interpolation mechanism itself which I think works OK.
          Hide
          Edward Capriolo added a comment -

          @Carl,

          I took a look at that. I imagine we needed this to deal with the version'ed jar names. I will look at how you did the interpolation, and try to follow your example.

          I see how what you did is useful in the build infrastructure. This is geared at end user but the logic is roughly the same.

          Show
          Edward Capriolo added a comment - @Carl, I took a look at that. I imagine we needed this to deal with the version'ed jar names. I will look at how you did the interpolation, and try to follow your example. I see how what you did is useful in the build infrastructure. This is geared at end user but the logic is roughly the same.
          Hide
          Carl Steinbach added a comment -

          I think a simple query preprocessor that does string substitution (and is physically distinct
          from the actual query processor) will satisfy the vast majority of user needs, will be easy
          to implement, and won't result in a lot of unexpected bugs and corner cases. Doing anything
          fancier should be deferred until we start work on implementing PL/QL

          Show
          Carl Steinbach added a comment - I think a simple query preprocessor that does string substitution (and is physically distinct from the actual query processor) will satisfy the vast majority of user needs, will be easy to implement, and won't result in a lot of unexpected bugs and corner cases. Doing anything fancier should be deferred until we start work on implementing PL/QL
          Hide
          John Sichi added a comment -

          Regarding error reporting (where the line/col offsets may be shifted by the substitutions), one possible solution (which I implemented for view reparse) is to include the processed text along with the error message. Downside is that the error message length becomes proportional to the SQL.

          Another way to do it is something like this.
          -------------------------------^

          and then only include the line with the error.

          Show
          John Sichi added a comment - Regarding error reporting (where the line/col offsets may be shifted by the substitutions), one possible solution (which I implemented for view reparse) is to include the processed text along with the error message. Downside is that the error message length becomes proportional to the SQL. Another way to do it is something like this. -------------------------------^ and then only include the line with the error.
          Hide
          Namit Jain added a comment -

          Static/dynamic binding will also come in when we allow resuming execution of a query - what happens if the variable value changes ?
          Are we saying that everything is bound when the query is first compiled (which is true for views also) ?

          https://issues.apache.org/jira/browse/HIVE-1100

          Show
          Namit Jain added a comment - Static/dynamic binding will also come in when we allow resuming execution of a query - what happens if the variable value changes ? Are we saying that everything is bound when the query is first compiled (which is true for views also) ? https://issues.apache.org/jira/browse/HIVE-1100
          Hide
          Edward Capriolo added a comment -

          @namit,

          >>Static/dynamic binding will also come in when we allow resuming execution of a query-

          HIVE-1100 sounds pretty complex. How can you move a query to a new session, when the session effects how the query is executed? For example, if i start a query in a session with without strict mode then move it to a session with strict mode is the query even valid anymore?

          This leads me back with my original thinking to do the substitution in Driver.run. 1) it's fast 2) it's easy 3)a caveman (like me) can do it. Really, the feature could go all the way up to the CLI, HiveService/clients really would not have a great use for this.

          >>'Hive will be doing string replace on all string literals returned from parser'

          sounds good too, but not if it will change the debugging offsets.

          I am sure some crafty sed hacker could cook up out a one-liner for this, which is maybe fine we punt it and do not support it, might be better then totally obfuscating the code base for it.

          Show
          Edward Capriolo added a comment - @namit, >>Static/dynamic binding will also come in when we allow resuming execution of a query- HIVE-1100 sounds pretty complex. How can you move a query to a new session, when the session effects how the query is executed? For example, if i start a query in a session with without strict mode then move it to a session with strict mode is the query even valid anymore? This leads me back with my original thinking to do the substitution in Driver.run. 1) it's fast 2) it's easy 3)a caveman (like me) can do it. Really, the feature could go all the way up to the CLI, HiveService/clients really would not have a great use for this. >>'Hive will be doing string replace on all string literals returned from parser' sounds good too, but not if it will change the debugging offsets. I am sure some crafty sed hacker could cook up out a one-liner for this, which is maybe fine we punt it and do not support it, might be better then totally obfuscating the code base for it.
          Hide
          Edward Capriolo added a comment -

          This is a patch using my simple replace before plan generation technique. I will look into the other suggestions next

          Show
          Edward Capriolo added a comment - This is a patch using my simple replace before plan generation technique. I will look into the other suggestions next
          Edward Capriolo made changes -
          Field Original Value New Value
          Attachment hive-1096.diff [ 12431498 ]
          Hide
          Edward Capriolo added a comment -

          @Prasad
          I would like to give a try at your suggestion.
          >>Hive will be doing string replace on all string literals returned from parser.

          Can you point me at a class and or method as a starting point. The parsing in QL is a fairly elaborate process I do not fully have my head wrapped around yet. If you have a starting point that would help me greatly.

          Show
          Edward Capriolo added a comment - @Prasad I would like to give a try at your suggestion. >>Hive will be doing string replace on all string literals returned from parser. Can you point me at a class and or method as a starting point. The parsing in QL is a fairly elaborate process I do not fully have my head wrapped around yet. If you have a starting point that would help me greatly.
          Hide
          Prasad Chakka added a comment -

          @edward, your patch will have problems if a comment has $ and it is not a real variable. hive will be throwing necessary errors.

          regarding the parsing code, afaik there is no single place where all string literals are processed. either ashish and zheng would be able to answer this better.

          Show
          Prasad Chakka added a comment - @edward, your patch will have problems if a comment has $ and it is not a real variable. hive will be throwing necessary errors. regarding the parsing code, afaik there is no single place where all string literals are processed. either ashish and zheng would be able to answer this better.
          Hide
          Zheng Shao added a comment -

          Regarding https://issues.apache.org/jira/browse/HIVE-1100, the plan is to bind everything at compilation time.

          Currently the code path for "set", "add <jar|file|archive>" and hive queries are completely separated. We probably want to support this in all 3 cases.
          Unless we extract the common code out, we will have to replicate the variable substitution code, which is even worse.

          +1 on the patch given that we don't have a single place to extract the strings. We probably need to modify the syntax of "add jar" etc to use quotes for string constants.

          Show
          Zheng Shao added a comment - Regarding https://issues.apache.org/jira/browse/HIVE-1100 , the plan is to bind everything at compilation time. Currently the code path for "set", "add <jar|file|archive>" and hive queries are completely separated. We probably want to support this in all 3 cases. Unless we extract the common code out, we will have to replicate the variable substitution code, which is even worse. +1 on the patch given that we don't have a single place to extract the strings. We probably need to modify the syntax of "add jar" etc to use quotes for string constants.
          Hide
          Edward Capriolo added a comment -

          I decided we should introduce a configuration variable to control the replacement. Otherwise no way to escape the variable.

          For some strange reason my .q file blows up when a comment is in it ... any ideas.

          Show
          Edward Capriolo added a comment - I decided we should introduce a configuration variable to control the replacement. Otherwise no way to escape the variable. For some strange reason my .q file blows up when a comment is in it ... any ideas.
          Edward Capriolo made changes -
          Attachment hive-1096-2.diff [ 12434364 ]
          Hide
          Zheng Shao added a comment -

          Edward, currently Hive only allows "--" comment before a query, but not before a "set" command or something like that. That might be the reason. (It ought to be fixed)

          Show
          Zheng Shao added a comment - Edward, currently Hive only allows "--" comment before a query, but not before a "set" command or something like that. That might be the reason. (It ought to be fixed)
          Hide
          Edward Capriolo added a comment -

          @Zheng

          I want to laugh now, but I spent about 4-5 hours trying to figure out what problem Hive was having. In general trying to debug the test cases through the testCliDriver can sometimes be difficult. Sometimes you get a decent stack trace, sometimes you get error code 11 or error code 2. I agree it ought to be fixed.

          Ok well now that I know the comment issue is not actually my problem I will remove the print statements and submit something that could go +1

          Show
          Edward Capriolo added a comment - @Zheng I want to laugh now, but I spent about 4-5 hours trying to figure out what problem Hive was having. In general trying to debug the test cases through the testCliDriver can sometimes be difficult. Sometimes you get a decent stack trace, sometimes you get error code 11 or error code 2. I agree it ought to be fixed. Ok well now that I know the comment issue is not actually my problem I will remove the print statements and submit something that could go +1
          Hide
          Prasad Chakka added a comment -

          @edward, I usually use Eclipse to debug these kind of things. Much faster IMO.

          Show
          Prasad Chakka added a comment - @edward, I usually use Eclipse to debug these kind of things. Much faster IMO.
          Hide
          Edward Capriolo added a comment -

          Patch adds pre-parsed variable replacement. Also added is a configuration variable hive.variable.replace to turn this feature on and off. (set to false by default)

          Show
          Edward Capriolo added a comment - Patch adds pre-parsed variable replacement. Also added is a configuration variable hive.variable.replace to turn this feature on and off. (set to false by default)
          Edward Capriolo made changes -
          Attachment hive-1096-7.diff [ 12434442 ]
          Edward Capriolo made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Zheng Shao added a comment -

          Looks great! It seems with your approach, a variable can be a number, string, or whatever else (like a WHERE condition).

          Can you modify the test case to include those cases?

          Show
          Zheng Shao added a comment - Looks great! It seems with your approach, a variable can be a number, string, or whatever else (like a WHERE condition). Can you modify the test case to include those cases?
          Zheng Shao made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Edward Capriolo added a comment -

          Added more rigorous test case.

          Show
          Edward Capriolo added a comment - Added more rigorous test case.
          Edward Capriolo made changes -
          Attachment hive-1096-8.diff [ 12434851 ]
          Edward Capriolo made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Namit Jain added a comment -

          Can you re-generate the patch ? It is not applying cleanly.

          Show
          Namit Jain added a comment - Can you re-generate the patch ? It is not applying cleanly.
          Hide
          Edward Capriolo added a comment -

          Regenerated against trunk.

          Show
          Edward Capriolo added a comment - Regenerated against trunk.
          Edward Capriolo made changes -
          Attachment 1096-9.diff [ 12436156 ]
          Hide
          Carl Steinbach added a comment -
          • Test cases?
          • Using the HiveConf namespace for user variables seems like a bad idea, especially since there are no checks in place to prevent people from defining variables with names like "hive.foo.bar" Giving users unprotected access to your configuration namespace usually leads to problems down the road.
          • It would be nice to be able to reference Java system properties using this syntax.
          • It would be nice to be able to nest variable definitions, e.g. version="0.6.0", jar_name="hive-exec-$ {version}

            ". The variable interpolation code in Hadoop's Configuration class does this.

          • It would be nice to be able to prevent interpolation using an escape character, e.g. "\$ {somevar}

            ".

          • Driver.replace(String) should have a name like Driver.interpolateCommandVariables(), or Driver.replaceVariables().
          • Driver.replace() iterates over the list of defined variables. Instead, I think it should iterate over the tokens in the command that match the pattern '$ {.*}

            '. This would make it easy to log any cases where the command contains "$

            {foo.bar}

            " and foo.bar is undefined.

          • Replace the reference to the string literal "hive.variable.replace" with HIVEVARREPLACE.
          Show
          Carl Steinbach added a comment - Test cases? Using the HiveConf namespace for user variables seems like a bad idea, especially since there are no checks in place to prevent people from defining variables with names like "hive.foo.bar" Giving users unprotected access to your configuration namespace usually leads to problems down the road. It would be nice to be able to reference Java system properties using this syntax. It would be nice to be able to nest variable definitions, e.g. version="0.6.0", jar_name="hive-exec-$ {version} ". The variable interpolation code in Hadoop's Configuration class does this. It would be nice to be able to prevent interpolation using an escape character, e.g. "\$ {somevar} ". Driver.replace(String) should have a name like Driver.interpolateCommandVariables(), or Driver.replaceVariables(). Driver.replace() iterates over the list of defined variables. Instead, I think it should iterate over the tokens in the command that match the pattern '$ {.*} '. This would make it easy to log any cases where the command contains "$ {foo.bar} " and foo.bar is undefined. Replace the reference to the string literal "hive.variable.replace" with HIVEVARREPLACE.
          Hide
          Carl Steinbach added a comment -

          Test cases?

          Missed the test case. Sorry.

          Also, I think that in order to get committed this needs to
          address the use case described in HIVE-1063. The last thing
          we want is two completely separate variable interpolation
          mechanisms.

          Show
          Carl Steinbach added a comment - Test cases? Missed the test case. Sorry. Also, I think that in order to get committed this needs to address the use case described in HIVE-1063 . The last thing we want is two completely separate variable interpolation mechanisms.
          Hide
          Edward Capriolo added a comment -

          >>
          Using the HiveConf namespace for user variables seems like a bad idea, especially since there are no checks in place to prevent people from defining variables with names like "hive.foo.bar" Giving users unprotected access to your configuration namespace usually leads to problems down the road.
          Philosophically I agree. In actuality have Hive/Hadoop conf is easily manipulated by changing your hadoop-site.xml or hive-site.xml. Users do have unprotected access to the namespace that is the nature of hadoop. Users of hive are setting variables all the time.

          >>
          It would be nice to be able to reference Java system properties using this syntax.
          Hive /Hadoop do not often refer to system properties during normal operation. I am on the fence about this.

          >>Driver.replace() iterates over the list of defined variables. Instead, I think it should iterate over the tokens in the command that match the pattern '$

          {.*}

          '.

          The hive CLI really needs some type of top level parser. Because we don't have this there are many ways we could do certain things but all of them are a little 'hackish'. If we had a real parser, reading character by character, we would not need a regex or string replace to do variable processing.

          >>
          This would make it easy to log any cases where the command contains "$

          {foo.bar}

          " and foo.bar is undefined.

          I think the code should replace "" for an undefined variable.

          >>Driver.replace(String) should have a name like Driver.interpolateCommandVariables(), or Driver.replaceVariables().
          Agreed, my notorious spelling usually causes me to avoid words like 'interpolate'

          >>It would be nice to be able to prevent interpolation using an escape character, e.g. "\$

          {somevar}

          ".
          I punted on that by being able to turn the replacement on and off. As I said above true parser would do stuff like this.

          >>It would be nice to be able to nest variable definitions, e.g. version="0.6.0", jar_name="hive-exec-$

          {version}

          ". The variable interpolation code in Hadoop's Configuration class does this.
          Yes. Carl I started with your code, the bug I mentioned above (comments before set), caused me to rip the entire thing apart, before I found out what I was doing wrong I had ripped out all the regex. I am not very comfortable with regex, and I was not in love with the while construct with a return in the middle. Again nesting would be easy with a true parser. I do not mind going back to your code.

          >>The last thing we want is two completely separate variable interpolation mechanisms.

          Agreed. The only true difference in implementation is that your doing it with properties and I am doing it with HiveConf Vars. If we support both I think we are both happy. Any ideas?

          Show
          Edward Capriolo added a comment - >> Using the HiveConf namespace for user variables seems like a bad idea, especially since there are no checks in place to prevent people from defining variables with names like "hive.foo.bar" Giving users unprotected access to your configuration namespace usually leads to problems down the road. Philosophically I agree. In actuality have Hive/Hadoop conf is easily manipulated by changing your hadoop-site.xml or hive-site.xml. Users do have unprotected access to the namespace that is the nature of hadoop. Users of hive are setting variables all the time. >> It would be nice to be able to reference Java system properties using this syntax. Hive /Hadoop do not often refer to system properties during normal operation. I am on the fence about this. >>Driver.replace() iterates over the list of defined variables. Instead, I think it should iterate over the tokens in the command that match the pattern '$ {.*} '. The hive CLI really needs some type of top level parser. Because we don't have this there are many ways we could do certain things but all of them are a little 'hackish'. If we had a real parser, reading character by character, we would not need a regex or string replace to do variable processing. >> This would make it easy to log any cases where the command contains "$ {foo.bar} " and foo.bar is undefined. I think the code should replace "" for an undefined variable. >>Driver.replace(String) should have a name like Driver.interpolateCommandVariables(), or Driver.replaceVariables(). Agreed, my notorious spelling usually causes me to avoid words like 'interpolate' >>It would be nice to be able to prevent interpolation using an escape character, e.g. "\$ {somevar} ". I punted on that by being able to turn the replacement on and off. As I said above true parser would do stuff like this. >>It would be nice to be able to nest variable definitions, e.g. version="0.6.0", jar_name="hive-exec-$ {version} ". The variable interpolation code in Hadoop's Configuration class does this. Yes. Carl I started with your code, the bug I mentioned above (comments before set), caused me to rip the entire thing apart, before I found out what I was doing wrong I had ripped out all the regex. I am not very comfortable with regex, and I was not in love with the while construct with a return in the middle. Again nesting would be easy with a true parser. I do not mind going back to your code. >>The last thing we want is two completely separate variable interpolation mechanisms. Agreed. The only true difference in implementation is that your doing it with properties and I am doing it with HiveConf Vars. If we support both I think we are both happy. Any ideas?
          Hide
          Carl Steinbach added a comment -

          Philosophically I agree. In actuality have Hive/Hadoop conf is easily manipulated by changing your hadoop-site.xml or hive-site.xml. Users do have unprotected access to the namespace that is the nature of hadoop. Users of hive are setting variables all the time.

          True, but I think we should try to improve the situation. As a start we can add code to throw an error if hive-default.xml or hive-site.xml sets a hive.* configuration property that is not defined in HiveConf. This would protect the hive.* namespace and at the same time make it easy to track down cases where folks misspell a hive.* property name.

          The only true difference in implementation is that your doing it with properties and I am doing it with HiveConf Vars. If we support both I think we are both happy. Any ideas?

          I agree that we should support access to both system properties and hiveconf properties, but if we do how will we resolve cases where the user references {{$

          {foo.bar}

          }} and both the system and hiveconf define properties named foo.bar? Also, another problem I see with using the hiveconf namespace for user variable definitions is that user variables cease to have any meaning past the client-side query preprocessing step, yet since they're part of the hiveconf they will get included in the jobconf and sent to datanodes.

          Here's a proposal:

          • Allow users to reference variables in QL statements using the syntax {{$ {namespace:variable_name}

            }}.

          • Users can define variables on the command line using a new "-hivevar x=y" switch. Values defined in this manner become part of the user namespace, which is the default namespace. They can be referenced as either {{$ {default:variablename}

            }} or {{$

            {variablename}

            }}.

          • Hive configuration properties are part of the "hiveconf" namespace, and can be referenced as {{$ {hiveconf:propertyname}

            }}.

          • System properties are part of the "system" namespace, and can be referenced as {{$ {system:property_name}

            }}.

          What do you think?

          Show
          Carl Steinbach added a comment - Philosophically I agree. In actuality have Hive/Hadoop conf is easily manipulated by changing your hadoop-site.xml or hive-site.xml. Users do have unprotected access to the namespace that is the nature of hadoop. Users of hive are setting variables all the time. True, but I think we should try to improve the situation. As a start we can add code to throw an error if hive-default.xml or hive-site.xml sets a hive.* configuration property that is not defined in HiveConf. This would protect the hive.* namespace and at the same time make it easy to track down cases where folks misspell a hive.* property name. The only true difference in implementation is that your doing it with properties and I am doing it with HiveConf Vars. If we support both I think we are both happy. Any ideas? I agree that we should support access to both system properties and hiveconf properties, but if we do how will we resolve cases where the user references {{$ {foo.bar} }} and both the system and hiveconf define properties named foo.bar? Also, another problem I see with using the hiveconf namespace for user variable definitions is that user variables cease to have any meaning past the client-side query preprocessing step, yet since they're part of the hiveconf they will get included in the jobconf and sent to datanodes. Here's a proposal: Allow users to reference variables in QL statements using the syntax {{$ {namespace:variable_name} }}. Users can define variables on the command line using a new " -hivevar x=y " switch. Values defined in this manner become part of the user namespace, which is the default namespace. They can be referenced as either {{$ {default:variablename} }} or {{$ {variablename} }}. Hive configuration properties are part of the "hiveconf" namespace, and can be referenced as {{$ {hiveconf:propertyname} }}. System properties are part of the "system" namespace, and can be referenced as {{$ {system:property_name} }}. What do you think?
          Hide
          Jeff Hammerbacher added a comment -

          Where does the $

          {VARNAME}

          syntax come from? SCOPE uses @@VARNAME for their syntax for a somewhat similar feature. Just curious.

          Show
          Jeff Hammerbacher added a comment - Where does the $ {VARNAME} syntax come from? SCOPE uses @@VARNAME for their syntax for a somewhat similar feature. Just curious.
          Hide
          Carl Steinbach added a comment -

          Where does the ${VARNAME} syntax come from? SCOPE uses @@VARNAME for their syntax for a somewhat similar feature. Just curious.

          Unix shell command interpreters, Perl, Make, Ant, Eclipse, Amazon's EMR version of Hive and many templating systems (which is basically what this is) use the unary operator $ for string substitution. Eclipse also defines various namespaces, so for example it is possible to reference environment variables as $

          {env_var:HOME}

          and system properties as $

          {system_property:java.home}

          . MySQL and T-SQL both use @ to identify user variables, and @@ to identify system variables (see http://dev.mysql.com/doc/refman/5.0/en/user-variables.html), which was probably the inspiration for SCOPE's syntax. SQL*Plus apparently uses '&'.

          Show
          Carl Steinbach added a comment - Where does the ${VARNAME} syntax come from? SCOPE uses @@VARNAME for their syntax for a somewhat similar feature. Just curious. Unix shell command interpreters, Perl, Make, Ant, Eclipse, Amazon's EMR version of Hive and many templating systems (which is basically what this is) use the unary operator $ for string substitution. Eclipse also defines various namespaces, so for example it is possible to reference environment variables as $ {env_var:HOME} and system properties as $ {system_property:java.home} . MySQL and T-SQL both use @ to identify user variables, and @@ to identify system variables (see http://dev.mysql.com/doc/refman/5.0/en/user-variables.html ), which was probably the inspiration for SCOPE's syntax. SQL*Plus apparently uses '&'.
          Hide
          Edward Capriolo added a comment -

          True, but I think we should try to improve the situation. As a start we can add code to throw an error if hive-default.xml or hive-site.xml sets a hive.* configuration property that is not defined in HiveConf. This would protect the hive.* namespace and at the same time make it easy to track down cases where folks misspell a hive.* property name.

          I only see one problem with this. HiveConf has an enumeration of valid properties. I do not think Hadoop Conf shares that trait. So if someone were to say:

          set hadoop.wrong.variable.name
          

          Would we have a simple way of knowing if that variable was variable is valid. Anyway that is out of scope a bit.

          $

          Unknown macro: {namespace}

          .

          I like this. How about?

          ${hiveconf:variable_name}.
          ${env:variable_name}.
          ${property:variable_name}.
          

          This way we do not need the "-hivevar x=y" switch." I say this because we already have three mechanism to pass data into Java/hive adding a forth is a little confusing.

          The only problem with introducing this is now we have a new question. How do we extend SET to read/write these?

          Show
          Edward Capriolo added a comment - True, but I think we should try to improve the situation. As a start we can add code to throw an error if hive-default.xml or hive-site.xml sets a hive.* configuration property that is not defined in HiveConf. This would protect the hive.* namespace and at the same time make it easy to track down cases where folks misspell a hive.* property name. I only see one problem with this. HiveConf has an enumeration of valid properties. I do not think Hadoop Conf shares that trait. So if someone were to say: set hadoop.wrong.variable.name Would we have a simple way of knowing if that variable was variable is valid. Anyway that is out of scope a bit. $ Unknown macro: {namespace} . I like this. How about? ${hiveconf:variable_name}. ${env:variable_name}. ${property:variable_name}. This way we do not need the "-hivevar x=y" switch." I say this because we already have three mechanism to pass data into Java/hive adding a forth is a little confusing. The only problem with introducing this is now we have a new question. How do we extend SET to read/write these?
          Hide
          Carl Steinbach added a comment -

          I only see one problem with this. HiveConf has an enumeration of valid properties. I do not think Hadoop Conf shares that trait.

          You're right, it doesn't. So the best we can do is try to protect the hive.* namespace, which is still better than nothing.

          In your example, what do the "env" and "property" namespaces signify? Are these environment variables and and system properties, respectively?

          Show
          Carl Steinbach added a comment - I only see one problem with this. HiveConf has an enumeration of valid properties. I do not think Hadoop Conf shares that trait. You're right, it doesn't. So the best we can do is try to protect the hive.* namespace, which is still better than nothing. In your example, what do the "env" and "property" namespaces signify? Are these environment variables and and system properties, respectively?
          Hide
          Edward Capriolo added a comment -

          In your example, what do the "env" and "property" namespaces signify? Are these environment variables and and system properties, respectively?

          Sorry I misread something you posted above. To be double clear

          ${hiveconf:variable_name}. --hive conf   && also SET
          ${env:variable_name}. --environmental properties
          ${system:property_name}-- java/system properties
          ${default:variable_name}. --set with -hivevar x=y
          

          I understand you want to introduce '-hivevar x' to create a very clear separation. Do you think we need to go this far? As far as I know properties and environment are mostly unused by hive.

          I am looking at this from terms of a new end user to hive. The simple part about my 1096-9.diff is we were expanding upon the 'SET' and '-hiveconf'. What we have arrived at is cleaner but more complex, and now we have to dip into the SetProcessor and introduce a new switch.

          I am ok with the more complex approach however. Does anyone else want to chime in?

          Show
          Edward Capriolo added a comment - In your example, what do the "env" and "property" namespaces signify? Are these environment variables and and system properties, respectively? Sorry I misread something you posted above. To be double clear ${hiveconf:variable_name}. --hive conf && also SET ${env:variable_name}. --environmental properties ${system:property_name}-- java/system properties ${default:variable_name}. --set with -hivevar x=y I understand you want to introduce '-hivevar x' to create a very clear separation. Do you think we need to go this far? As far as I know properties and environment are mostly unused by hive. I am looking at this from terms of a new end user to hive. The simple part about my 1096-9.diff is we were expanding upon the 'SET' and '-hiveconf'. What we have arrived at is cleaner but more complex, and now we have to dip into the SetProcessor and introduce a new switch. I am ok with the more complex approach however. Does anyone else want to chime in?
          Carl Steinbach made changes -
          Link This issue relates to HIVE-1063 [ HIVE-1063 ]
          Hide
          John Sichi added a comment -

          I'm taking a look at this one.

          Show
          John Sichi added a comment - I'm taking a look at this one.
          Hide
          John Sichi added a comment -

          I'm getting a conflict with latest trunk--can you regenerate the patch? (Use filename HIVE-1096.10.patch to follow the usual convention.)

          Sorry it took us so long to get to this one; here are a few review comments to be addressed as part of a new patch:

          • add a test with a query which references the same variable twice
          • add a test which references a built-in variable such as hive.map.aggr
          • add a test with hive.variable.replace=false to verify that expansion does not happen when the variable is turned off (reference the variable in a literal string)
          • add a test which references a variable which has not been set
          • for description, change "This setting controls if hive will..." to "Whether hive will..."
          • rename method to replaceVariableReferences
          • for the LOG.info's, add a space after the colon, and change "replacement" to "variable replacement" for clarity
          • run ant checkstyle and fix some issues with spaces before braces and arglists
          • to optimize the replacement, you could do a contains("${") precheck before looping over conf
          • if you want to allow for uber-cool recursive variable expansion, change if (command.contains(s)) to a while instead
          Show
          John Sichi added a comment - I'm getting a conflict with latest trunk--can you regenerate the patch? (Use filename HIVE-1096 .10.patch to follow the usual convention.) Sorry it took us so long to get to this one; here are a few review comments to be addressed as part of a new patch: add a test with a query which references the same variable twice add a test which references a built-in variable such as hive.map.aggr add a test with hive.variable.replace=false to verify that expansion does not happen when the variable is turned off (reference the variable in a literal string) add a test which references a variable which has not been set for description, change "This setting controls if hive will..." to "Whether hive will..." rename method to replaceVariableReferences for the LOG.info's, add a space after the colon, and change "replacement" to "variable replacement" for clarity run ant checkstyle and fix some issues with spaces before braces and arglists to optimize the replacement, you could do a contains("${") precheck before looping over conf if you want to allow for uber-cool recursive variable expansion, change if (command.contains(s)) to a while instead
          John Sichi made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Paul Yang added a comment -

          I like the idea of 'uber-cool recursive variable expansion'. Probably need a depth check though.

          Show
          Paul Yang added a comment - I like the idea of 'uber-cool recursive variable expansion'. Probably need a depth check though.
          Hide
          John Sichi added a comment -

          @Paul: heh, good point. Actually let's just leave this out since doing it correctly requires a different loop structure entirely; we can always do a followup for it if someone wants it (ant added them eventually).

          Show
          John Sichi added a comment - @Paul: heh, good point. Actually let's just leave this out since doing it correctly requires a different loop structure entirely; we can always do a followup for it if someone wants it (ant added them eventually).
          Hide
          Carl Steinbach added a comment -

          @Paul and John: Check out substituteVars() in Hadoop's Configuration class. It does recursive
          variable interpolation with a depth check.

          Show
          Carl Steinbach added a comment - @Paul and John: Check out substituteVars() in Hadoop's Configuration class. It does recursive variable interpolation with a depth check.
          Hide
          John Sichi added a comment -

          @Carl: oops, that's what I get for only reading the tail of the thread.

          OK, besides the interpolation question, are there other unresolved issues from the discussion between you and Ed? Let's try to drive this to closure.

          Show
          John Sichi added a comment - @Carl: oops, that's what I get for only reading the tail of the thread. OK, besides the interpolation question, are there other unresolved issues from the discussion between you and Ed? Let's try to drive this to closure.
          Hide
          Carl Steinbach added a comment -

          @John: I don't think the current patch includes the namespace tags that Ed and I
          agreed to in a previous comment. I'd like to see that get added, and in particular the ability
          to reference system properties (this is really useful for testing).

          Show
          Carl Steinbach added a comment - @John: I don't think the current patch includes the namespace tags that Ed and I agreed to in a previous comment. I'd like to see that get added, and in particular the ability to reference system properties (this is really useful for testing).
          Hide
          Edward Capriolo added a comment -

          I am back on this one. Keep your eye out for the next patch.

          Show
          Edward Capriolo added a comment - I am back on this one. Keep your eye out for the next patch.
          Carl Steinbach made changes -
          Fix Version/s 0.6.0 [ 12314524 ]
          Affects Version/s 0.5.0 [ 12314156 ]
          Component/s Query Processor [ 12312586 ]
          Hide
          Carl Steinbach added a comment -

          Parameters Substitution in Pig: http://wiki.apache.org/pig/ParameterSubstitution

          Show
          Carl Steinbach added a comment - Parameters Substitution in Pig: http://wiki.apache.org/pig/ParameterSubstitution
          Hide
          Carl Steinbach added a comment -

          @Ed: Any update on this? I'd like to help out if you are low on cycles.

          Show
          Carl Steinbach added a comment - @Ed: Any update on this? I'd like to help out if you are low on cycles.
          Hide
          Edward Capriolo added a comment -

          This is my next target after the xdoc ticket. I should not be to far off.

          Show
          Edward Capriolo added a comment - This is my next target after the xdoc ticket. I should not be to far off.
          Hide
          Edward Capriolo added a comment -

          Patch adds variable interpretation.

          Show
          Edward Capriolo added a comment - Patch adds variable interpretation.
          Edward Capriolo made changes -
          Attachment hive-1096-10-patch.txt [ 12447842 ]
          Edward Capriolo made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Carl Steinbach added a comment -

          Hi Ed, can you please post this patch to review.hbase.org? Thanks!

          Show
          Carl Steinbach added a comment - Hi Ed, can you please post this patch to review.hbase.org? Thanks!
          Carl Steinbach made changes -
          Fix Version/s 0.5.1 [ 12314793 ]
          Fix Version/s 0.7.0 [ 12315150 ]
          Carl Steinbach made changes -
          Fix Version/s 0.5.1 [ 12314793 ]
          Affects Version/s 0.5.0 [ 12314156 ]
          Hide
          Edward Capriolo added a comment -

          Was not interpolating system:vars. Fixed with better test case.

          Show
          Edward Capriolo added a comment - Was not interpolating system:vars. Fixed with better test case.
          Edward Capriolo made changes -
          Attachment hive-1096-11-patch.txt [ 12447917 ]
          Hide
          John Sichi added a comment -

          Carl, can you review?

          Show
          John Sichi added a comment - Carl, can you review?
          Hide
          Carl Steinbach added a comment -

          @John: Will do (hopefully before the end of today).

          Show
          Carl Steinbach added a comment - @John: Will do (hopefully before the end of today).
          Hide
          Carl Steinbach added a comment -

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1289>

          This code does not belong in common. Please move it to ql.

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1290>

          The variables varPat and MAX_SUBST both exist as private variables in the parent class. I think that redefining them in HiveConf will result in lots of confusion down the road, plus they don't really belong here. Please move this to a "VariableSubstitution" class located in ql.

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1291>

          I like the word "interpolate" too, but I think more people are probably familiar with "substitute". Please change the name to HIVESUBSTITUTEVARIABLES.

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1292>

          Please fix the formatting problems in this method (spaces before and after parens, no space between literals and operators, etc). Please run checkstyle and verify that you are not introducing any new checkstyle errors.

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1293>

          No need to test a boolean valued method for equality to true. The logic can also be simplified as follows:

          if (!getBoolVar(ConfVars.HIVEVARIABLEINTERPOLATE))

          { return expr; }

          l4j.info("Interpolation is on");
          ...

          Also, please move this out of the substitution function and into the Driver, i.e Driver.compile() calls VariableSubstitution.substitute() iff HIVESUBSTITUEVARIABLES == true.

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1294>

          This log message is going to generate a lot of noise, and is unnecessary since you can determine the value using the 'set' command. Please remove it.

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1295>

          Presumably there aren't going to be any properties in the HiveConf starting with "system:" or "env:". Please move the "env:" check before the conf check, and move these string comparisons outside of the try/catch blocks, e.g:

          private static final String SYSTEM_VAR_PREFIX = "system:";
          private static final String ENV_VAR_PREFIX = "env:";

          if (var.startsWith(SYSTEM_VAR_PREFIX)) {
          try

          { val = System.getProperty(var.substring(SYSTEM_VAR_PREFIX.length())); }

          catch (SecurityException se)

          { ... }

          } else if (var.startsWith(ENV_VAR_PREFIX))

          { val = System.getenv(var.substring(ENV_VAR_PREFIX.length())); }

          else

          { val = ss.getConf().get(var); }

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1303>

          I thought the plan was to also introduce a "conf:" prefix for referencing configuration properties, and that any value not prefixed by system/env/conf would map to a new variable namespace? Please add the conf prefix and introduce some state in the SessionState for storing variables (i.e. non System/Env/Conf properties).

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1296>

          Please remove this log call. It will generate a lot of noise.

          trunk/conf/hive-default.xml
          <http://review.hbase.org/r/229/#comment1297>

          Please change the name to "hive.substitute.variables", and the description to "Enable variable substitution".

          trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
          <http://review.hbase.org/r/229/#comment1298>

          Check HIVESUBSTITUTEVARIABLES here.

          trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java
          <http://review.hbase.org/r/229/#comment1299>

          Please create static final variables for "system:" and "env:" and call length() on these variables instead of using magic numbers.

          trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java
          <http://review.hbase.org/r/229/#comment1300>

          I don't think you want to perform variable substitution at this point. It makes it impossible to create nested variables.

          trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java
          <http://review.hbase.org/r/229/#comment1301>

          This logic for mapping variable names to values has been repeated several times. Please move it to a dedicated method in the VariableSubstitution class.

          trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java
          <http://review.hbase.org/r/229/#comment1302>

          Please run checkstyle and fix any checkstyle errors that you have introduced.

          trunk/ql/src/test/queries/clientpositive/set_processor_namespaces.q
          <http://review.hbase.org/r/229/#comment1304>

          You need a test for the "env" namespace. I think this is impossible to do here, so you probably need to add a unit test.

          Show
          Carl Steinbach added a comment - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1289 > This code does not belong in common. Please move it to ql. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1290 > The variables varPat and MAX_SUBST both exist as private variables in the parent class. I think that redefining them in HiveConf will result in lots of confusion down the road, plus they don't really belong here. Please move this to a "VariableSubstitution" class located in ql. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1291 > I like the word "interpolate" too, but I think more people are probably familiar with "substitute". Please change the name to HIVESUBSTITUTEVARIABLES. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1292 > Please fix the formatting problems in this method (spaces before and after parens, no space between literals and operators, etc). Please run checkstyle and verify that you are not introducing any new checkstyle errors. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1293 > No need to test a boolean valued method for equality to true. The logic can also be simplified as follows: if (!getBoolVar(ConfVars.HIVEVARIABLEINTERPOLATE)) { return expr; } l4j.info("Interpolation is on"); ... Also, please move this out of the substitution function and into the Driver, i.e Driver.compile() calls VariableSubstitution.substitute() iff HIVESUBSTITUEVARIABLES == true. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1294 > This log message is going to generate a lot of noise, and is unnecessary since you can determine the value using the 'set' command. Please remove it. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1295 > Presumably there aren't going to be any properties in the HiveConf starting with "system:" or "env:". Please move the "env:" check before the conf check, and move these string comparisons outside of the try/catch blocks, e.g: private static final String SYSTEM_VAR_PREFIX = "system:"; private static final String ENV_VAR_PREFIX = "env:"; if (var.startsWith(SYSTEM_VAR_PREFIX)) { try { val = System.getProperty(var.substring(SYSTEM_VAR_PREFIX.length())); } catch (SecurityException se) { ... } } else if (var.startsWith(ENV_VAR_PREFIX)) { val = System.getenv(var.substring(ENV_VAR_PREFIX.length())); } else { val = ss.getConf().get(var); } trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1303 > I thought the plan was to also introduce a "conf:" prefix for referencing configuration properties, and that any value not prefixed by system/env/conf would map to a new variable namespace? Please add the conf prefix and introduce some state in the SessionState for storing variables (i.e. non System/Env/Conf properties). trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1296 > Please remove this log call. It will generate a lot of noise. trunk/conf/hive-default.xml < http://review.hbase.org/r/229/#comment1297 > Please change the name to "hive.substitute.variables", and the description to "Enable variable substitution". trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java < http://review.hbase.org/r/229/#comment1298 > Check HIVESUBSTITUTEVARIABLES here. trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java < http://review.hbase.org/r/229/#comment1299 > Please create static final variables for "system:" and "env:" and call length() on these variables instead of using magic numbers. trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java < http://review.hbase.org/r/229/#comment1300 > I don't think you want to perform variable substitution at this point. It makes it impossible to create nested variables. trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java < http://review.hbase.org/r/229/#comment1301 > This logic for mapping variable names to values has been repeated several times. Please move it to a dedicated method in the VariableSubstitution class. trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java < http://review.hbase.org/r/229/#comment1302 > Please run checkstyle and fix any checkstyle errors that you have introduced. trunk/ql/src/test/queries/clientpositive/set_processor_namespaces.q < http://review.hbase.org/r/229/#comment1304 > You need a test for the "env" namespace. I think this is impossible to do here, so you probably need to add a unit test.
          Carl Steinbach made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          HBase Review Board added a comment -

          Message from: "Carl Steinbach" <carl@cloudera.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/229/#review288
          -----------------------------------------------------------

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1289>

          This code does not belong in common. Please move it to ql.

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1290>

          The variables varPat and MAX_SUBST both exist as private variables in the parent class. I think that redefining them in HiveConf will result in lots of confusion down the road, plus they don't really belong here. Please move this to a "VariableSubstitution" class located in ql.

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1291>

          I like the word "interpolate" too, but I think more people are probably familiar with "substitute". Please change the name to HIVESUBSTITUTEVARIABLES.

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1292>

          Please fix the formatting problems in this method (spaces before and after parens, no space between literals and operators, etc). Please run checkstyle and verify that you are not introducing any new checkstyle errors.

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1293>

          No need to test a boolean valued method for equality to true. The logic can also be simplified as follows:

          if (!getBoolVar(ConfVars.HIVEVARIABLEINTERPOLATE))

          { return expr; }

          l4j.info("Interpolation is on");
          ...

          Also, please move this out of the substitution function and into the Driver, i.e Driver.compile() calls VariableSubstitution.substitute() iff HIVESUBSTITUEVARIABLES == true.

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1294>

          This log message is going to generate a lot of noise, and is unnecessary since you can determine the value using the 'set' command. Please remove it.

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1295>

          Presumably there aren't going to be any properties in the HiveConf starting with "system:" or "env:". Please move the "env:" check before the conf check, and move these string comparisons outside of the try/catch blocks, e.g:

          private static final String SYSTEM_VAR_PREFIX = "system:";
          private static final String ENV_VAR_PREFIX = "env:";

          if (var.startsWith(SYSTEM_VAR_PREFIX)) {
          try

          { val = System.getProperty(var.substring(SYSTEM_VAR_PREFIX.length())); }

          catch (SecurityException se)

          { ... }

          } else if (var.startsWith(ENV_VAR_PREFIX))

          { val = System.getenv(var.substring(ENV_VAR_PREFIX.length())); }

          else

          { val = ss.getConf().get(var); }

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1303>

          I thought the plan was to also introduce a "conf:" prefix for referencing configuration properties, and that any value not prefixed by system/env/conf would map to a new variable namespace? Please add the conf prefix and introduce some state in the SessionState for storing variables (i.e. non System/Env/Conf properties).

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          <http://review.hbase.org/r/229/#comment1296>

          Please remove this log call. It will generate a lot of noise.

          trunk/conf/hive-default.xml
          <http://review.hbase.org/r/229/#comment1297>

          Please change the name to "hive.substitute.variables", and the description to "Enable variable substitution".

          trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
          <http://review.hbase.org/r/229/#comment1298>

          Check HIVESUBSTITUTEVARIABLES here.

          trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java
          <http://review.hbase.org/r/229/#comment1299>

          Please create static final variables for "system:" and "env:" and call length() on these variables instead of using magic numbers.

          trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java
          <http://review.hbase.org/r/229/#comment1300>

          I don't think you want to perform variable substitution at this point. It makes it impossible to create nested variables.

          trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java
          <http://review.hbase.org/r/229/#comment1301>

          This logic for mapping variable names to values has been repeated several times. Please move it to a dedicated method in the VariableSubstitution class.

          trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java
          <http://review.hbase.org/r/229/#comment1302>

          Please run checkstyle and fix any checkstyle errors that you have introduced.

          trunk/ql/src/test/queries/clientpositive/set_processor_namespaces.q
          <http://review.hbase.org/r/229/#comment1304>

          You need a test for the "env" namespace. I think this is impossible to do here, so you probably need to add a unit test.

          • Carl
          Show
          HBase Review Board added a comment - Message from: "Carl Steinbach" <carl@cloudera.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/229/#review288 ----------------------------------------------------------- trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1289 > This code does not belong in common. Please move it to ql. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1290 > The variables varPat and MAX_SUBST both exist as private variables in the parent class. I think that redefining them in HiveConf will result in lots of confusion down the road, plus they don't really belong here. Please move this to a "VariableSubstitution" class located in ql. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1291 > I like the word "interpolate" too, but I think more people are probably familiar with "substitute". Please change the name to HIVESUBSTITUTEVARIABLES. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1292 > Please fix the formatting problems in this method (spaces before and after parens, no space between literals and operators, etc). Please run checkstyle and verify that you are not introducing any new checkstyle errors. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1293 > No need to test a boolean valued method for equality to true. The logic can also be simplified as follows: if (!getBoolVar(ConfVars.HIVEVARIABLEINTERPOLATE)) { return expr; } l4j.info("Interpolation is on"); ... Also, please move this out of the substitution function and into the Driver, i.e Driver.compile() calls VariableSubstitution.substitute() iff HIVESUBSTITUEVARIABLES == true. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1294 > This log message is going to generate a lot of noise, and is unnecessary since you can determine the value using the 'set' command. Please remove it. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1295 > Presumably there aren't going to be any properties in the HiveConf starting with "system:" or "env:". Please move the "env:" check before the conf check, and move these string comparisons outside of the try/catch blocks, e.g: private static final String SYSTEM_VAR_PREFIX = "system:"; private static final String ENV_VAR_PREFIX = "env:"; if (var.startsWith(SYSTEM_VAR_PREFIX)) { try { val = System.getProperty(var.substring(SYSTEM_VAR_PREFIX.length())); } catch (SecurityException se) { ... } } else if (var.startsWith(ENV_VAR_PREFIX)) { val = System.getenv(var.substring(ENV_VAR_PREFIX.length())); } else { val = ss.getConf().get(var); } trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1303 > I thought the plan was to also introduce a "conf:" prefix for referencing configuration properties, and that any value not prefixed by system/env/conf would map to a new variable namespace? Please add the conf prefix and introduce some state in the SessionState for storing variables (i.e. non System/Env/Conf properties). trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java < http://review.hbase.org/r/229/#comment1296 > Please remove this log call. It will generate a lot of noise. trunk/conf/hive-default.xml < http://review.hbase.org/r/229/#comment1297 > Please change the name to "hive.substitute.variables", and the description to "Enable variable substitution". trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java < http://review.hbase.org/r/229/#comment1298 > Check HIVESUBSTITUTEVARIABLES here. trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java < http://review.hbase.org/r/229/#comment1299 > Please create static final variables for "system:" and "env:" and call length() on these variables instead of using magic numbers. trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java < http://review.hbase.org/r/229/#comment1300 > I don't think you want to perform variable substitution at this point. It makes it impossible to create nested variables. trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java < http://review.hbase.org/r/229/#comment1301 > This logic for mapping variable names to values has been repeated several times. Please move it to a dedicated method in the VariableSubstitution class. trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java < http://review.hbase.org/r/229/#comment1302 > Please run checkstyle and fix any checkstyle errors that you have introduced. trunk/ql/src/test/queries/clientpositive/set_processor_namespaces.q < http://review.hbase.org/r/229/#comment1304 > You need a test for the "env" namespace. I think this is impossible to do here, so you probably need to add a unit test. Carl
          Hide
          HBase Review Board added a comment -

          Message from: "Edward Capriolo" <edlinuxguru@gmail.com>

          On 2010-06-29 00:50:22, Carl Steinbach wrote:

          > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 260

          > <http://review.hbase.org/r/229/diff/1/?file=1601#file1601line260>

          >

          > I like the word "interpolate" too, but I think more people are probably familiar with "substitute". Please change the name to HIVESUBSTITUTEVARIABLES.

          I do not want to have the feature called different things across the code base. Replace here interpolate there it will be confusing for all. You originally suggested interpolate: "Driver.interpolateCommandVariables()".

          On 2010-06-29 00:50:22, Carl Steinbach wrote:

          > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 561

          > <http://review.hbase.org/r/229/diff/1/?file=1601#file1601line561>

          >

          > No need to test a boolean valued method for equality to true. The logic can also be simplified as follows:

          >

          > if (!getBoolVar(ConfVars.HIVEVARIABLEINTERPOLATE)) { bq. > return expr; bq. > }

          > l4j.info("Interpolation is on");

          > ...

          >

          > Also, please move this out of the substitution function and into the Driver, i.e Driver.compile() calls VariableSubstitution.substitute() iff HIVESUBSTITUEVARIABLES == true.

          Different components are using substitute SetProcessor, Driver , File Processor, having the interpolation on/off logic in each class is redundant. This way is better as it supports information hiding.

          On 2010-06-29 00:50:22, Carl Steinbach wrote:

          > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 562

          > <http://review.hbase.org/r/229/diff/1/?file=1601#file1601line562>

          >

          > This log message is going to generate a lot of noise, and is unnecessary since you can determine the value using the 'set' command. Please remove it.

          I think it is very important to see the command before and after substitution. We can set this to debug, I do not think it is noise.

          On 2010-06-29 00:50:22, Carl Steinbach wrote:

          > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 587

          > <http://review.hbase.org/r/229/diff/1/?file=1601#file1601line587>

          >

          > I thought the plan was to also introduce a "conf:" prefix for referencing configuration properties, and that any value not prefixed by system/env/conf would map to a new variable namespace? Please add the conf prefix and introduce some state in the SessionState for storing variables (i.e. non System/Env/Conf properties).

          After pondering this I think the "conf:" prefix is confusing and hurts backwards compatibility. Right now this is "hive hiveconf x=y" = "set x=y". I do not think we want to introduce another switch. "-hivevarconf". in most cases users want conf access to set properties that can be picked up by classes. hadoop & hive conf is the way it is adding something else will not fix the problem and will confuse people.

          On 2010-06-29 00:50:22, Carl Steinbach wrote:

          > trunk/ql/src/test/queries/clientpositive/set_processor_namespaces.q, line 4

          > <http://review.hbase.org/r/229/diff/1/?file=1605#file1605line4>

          >

          > You need a test for the "env" namespace. I think this is impossible to do here, so you probably need to add a unit test.

          Env variables are not changeable and are system dependent. I do not think there is a way to test these.

          On 2010-06-29 00:50:22, Carl Steinbach wrote:

          > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 595

          > <http://review.hbase.org/r/229/diff/1/?file=1601#file1601line595>

          >

          > Please remove this log call. It will generate a lot of noise.

          I think it is very important to see the command before and after substitution. We can set this to debug, I do not think it is noise.

          On 2010-06-29 00:50:22, Carl Steinbach wrote:

          > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 50

          > <http://review.hbase.org/r/229/diff/1/?file=1601#file1601line50>

          >

          > The variables varPat and MAX_SUBST both exist as private variables in the parent class. I think that redefining them in HiveConf will result in lots of confusion down the road, plus they don't really belong here. Please move this to a "VariableSubstitution" class located in ql.

          I see your point. Then again, hadoop does the substitution inside the conf class. Also QL is becoming the 'ubber package' At this point what isnt ql?

          On 2010-06-29 00:50:22, Carl Steinbach wrote:

          > trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java, line 93

          > <http://review.hbase.org/r/229/diff/1/?file=1604#file1604line93>

          >

          > I don't think you want to perform variable substitution at this point. It makes it impossible to create nested variables.

          I will check it out.

          • Edward

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/229/#review288
          -----------------------------------------------------------

          Show
          HBase Review Board added a comment - Message from: "Edward Capriolo" <edlinuxguru@gmail.com> On 2010-06-29 00:50:22, Carl Steinbach wrote: > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 260 > < http://review.hbase.org/r/229/diff/1/?file=1601#file1601line260 > > > I like the word "interpolate" too, but I think more people are probably familiar with "substitute". Please change the name to HIVESUBSTITUTEVARIABLES. I do not want to have the feature called different things across the code base. Replace here interpolate there it will be confusing for all. You originally suggested interpolate: "Driver.interpolateCommandVariables()". On 2010-06-29 00:50:22, Carl Steinbach wrote: > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 561 > < http://review.hbase.org/r/229/diff/1/?file=1601#file1601line561 > > > No need to test a boolean valued method for equality to true. The logic can also be simplified as follows: > > if (!getBoolVar(ConfVars.HIVEVARIABLEINTERPOLATE)) { bq. > return expr; bq. > } > l4j.info("Interpolation is on"); > ... > > Also, please move this out of the substitution function and into the Driver, i.e Driver.compile() calls VariableSubstitution.substitute() iff HIVESUBSTITUEVARIABLES == true. Different components are using substitute SetProcessor, Driver , File Processor, having the interpolation on/off logic in each class is redundant. This way is better as it supports information hiding. On 2010-06-29 00:50:22, Carl Steinbach wrote: > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 562 > < http://review.hbase.org/r/229/diff/1/?file=1601#file1601line562 > > > This log message is going to generate a lot of noise, and is unnecessary since you can determine the value using the 'set' command. Please remove it. I think it is very important to see the command before and after substitution. We can set this to debug, I do not think it is noise. On 2010-06-29 00:50:22, Carl Steinbach wrote: > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 587 > < http://review.hbase.org/r/229/diff/1/?file=1601#file1601line587 > > > I thought the plan was to also introduce a "conf:" prefix for referencing configuration properties, and that any value not prefixed by system/env/conf would map to a new variable namespace? Please add the conf prefix and introduce some state in the SessionState for storing variables (i.e. non System/Env/Conf properties). After pondering this I think the "conf:" prefix is confusing and hurts backwards compatibility. Right now this is "hive hiveconf x=y" = "set x=y". I do not think we want to introduce another switch. " -hivevarconf". in most cases users want conf access to set properties that can be picked up by classes. hadoop & hive conf is the way it is adding something else will not fix the problem and will confuse people. On 2010-06-29 00:50:22, Carl Steinbach wrote: > trunk/ql/src/test/queries/clientpositive/set_processor_namespaces.q, line 4 > < http://review.hbase.org/r/229/diff/1/?file=1605#file1605line4 > > > You need a test for the "env" namespace. I think this is impossible to do here, so you probably need to add a unit test. Env variables are not changeable and are system dependent. I do not think there is a way to test these. On 2010-06-29 00:50:22, Carl Steinbach wrote: > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 595 > < http://review.hbase.org/r/229/diff/1/?file=1601#file1601line595 > > > Please remove this log call. It will generate a lot of noise. I think it is very important to see the command before and after substitution. We can set this to debug, I do not think it is noise. On 2010-06-29 00:50:22, Carl Steinbach wrote: > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 50 > < http://review.hbase.org/r/229/diff/1/?file=1601#file1601line50 > > > The variables varPat and MAX_SUBST both exist as private variables in the parent class. I think that redefining them in HiveConf will result in lots of confusion down the road, plus they don't really belong here. Please move this to a "VariableSubstitution" class located in ql. I see your point. Then again, hadoop does the substitution inside the conf class. Also QL is becoming the 'ubber package' At this point what isnt ql? On 2010-06-29 00:50:22, Carl Steinbach wrote: > trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java, line 93 > < http://review.hbase.org/r/229/diff/1/?file=1604#file1604line93 > > > I don't think you want to perform variable substitution at this point. It makes it impossible to create nested variables. I will check it out. Edward ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/229/#review288 -----------------------------------------------------------
          Hide
          Edward Capriolo added a comment -

          Change interpolate to substitute....Added the substitution logic to file, dfs, set , and query processor

          Show
          Edward Capriolo added a comment - Change interpolate to substitute....Added the substitution logic to file, dfs, set , and query processor
          Edward Capriolo made changes -
          Attachment hive-1096-12.patch.txt [ 12449054 ]
          Edward Capriolo made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Carl Steinbach added a comment -

          Hi Ed, please update the diff on reviewboard. Thanks.

          Show
          Carl Steinbach added a comment - Hi Ed, please update the diff on reviewboard. Thanks.
          Hide
          Edward Capriolo added a comment -

          I am having trouble uploading with the update diff function of the review board. As I mentioned several times, I really had one simple requirement

          hive -hiveconf DAY=5 -e "LOAD DATA INFILE '/tmp/${DAY}'  into logs partition=${DAY}"
          

          I am all for doing things 100% correct, but this is such a simple thing I am really getting worn out by the endless revisions and doing lots of fancy things just because someone might want to do ${x$

          {y}

          bla}.
          Really, I would like to make this ticket go +1, and get on with something more interesting.

          Show
          Edward Capriolo added a comment - I am having trouble uploading with the update diff function of the review board. As I mentioned several times, I really had one simple requirement hive -hiveconf DAY=5 -e "LOAD DATA INFILE '/tmp/${DAY}' into logs partition=${DAY}" I am all for doing things 100% correct, but this is such a simple thing I am really getting worn out by the endless revisions and doing lots of fancy things just because someone might want to do ${x$ {y} bla}. Really, I would like to make this ticket go +1, and get on with something more interesting.
          Hide
          HBase Review Board added a comment -

          Message from: "Carl Steinbach" <carl@cloudera.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/229/
          -----------------------------------------------------------

          (Updated 2010-07-09 15:10:54.970189)

          Review request for Hive Developers.

          Changes
          -------

          Updated the diff with http://issues.apache.org/jira/secure/attachment/12449054/hive-1096-12.patch.txt

          Summary
          -------

          Hive Variables

          This addresses bug HIVE-1096.
          http://issues.apache.org/jira/browse/HIVE-1096

          Diffs (updated)


          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 955109
          trunk/conf/hive-default.xml 955109
          trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 955109
          trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/VariableSubstitution.java PRE-CREATION
          trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java 955109
          trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/DeleteResourceProcessor.java 955109
          trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java 955109
          trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java 955109
          trunk/ql/src/test/queries/clientpositive/set_processor_namespaces.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/set_processor_namespaces.q.out PRE-CREATION

          Diff: http://review.hbase.org/r/229/diff

          Testing
          -------

          Thanks,

          Edward

          Show
          HBase Review Board added a comment - Message from: "Carl Steinbach" <carl@cloudera.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/229/ ----------------------------------------------------------- (Updated 2010-07-09 15:10:54.970189) Review request for Hive Developers. Changes ------- Updated the diff with http://issues.apache.org/jira/secure/attachment/12449054/hive-1096-12.patch.txt Summary ------- Hive Variables This addresses bug HIVE-1096 . http://issues.apache.org/jira/browse/HIVE-1096 Diffs (updated) trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 955109 trunk/conf/hive-default.xml 955109 trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 955109 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/VariableSubstitution.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java 955109 trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/DeleteResourceProcessor.java 955109 trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java 955109 trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java 955109 trunk/ql/src/test/queries/clientpositive/set_processor_namespaces.q PRE-CREATION trunk/ql/src/test/results/clientpositive/set_processor_namespaces.q.out PRE-CREATION Diff: http://review.hbase.org/r/229/diff Testing ------- Thanks, Edward
          Hide
          Carl Steinbach added a comment -

          Notes from a design discussion on ##hive:

          • Variables, configuration properties, system properties and environment variables need to exist in separate namespaces.
          • Variables should not be included in the hiveconf. They should only be available/visible to the query pre-processor. In particular we do not want variables to get passed over to the cluster.
          • Add a '-hivevar' switch to the CLI for setting variables.
          • Add a "var x=y" CommandProcessor for setting variables.

          Here's an example of what we want:

          var a="I am a variable";                 -- ${a} == "I am a variable"
          set a="I am a configuration property";   -- ${hiveconf:a} == "I am a configuration property"
          set prop1 = ${a};                        -- ${hiveconf:prop1} == "I am a variable"
          set var1 = ${hiveconf:a};                -- ${var1} == "I am a configuration property"
          
          Show
          Carl Steinbach added a comment - Notes from a design discussion on ##hive: Variables, configuration properties, system properties and environment variables need to exist in separate namespaces. Variables should not be included in the hiveconf. They should only be available/visible to the query pre-processor. In particular we do not want variables to get passed over to the cluster. Add a '-hivevar' switch to the CLI for setting variables. Add a "var x=y" CommandProcessor for setting variables. Here's an example of what we want: var a= "I am a variable" ; -- ${a} == "I am a variable" set a= "I am a configuration property" ; -- ${hiveconf:a} == "I am a configuration property" set prop1 = ${a}; -- ${hiveconf:prop1} == "I am a variable" set var1 = ${hiveconf:a}; -- ${var1} == "I am a configuration property"
          John Sichi made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Carl Steinbach added a comment -

          The reviewboard jiraposter is temporarily down so I'm posting this by hand. The original comments and context are here: https://review.cloudera.org/r/229/

          • trunk/conf/hive-default.xml:
            Spelling: substituation
          • trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/VariableSubstitution.java:
            Make these variables private?
          • trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/VariableSubstitution.java:
            This should be static. No need to instantiate a VariableSubstitution object.
          • trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java:
            Since we want to do substitution for all commands it would probably make sense to do the substitution in CommandProcessorFactory.get() and make CommandProcessor an abstract class with the following implementation:
          public abstract class {
            protected String command;
          
            CommandProcessor(String command) {
              this.command = command;
            }
          
            public abstract CommandProcessorResponse run();
          
            public String getCommand() {
              return command;
            }
          }
          

          In other words, CommandProcessorFactory would return a CommandProcessor object that has been initialized with a substituted copy of the command.

          • trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java:
            Replace these string literals with constants, e.g:
          public static final String ENV_PREFIX = "env:";
          public static final String SYSTEM_PREFIX = "system:"
          public static final String HIVECONF_PREFIX = "hiveconf:"
          
          • trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java:
            String propName = varname.substring(SYSTEM_PREFIX.length());
          • trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java:
            Can we remove this special case for "silent"? In SessionState this actually maps to "hive.session.silent" and I don't see any test cases that cover this case, i.e. that call "set silent" or "set silent=x". It also seems that this introduces in inconsistency since "set silent" will show the value of "hive.session.silent", but the output of "set" will not list a value for the property "silent".

          Anyone know if there is any older code that depends on this behavior?

          Show
          Carl Steinbach added a comment - The reviewboard jiraposter is temporarily down so I'm posting this by hand. The original comments and context are here: https://review.cloudera.org/r/229/ trunk/conf/hive-default.xml: Spelling: substituation trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/VariableSubstitution.java: Make these variables private? trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/VariableSubstitution.java: This should be static. No need to instantiate a VariableSubstitution object. trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java: Since we want to do substitution for all commands it would probably make sense to do the substitution in CommandProcessorFactory.get() and make CommandProcessor an abstract class with the following implementation: public abstract class { protected String command; CommandProcessor( String command) { this .command = command; } public abstract CommandProcessorResponse run(); public String getCommand() { return command; } } In other words, CommandProcessorFactory would return a CommandProcessor object that has been initialized with a substituted copy of the command. trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java: Replace these string literals with constants, e.g: public static final String ENV_PREFIX = "env:" ; public static final String SYSTEM_PREFIX = "system:" public static final String HIVECONF_PREFIX = "hiveconf:" trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java: String propName = varname.substring(SYSTEM_PREFIX.length()); trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java: Can we remove this special case for "silent"? In SessionState this actually maps to "hive.session.silent" and I don't see any test cases that cover this case, i.e. that call "set silent" or "set silent=x". It also seems that this introduces in inconsistency since "set silent" will show the value of "hive.session.silent", but the output of "set" will not list a value for the property "silent". Anyone know if there is any older code that depends on this behavior?
          Carl Steinbach made changes -
          Fix Version/s 0.6.0 [ 12314524 ]
          Hide
          Carl Steinbach added a comment -

          @Ed: Do you have time to work on this again? If not I'd like to pick it up.

          Show
          Carl Steinbach added a comment - @Ed: Do you have time to work on this again? If not I'd like to pick it up.
          John Sichi made changes -
          Link This issue is duplicated by HIVE-1604 [ HIVE-1604 ]
          Edward Capriolo made changes -
          Attachment hive-1096-15.patch.txt [ 12460336 ]
          Hide
          Edward Capriolo added a comment -
          • trunk/conf/hive-default.xml:
            Spelling: substituation

          Fixed

          • trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/VariableSubstitution.java:
            Make these variables private?

          Private variables are what got us into the mess with hadoop. I am not going to repeat the problem.

          • trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java:
            Since we want to do substitution for all commands it would probably make sense to do the substitution in CommandProcessorFactory.get() and make CommandProcessor an abstract class with the following implementation:

          ...

          In other words, CommandProcessorFactory would return a CommandProcessor object that has been initialized with a substituted copy of the command.

          No. No more re factoring. It is working the way it is. Using factories going to be major. I'm tired. It does not prove anything since this entire process is not very clever anyway. Currently it is slightly baked, but I believe that better then being over designed.

          • trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java:
            Replace these string literals with constants, e.g:

          public static final String ENV_PREFIX = "env:";
          public static final String SYSTEM_PREFIX = "system:"
          public static final String HIVECONF_PREFIX = "hiveconf:"

          Fixed

          • trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java:
            String propName = varname.substring(SYSTEM_PREFIX.length());

          Fixed

          • trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java:
            Can we remove this special case for "silent"? In SessionState this actually maps to "hive.session.silent" and I don't see any test cases that cover this case, i.e. that call "set silent" or "set silent=x". It also seems that this introduces in inconsistency since "set silent" will show the value of "hive.session.silent", but the output of "set" will not list a value for the property "silent".

          Anyone know if there is any older code that depends on this behavior?

          Do not really know. do not really care Out of scope. It is there I am leaving it.

          As for the VAR. Turns out supporting this is not very easy. Adding Options Parsing to the CLI works, however the session state gives you no where to store "variables" except in the hive conf. SetProcessor works with SessionState not CLI SessionState. Again big re factoring is needed.

          What I did do is move remove support for set y=$

          {x}. This patch only adds set y=${hiveconf:x}. Thus if someone cares to add "VAR X" or ${x}

          or determine how to change the CLI to add this other map that can be shared across the session state this patch is not in the way.

          Thus substitution only works for $

          {hiveconf:x}

          $

          {system:x}

          and $

          {env:x}

          . implementing $

          {x}

          and var can be done in a separate issue.

          Show
          Edward Capriolo added a comment - trunk/conf/hive-default.xml: Spelling: substituation Fixed trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/VariableSubstitution.java: Make these variables private? Private variables are what got us into the mess with hadoop. I am not going to repeat the problem. trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java: Since we want to do substitution for all commands it would probably make sense to do the substitution in CommandProcessorFactory.get() and make CommandProcessor an abstract class with the following implementation: ... In other words, CommandProcessorFactory would return a CommandProcessor object that has been initialized with a substituted copy of the command. No. No more re factoring. It is working the way it is. Using factories going to be major. I'm tired. It does not prove anything since this entire process is not very clever anyway. Currently it is slightly baked, but I believe that better then being over designed. trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java: Replace these string literals with constants, e.g: public static final String ENV_PREFIX = "env:"; public static final String SYSTEM_PREFIX = "system:" public static final String HIVECONF_PREFIX = "hiveconf:" Fixed trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java: String propName = varname.substring(SYSTEM_PREFIX.length()); Fixed trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java: Can we remove this special case for "silent"? In SessionState this actually maps to "hive.session.silent" and I don't see any test cases that cover this case, i.e. that call "set silent" or "set silent=x". It also seems that this introduces in inconsistency since "set silent" will show the value of "hive.session.silent", but the output of "set" will not list a value for the property "silent". Anyone know if there is any older code that depends on this behavior? Do not really know. do not really care Out of scope. It is there I am leaving it. As for the VAR. Turns out supporting this is not very easy. Adding Options Parsing to the CLI works, however the session state gives you no where to store "variables" except in the hive conf. SetProcessor works with SessionState not CLI SessionState. Again big re factoring is needed. What I did do is move remove support for set y=$ {x}. This patch only adds set y=${hiveconf:x}. Thus if someone cares to add "VAR X" or ${x} or determine how to change the CLI to add this other map that can be shared across the session state this patch is not in the way. Thus substitution only works for $ {hiveconf:x} $ {system:x} and $ {env:x} . implementing $ {x} and var can be done in a separate issue.
          Edward Capriolo made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Edward Capriolo added a comment -

          Adds xdocs.

          Show
          Edward Capriolo added a comment - Adds xdocs.
          Edward Capriolo made changes -
          Attachment hive-1096-15.patch.txt [ 12460369 ]
          Hide
          Edward Capriolo added a comment -

          Were two 15 patches. I bumped the number to 20 to clear any confusion.

          Show
          Edward Capriolo added a comment - Were two 15 patches. I bumped the number to 20 to clear any confusion.
          Edward Capriolo made changes -
          Attachment hive-1096-20.patch.txt [ 12460370 ]
          Hide
          Namit Jain added a comment -

          +1

          Show
          Namit Jain added a comment - +1
          Hide
          Carl Steinbach added a comment -

          +1

          Namit, I can handle testing and committing this if you would like.

          Show
          Carl Steinbach added a comment - +1 Namit, I can handle testing and committing this if you would like.
          Hide
          Namit Jain added a comment -

          sure, that would be very useful

          Let me know if you run into any issues

          Show
          Namit Jain added a comment - sure, that would be very useful Let me know if you run into any issues
          Hide
          Namit Jain added a comment -

          Committed. Thanks Edward

          Show
          Namit Jain added a comment - Committed. Thanks Edward
          Namit Jain made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Hide
          Namit Jain added a comment -

          @Carl, I missed the last comment - I saw that tests completed fine, and just committed it.
          Sorry about that

          Show
          Namit Jain added a comment - @Carl, I missed the last comment - I saw that tests completed fine, and just committed it. Sorry about that
          Carl Steinbach made changes -
          Link This issue supercedes HIVE-1063 [ HIVE-1063 ]
          Carl Steinbach made changes -
          Link This issue relates to HIVE-2020 [ HIVE-2020 ]
          Carl Steinbach made changes -
          Link This issue relates to HIVE-2021 [ HIVE-2021 ]
          Carl Steinbach made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Edward Capriolo
              Reporter:
              Edward Capriolo
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development