Hive / HIVE-7097

The Support for REGEX Column Broken in HIVE 0.13

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.13.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

      Description

      REGEX column specification works in Hive 0.12 but is broken in Hive 0.13.
      For example:

      select `key.*` from src limit 1;
      

      will fail in HIVE 0.13 with the following error from SemanticAnalyzer:

      FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or column reference 'key.*': (possible column names are: key, value)
      

      This issue is related to HIVE-6037. Setting hive.support.quoted.identifiers=none makes the issue go away.
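
The difference between the two settings can be sketched as follows, assuming the `src` table from the example above (columns key and value) and that `column` is the Hive 0.13 default for this property. The comments describe the behavior reported in this issue, not verified output:

```sql
-- Hive 0.13 default (assumed): a backquoted string is a literal, quoted column identifier.
set hive.support.quoted.identifiers=column;
select `key.*` from src limit 1;  -- reported to fail with SemanticException [Error 10004]

-- Pre-0.13 behavior: a backquoted string is treated as a regex over column names.
set hive.support.quoted.identifiers=none;
select `key.*` from src limit 1;  -- the regex key.* matches the column key
```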

      I am not sure whether the configuration was intended to break REGEX columns, but at the very least the documentation needs to be updated: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification

      I would argue backward compatibility is more important.

          Activity

          Sumit Kumar added a comment -

          Thank you, Lefty Leverenz. Marking this as "Resolved/Not a problem".

          Lefty Leverenz added a comment -

          I added information about this to the bullet list after the SELECT syntax (same as for Create Table) and gave version information in the section "REGEX Column Specification": Select Syntax, REGEX Column Specification

          Sumit Kumar added a comment -

          Basically, this doesn't seem to be an issue, but it would help to clarify this in the Select documentation as well.

          Sumit Kumar added a comment -

          Sun Rui, I hit this today and found the following references useful:

          1. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterColumn
          2. https://issues.apache.org/jira/secure/attachment/12618321/QuotedIdentifier.html

          In short, the functionality is still there, but you need to set hive.support.quoted.identifiers to none to get the pre-0.13 behavior. I was able to run my query after setting:

          hive> set hive.support.quoted.identifiers=none;
          

          My query was something like:

          hive> select `(col1|col2|col3)?+.+` from testTable1;
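
As a sketch of what a query of this shape does: Hive's REGEX column specification uses Java regex syntax, so the pattern is expected to select every column except col1, col2 and col3. The table definition and the columns col4 and col5 below are hypothetical, added only for illustration:

```sql
-- Hypothetical table for illustration; only col1..col3 appear in the query above.
create table testTable1 (col1 string, col2 string, col3 string, col4 string, col5 string);

-- Treat backquoted strings as regexes again (pre-0.13 behavior).
set hive.support.quoted.identifiers=none;

-- (col1|col2|col3)?+ possessively consumes an excluded name, so when the column
-- name is exactly col1, col2 or col3 nothing is left for .+ and the match fails.
-- Expected to be equivalent to: select col4, col5 from testTable1;
select `(col1|col2|col3)?+.+` from testTable1;
```

Note that only exact names are excluded under this reading: a column such as col10 would still match, because .+ can consume the trailing 0.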
          
          Sun Rui added a comment -

          Carter Shanklin, thanks for your explanation. I use REGEX columns to shorten several queries. I can make a one-time change to stop using them, since the community has not opposed this break in backward compatibility. Could you update the documentation to reflect the new behavior?

          Carter Shanklin added a comment -

          Sun,

          The issue is related to HIVE-6013. Harish and I debated about this change so you can put all the blame on me.

          We decided to go this path because:
          1. We had a user who was trying to import about 50,000 tables from existing databases that contained all kinds of strange characters in column names.
          2. The new behavior is consistent with SQL standards.
          3. Most Hive users did not know about the regex feature and did not use it.
          4. Other databases allow .* as part of the column name.

          It's worthwhile for others to give their opinion on this. Personally I think the breaking change is better in the long run. Can you give more detail about why you favor the old path? Is it because of Shark compatibility? Something else?


            People

            • Assignee: Unassigned
            • Reporter: Sun Rui
            • Votes: 0
            • Watchers: 4
