HIVE-7554

Parquet Hive should resolve column names in case insensitive manner

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None
    1. HIVE-7554.4.patch (13 kB, Brock Noland)
    2. HIVE-7554.3.patch (14 kB, Brock Noland)
    3. HIVE-7554.3.patch (14 kB, Brock Noland)
    4. HIVE-7554.2.patch (7 kB, Brock Noland)
    5. parquet_mixed_case (2 kB, Brock Noland)
    6. Test.thrift (0.2 kB, Raymond Lau)
    7. part-00000.parquet.0 (2 kB, Raymond Lau)
    8. HIVE-7554.patch (4 kB, Brock Noland)

        Activity

        Transition                     Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Patch Available        8h 32m                  1                 Brock Noland    31/Jul/14 01:06
        Patch Available -> Resolved    5d 18h 52m              1                 Szehon Ho       05/Aug/14 19:58
        Resolved -> Closed             100d 44m                1                 Thejas M Nair   13/Nov/14 19:42
        Szehon Ho made changes -
        Labels TODOC14
        Szehon Ho added a comment -

        Added to 'Limitations' section as discussed.

        Thejas M Nair made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Thejas M Nair added a comment -

        This has been fixed in the 0.14 release. Please open a new JIRA if you see any issues.

        Szehon Ho added a comment -

        OK I see binary files are unable to generate the patch, got it.

        Szehon Ho added a comment -

        The test is failing because it's missing the attached file; I've fixed this. Just curious, could the file not be included in the patch itself because it's binary?

        Lefty Leverenz made changes -
        Labels TODOC14
        Lefty Leverenz added a comment -

        Agreed, "Limitations" is the right place for this.

        "Version" has a different purpose (explaining that native support for Parquet was added in Hive 0.13.0) so its title should probably be changed to reflect that information. "HiveQL Syntax" discusses a syntax change with native support, so it could become a "Version" subsection.

        "Limitations" might need a new title too if it turns into a list of changes, or a new section could be added for that. But for now I think we can just put this issue in "Limitations" without changing any headings.

        Szehon Ho added a comment -

        It makes sense under "Limitations": Parquet column names were case sensitive (the user had to select a column name that exactly matched what is in the metastore) until this JIRA, which made the matching case insensitive.

        I guess the "Version" section is a chronological display of changes, but I'm not sure we are going to list everything in 0.14 there; it may be a lot. Thanks.
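
To make the behaviour described in this comment concrete, here is a minimal sketch of case-insensitive column matching. It is not the code from the HIVE-7554 patches: the class `CaseInsensitiveColumns`, the method `resolve`, and the sample column names are all hypothetical, and the sketch assumes the metastore side holds lower-cased names while the Parquet file may use mixed case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CaseInsensitiveColumns {

    // Hypothetical helper for illustration only; not taken from the patch.
    // For each requested column, returns the field name exactly as it is
    // spelled in the file, or null if the file has no such field.
    static List<String> resolve(List<String> requestedColumns, List<String> fileFields) {
        Map<String, String> byLowerCase = new HashMap<>();
        for (String field : fileFields) {
            // If two file fields differ only by case, keep the first one seen.
            byLowerCase.putIfAbsent(field.toLowerCase(Locale.ROOT), field);
        }
        List<String> resolved = new ArrayList<>();
        for (String column : requestedColumns) {
            resolved.add(byLowerCase.get(column.toLowerCase(Locale.ROOT)));
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Assumed example names: lower-cased on the metastore side,
        // mixed case in the file schema.
        List<String> resolved = resolve(
                Arrays.asList("userid", "username", "missing"),
                Arrays.asList("userId", "UserName"));
        System.out.println(resolved);
    }
}
```

With an exact (case-sensitive) lookup, `userid` would not find `userId` in the file; lower-casing both sides before the lookup is one simple way to get the insensitive behaviour. How the real patch handles fields that differ only by case is not stated in this thread, so the "keep the first one" rule above is an assumption.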

        Lefty Leverenz added a comment -

        Should this fix be documented in the wiki?

        Language Manual – Parquet

        For example, it could go under Version in a new subsection for 0.14.0, although you could also make a case for putting it in HiveQL Syntax or even Limitations:

        Parquet – Version
        Parquet – HiveQL Syntax
        Parquet – Limitations
        Szehon Ho made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 0.14.0 [ 12326450 ]
        Resolution Fixed [ 1 ]
        Szehon Ho added a comment -

        Committed to trunk. Thanks Brock for the contribution!

        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12659702/HIVE-7554.4.patch

        ERROR: -1 due to 3 failed/errored test(s), 5850 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_mixed_case
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
        org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/167/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/167/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-167/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 3 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12659702

        Szehon Ho added a comment -

        +1

        Brock Noland made changes -
        Attachment HIVE-7554.4.patch [ 12659702 ]
        Szehon Ho added a comment -

        (response on the RB)

        Szehon Ho added a comment -

        Hi Brock, mostly looks good to me, just one minor comment/question. Ryan, feel free to take a look as well.

        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12659524/HIVE-7554.3.patch

        ERROR: -1 due to 4 failed/errored test(s), 5848 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_mixed_case
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
        org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/153/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/153/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-153/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 4 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12659524

        Brock Noland made changes -
        Attachment HIVE-7554.3.patch [ 12659524 ]
        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12659515/HIVE-7554.3.patch

        ERROR: -1 due to 10 failed/errored test(s), 5848 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine1
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_test_outer
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_gby
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_mixed_case
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rcfile_merge4
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
        org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
        org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/151/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/151/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-151/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 10 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12659515

        Brock Noland added a comment - - edited

        FYI Szehon Ho and Ryan Blue, who have Parquet knowledge.

        Brock Noland made changes -
        Remote Link This issue links to "Review Board (Web Link)" [ 16612 ]
        Brock Noland made changes -
        Attachment HIVE-7554.3.patch [ 12659515 ]
        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12659240/HIVE-7554.2.patch

        ERROR: -1 due to 5 failed/errored test(s), 5863 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_mixed_case
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
        org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/143/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/143/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-143/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 5 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12659240

        Brock Noland made changes -
        Attachment parquet_mixed_case [ 12659239 ]
        Attachment HIVE-7554.2.patch [ 12659240 ]
        Brock Noland added a comment -

        The attached patch requires that the "parquet_mixed_case" file be placed in data/files/. Since that file is a separate attachment, the test will fail when the pre-commit tests run. I have verified it locally.

        Raymond Lau added a comment -

        Sorry, this is my first time using attachments on JIRA.
        The files are: "part-00000.parquet.0" and "Test.thrift"

        Raymond Lau made changes -
        Attachment part-00000.parquet.0 [ 12659027 ]
        Attachment Test.thrift [ 12659028 ]
        Raymond Lau added a comment -

        Here's a test Parquet file that should demonstrate how Hive fails to read columns whose names contain upper-case characters.

        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12658658/HIVE-7554.patch

        ERROR: -1 due to 4 failed/errored test(s), 5857 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
        org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
        org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/116/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/116/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-116/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 4 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12658658

        Brock Noland made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Brock Noland added a comment -

        Patch cleans up whitespace.

        Brock Noland made changes -
        Attachment HIVE-7554.patch [ 12658658 ]
        Brock Noland made changes -
        Field Original Value New Value
        Link This issue blocks PARQUET-54 [ PARQUET-54 ]
        Brock Noland created issue -

          People

          • Assignee: Brock Noland
          • Reporter: Brock Noland
          • Votes: 0
          • Watchers: 5
