Hive / HIVE-7554

Parquet Hive should resolve column names in a case-insensitive manner

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels:
      None

    Attachments

    1. Test.thrift (0.2 kB, Raymond Lau)
    2. part-00000.parquet.0 (2 kB, Raymond Lau)
    3. parquet_mixed_case (2 kB, Brock Noland)
    4. HIVE-7554.patch (4 kB, Brock Noland)
    5. HIVE-7554.4.patch (13 kB, Brock Noland)
    6. HIVE-7554.3.patch (14 kB, Brock Noland)
    7. HIVE-7554.3.patch (14 kB, Brock Noland)
    8. HIVE-7554.2.patch (7 kB, Brock Noland)

        Activity

        Brock Noland added a comment:

        Patch cleans up whitespace.

        Hive QA added a comment:

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12658658/HIVE-7554.patch

        ERROR: -1 due to 4 failed/errored test(s), 5857 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
        org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
        org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/116/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/116/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-116/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 4 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12658658

        Raymond Lau added a comment:

        Here's a test Parquet file that should demonstrate how Hive fails to read columns whose names contain upper-case letters.

        Raymond Lau added a comment:

        Sorry, this is my first time using attachments on JIRA.
        The files are "part-00000.parquet.0" and "Test.thrift".

        Brock Noland added a comment:

        The attached patch requires the "parquet_mixed_case" file to be placed in data/files/, so that test will fail when the automated tests run. I have verified it locally.

        Hive QA added a comment:

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12659240/HIVE-7554.2.patch

        ERROR: -1 due to 5 failed/errored test(s), 5863 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_mixed_case
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
        org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/143/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/143/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-143/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 5 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12659240

        Brock Noland added a comment (edited):

        FYI Szehon Ho and Ryan Blue, who have Parquet knowledge.

        Hive QA added a comment:

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12659515/HIVE-7554.3.patch

        ERROR: -1 due to 10 failed/errored test(s), 5848 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine1
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_test_outer
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_gby
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_mixed_case
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rcfile_merge4
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
        org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
        org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/151/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/151/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-151/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 10 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12659515

        Hive QA added a comment:

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12659524/HIVE-7554.3.patch

        ERROR: -1 due to 4 failed/errored test(s), 5848 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_mixed_case
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
        org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/153/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/153/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-153/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 4 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12659524

        Szehon Ho added a comment:

        Hi Brock, this mostly looks good to me; I have just one minor comment/question. Ryan, feel free to take a look as well.

        Szehon Ho added a comment:

        (Response on Review Board.)

        Szehon Ho added a comment:

        +1

        Hive QA added a comment:

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12659702/HIVE-7554.4.patch

        ERROR: -1 due to 3 failed/errored test(s), 5850 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_mixed_case
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
        org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/167/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/167/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-167/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 3 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12659702

        Szehon Ho added a comment:

        Committed to trunk. Thanks, Brock, for the contribution!

        Lefty Leverenz added a comment:

        Should this fix be documented in the wiki?

          • Language Manual – Parquet

        For example, it could go under Version in a new subsection for 0.14.0, although you could also make a case for putting it in HiveQL Syntax or even Limitations:

          • Parquet – Version
          • Parquet – HiveQL Syntax
          • Parquet – Limitations

        Szehon Ho added a comment:

        It makes sense under "Limitations": Parquet column names were case sensitive (the user had to select a column name that exactly matched what is in the metastore) until this JIRA made them case insensitive.

        I guess the "Version" section is a chronological display of changes, but I'm not sure we are going to list everything in 0.14 there; it might be a lot. Thanks.
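
To illustrate the behavior described above, here is a minimal Java sketch of case-insensitive column resolution. It is not the HIVE-7554 patch. It assumes only that Hive supplies column names in lower case (as stored in the metastore) while the Parquet file schema keeps the original mixed-case spelling. The class CaseInsensitiveColumnResolver and the field names userId and eventTime are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper, not Hive's actual implementation: maps a Hive column
 * name (assumed lower case) to the field name exactly as it is spelled in a
 * Parquet file schema.
 */
public class CaseInsensitiveColumnResolver {
    // Key: lower-cased field name; value: original spelling from the file.
    private final Map<String, String> fieldsByLowerCase = new HashMap<>();

    public CaseInsensitiveColumnResolver(List<String> parquetFieldNames) {
        for (String name : parquetFieldNames) {
            String key = name.toLowerCase(Locale.ROOT);
            String previous = fieldsByLowerCase.put(key, name);
            // Two file fields that differ only by case cannot be told apart.
            if (previous != null && !previous.equals(name)) {
                throw new IllegalArgumentException(
                    "Ambiguous fields: " + previous + " and " + name);
            }
        }
    }

    /** Returns the field name as spelled in the file, or null if there is no match. */
    public String resolve(String hiveColumnName) {
        return fieldsByLowerCase.get(hiveColumnName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        CaseInsensitiveColumnResolver resolver = new CaseInsensitiveColumnResolver(
            Arrays.asList("userId", "eventTime", "value"));
        System.out.println(resolver.resolve("userid"));    // userId
        System.out.println(resolver.resolve("EVENTTIME")); // eventTime
        System.out.println(resolver.resolve("missing"));   // null
    }
}
```

Rejecting file fields that differ only by case is a design choice of this sketch; a real reader could instead prefer an exact-case match or the first match.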

        Lefty Leverenz added a comment:

        Agreed, "Limitations" is the right place for this.

        "Version" has a different purpose (explaining that native support for Parquet was added in Hive 0.13.0), so its title should probably be changed to reflect that information. "HiveQL Syntax" discusses a syntax change with native support, so it could become a "Version" subsection.

        "Limitations" might need a new title too if it turns into a list of changes, or a new section could be added for that. But for now I think we can just put this issue in "Limitations" without changing any headings.

        Szehon Ho added a comment:

        The test was failing because it's missing the attached file; I've fixed this. Just curious: could the file not be included in the patch itself because it's binary?

        Szehon Ho added a comment:

        OK, I see: binary files can't be included when generating the patch. Got it.

        Thejas M Nair added a comment:

        This has been fixed in the 0.14 release. Please open a new JIRA if you see any issues.

        Szehon Ho added a comment:

        Added to 'Limitations' section as discussed.


          People

          • Assignee: Brock Noland
          • Reporter: Brock Noland
          • Votes: 0
          • Watchers: 5
