Hive / HIVE-6938

Add Support for Parquet Column Rename

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.14.0
    • Component/s: File Formats
    • Labels:

      Description

      Parquet was originally introduced without 'replace columns' support in ql. In addition, the Parquet SerDe accesses columns by name by default rather than by index.

      Parquet should allow for either columnar (index based) access or name based access because it can support either.
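
To illustrate the difference between the two access modes, here is a minimal sketch in Python. It is a toy model, not Hive or Parquet code: a "file" is just a list of (column name, values) pairs in written order, and the function names `read_by_name` and `read_by_index` are made up for this illustration. It shows why a renamed table column comes back empty under name-based lookup but still resolves under index-based lookup.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Toy stand-in for a data file: (column_name, values) pairs in written order.
FileColumns = Sequence[Tuple[str, List[int]]]
Resolved = Dict[str, Optional[List[int]]]


def read_by_name(file_columns: FileColumns, table_columns: Sequence[str]) -> Resolved:
    """Resolve each table column by looking up its name in the file."""
    by_name = dict(file_columns)
    # A table column with no same-named file column resolves to None.
    return {name: by_name.get(name) for name in table_columns}


def read_by_index(file_columns: FileColumns, table_columns: Sequence[str]) -> Resolved:
    """Resolve each table column by its position in the file."""
    resolved: Resolved = {}
    for position, name in enumerate(table_columns):
        if position < len(file_columns):
            resolved[name] = file_columns[position][1]
        else:
            # The table declares more columns than the file contains.
            resolved[name] = None
    return resolved


written = [("id", [1, 2, 3]), ("cnt", [10, 20, 30])]
renamed_table = ["id", "total"]  # table column "cnt" was renamed to "total"

by_name_result = read_by_name(written, renamed_table)
by_index_result = read_by_index(written, renamed_table)

print(by_name_result)   # {'id': [1, 2, 3], 'total': None}
print(by_index_result)  # {'id': [1, 2, 3], 'total': [10, 20, 30]}
```

In this model, name-based access loses the renamed column because the file still stores it under the old name, while index-based access depends only on column position.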

      1. HIVE-6938.3.patch
        12 kB
        Brock Noland
      2. HIVE-6938.3.patch
        12 kB
        Daniel Weeks
      3. HIVE-6938.2.patch
        7 kB
        Brock Noland
      4. HIVE-6938.2.patch
        7 kB
        Daniel Weeks
      5. HIVE-6938.1.patch
        7 kB
        Daniel Weeks

          Activity

          Daniel Weeks added a comment -

          The patch contains a small change to DDLTask to add support for replace columns as well as a change to the Serde to allow switching between column index based access and name based access of columns.

          Julien Le Dem added a comment -

          I find the terminology "columnar.access" confusing but otherwise, this looks good to me.

          Daniel Weeks added a comment -

          Confusion is understandable considering parquet is columnar. How about "column.index.access"?

          I'll update the patch.

          Julien Le Dem added a comment -

          Sounds good to me!

          Daniel Weeks added a comment -

          Patch #2 has the disambiguated property name.

          Brock Noland added a comment -

          Reuploading the exact same patch to trigger precommits.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12644047/HIVE-6938.2.patch

          ERROR: -1 due to 4 failed/errored test(s), 5504 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/172/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/172/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 4 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12644047

          Brock Noland added a comment -

          Daniel Weeks, it looks like one of the parquet tests failed. Can you look into that?

          Daniel Weeks added a comment -

          Looks like the test output didn't get included with the patch. I'm taking a look and will update.

          Daniel Weeks added a comment -

          Updated to use global switch until HIVE-6936 is resolved. This means all tables will be treated the same until input formats have access to table properties.
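
A rough sketch of what a global switch implies, written in Python with plain dictionaries standing in for the job configuration and for table properties. None of this is Hive's actual API: the function `use_index_access` is hypothetical, and the default of "false" is an assumption based on the description's statement that name-based access is the default. The property name is the one discussed in this issue.

```python
from typing import Mapping, Optional

PROPERTY = "parquet.column.index.access"  # property name from this issue


def use_index_access(
    job_conf: Mapping[str, str],
    table_properties: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True when columns should be resolved by index.

    Only the job-wide configuration is consulted. table_properties is
    accepted but deliberately ignored, modelling the interim situation
    where input formats cannot see table properties.
    """
    raw = job_conf.get(PROPERTY, "false")  # assumed default: name-based access
    return str(raw).strip().lower() == "true"


conf = {PROPERTY: "true"}
table_a = {PROPERTY: "false"}  # a per-table setting that has no effect here
table_b: dict = {}

decisions = [use_index_access(conf, props) for props in (table_a, table_b)]
print(decisions)  # [True, True]
```

Both tables get the same answer because the decision never looks at per-table settings, which is the practical consequence of "all tables will be treated the same".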

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12644714/HIVE-6938.3.patch

          ERROR: -1 due to 21 failed/errored test(s), 5525 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_java_method
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_reflect
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_math_funcs
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_pipe
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform1
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr2
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
          org.apache.hadoop.hive.metastore.TestMetastoreVersion.testDefaults
          org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_input4
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5
          org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHadoopVersion
          org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
          org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getPigVersion
          org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getStatus
          org.apache.hive.hcatalog.templeton.TestWebHCatE2e.invalidPath
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/212/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/212/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 21 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12644714

          Brock Noland added a comment -

          Very sorry for not reviewing this... I am re-uploading the patch to see the current result.

          Brock Noland added a comment -

          I am +1 pending tests

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12650127/HIVE-6938.3.patch

          ERROR: -1 due to 4 failed/errored test(s), 5536 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
          org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/457/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/457/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-457/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 4 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12650127

          Brock Noland added a comment -

          Thank you Daniel for your contribution! I have committed this to trunk.

          Lefty Leverenz added a comment -

          What user doc does this need? (Language Manual: Parquet)

          Brock Noland added a comment -

          Yes, parquet.column.index.access needs to be documented.
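
As a possible starting point for usage notes, the sketch below (Python, purely illustrative) compares the column names stored in a file with the names declared on the table, position by position. The helpers `positional_differences` and `is_rename_only` are hypothetical and not part of Hive. The idea is a heuristic: if names differ only because columns were renamed in place, positional access should line up; if a differing table name also exists elsewhere in the file, the columns were probably reordered and positional access would pair the wrong data.

```python
from itertools import zip_longest
from typing import List, Optional, Sequence, Tuple

Difference = Tuple[int, Optional[str], Optional[str]]


def positional_differences(
    file_names: Sequence[str], table_names: Sequence[str]
) -> List[Difference]:
    """List (position, file_name, table_name) wherever the names differ."""
    pairs = zip_longest(file_names, table_names)  # pads the shorter side with None
    return [(i, f, t) for i, (f, t) in enumerate(pairs) if f != t]


def is_rename_only(file_names: Sequence[str], table_names: Sequence[str]) -> bool:
    """Heuristic: same column count and no sign of reordering.

    A differing table name that also appears somewhere in the file
    suggests a reorder rather than an in-place rename.
    """
    if len(file_names) != len(table_names):
        return False
    file_set = set(file_names)
    return all(
        table_name not in file_set
        for _, _, table_name in positional_differences(file_names, table_names)
    )


renamed = positional_differences(["id", "cnt"], ["id", "total"])
print(renamed)                                         # [(1, 'cnt', 'total')]
print(is_rename_only(["id", "cnt"], ["id", "total"]))  # True
print(is_rename_only(["id", "name"], ["name", "id"]))  # False
```

This only inspects names, so it cannot detect a type change or a column dropped and re-added under a new name at the same position.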

          Szehon Ho added a comment -

          Daniel Weeks, can you please look into the failure of parquet_columnar? It's failing on trunk (it also failed in the pre-commit test).

          Szehon Ho added a comment -

          Actually, I took a quick look and it is straightforward: the test output just needed to be regenerated after HIVE-7087, which removes lineage info from golden files; the rest is the same. The fix is in HIVE-7245.

          Daniel Weeks added a comment -

          Sure, I'll take a look.

          Lefty Leverenz added a comment -

          I mentioned this in the Limitations section of the Parquet wikidoc (Language Manual – Parquet – Limitations), but it could use an example and some usage notes in a new subsection.

          Lefty Leverenz added a comment -

          Does this need additional documentation? (See previous comment.) If not, we can remove the TODOC14 label.

          Thejas M Nair added a comment -

          This has been fixed in the 0.14 release. Please open a new JIRA if you see any issues.


            People

            • Assignee:
              Daniel Weeks
              Reporter:
              Daniel Weeks
            • Votes:
              0
              Watchers:
              8
