Hive
  1. Hive
  2. HIVE-4331

Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

    Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.12.0
    • Component/s: HBase Handler, HCatalog
    • Labels:
      None

      Description

      1) Deprecate the HCatHBaseStorageHandler and "RevisionManager" from HCatalog. These will now continue to function but internally they will use the "DefaultStorageHandler" from Hive. They will be removed in future release of Hive.

      2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will bypass the HiveOutputFormat. We will use this class in Hive's "HBaseStorageHandler" instead of the "HiveHBaseTableOutputFormat".

      3) Write new unit tests in the HCat's "storagehandler" so that systems such as Pig and Map Reduce can use the Hive's "HBaseStorageHandler" instead of the "HCatHBaseStorageHandler".

      4) Make sure all the old and new unit tests pass without backward compatibility (except known issues as described in the Design Document).

      5) Replace all instances of the HCat source code, which point to "HCatStorageHandler" to use the"HiveStorageHandler" including the "FosterStorageHandler".

      I have attached the design document for the same and will attach a patch to this Jira.

      1. HIVE4331_07-17.patch
        181 kB
        Viraj Bhat
      2. HIVE-4331.1.patch
        173 kB
        Viraj Bhat
      3. HIVE-4331.2.patch
        172 kB
        Viraj Bhat
      4. HIVE-4331.patch
        172 kB
        Viraj Bhat
      5. hive4331hcatrebase.patch
        184 kB
        Viraj Bhat
      6. StorageHandlerDesign_HIVE4331.pdf
        135 kB
        Viraj Bhat

        Issue Links

          Activity

          Hide
          Ashutosh Chauhan added a comment -

          This issue has been fixed and released as part of 0.12 release. If you find further issues, please create a new jira and link it to this one.

          Show
          Ashutosh Chauhan added a comment - This issue has been fixed and released as part of 0.12 release. If you find further issues, please create a new jira and link it to this one.
          Hide
          Viraj Bhat added a comment -

          Thanks Ashutosh

          Show
          Viraj Bhat added a comment - Thanks Ashutosh
          Hide
          Ashutosh Chauhan added a comment -

          Both HIVE-5260 & HIVE-5261 are committed. Resolving this.

          Show
          Ashutosh Chauhan added a comment - Both HIVE-5260 & HIVE-5261 are committed. Resolving this.
          Hide
          Viraj Bhat added a comment -

          New attachment should get rid of the above error message from the PRE-COMMIT build.
          Viraj

          Show
          Viraj Bhat added a comment - New attachment should get rid of the above error message from the PRE-COMMIT build. Viraj
          Hide
          Viraj Bhat added a comment -

          Patch to allow for patch -p1 clean application

          Show
          Viraj Bhat added a comment - Patch to allow for patch -p1 clean application
          Hide
          Hive QA added a comment -

          Overall: -1 no tests executed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12602669/HIVE-4331.1.patch

          Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/701/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/701/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]]
          + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + cd /data/hive-ptest/working/
          + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-701/source-prep.txt
          + mkdir -p maven ivy
          + [[ svn = \s\v\n ]]
          + [[ -n '' ]]
          + [[ -d apache-svn-trunk-source ]]
          + [[ ! -d apache-svn-trunk-source/.svn ]]
          + [[ ! -d apache-svn-trunk-source ]]
          + cd apache-svn-trunk-source
          + svn revert -R .
          ++ awk '{print $2}'
          ++ egrep -v '^X|^Performing status on external'
          ++ svn status --no-ignore
          + rm -rf build hcatalog/build hcatalog/core/build hcatalog/storage-handlers/hbase/build hcatalog/server-extensions/build hcatalog/webhcat/svr/build hcatalog/webhcat/java-client/build hcatalog/hcatalog-pig-adapter/build common/src/gen
          + svn update
          
          Fetching external item into 'hcatalog/src/test/e2e/harness'
          External at revision 1522097.
          
          At revision 1522097.
          + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
          + patchFilePath=/data/hive-ptest/working/scratch/build.patch
          + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
          + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
          + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
          The patch does not appear to apply with p0 to p2
          + exit 1
          '
          

          This message is automatically generated.

          Show
          Hive QA added a comment - Overall : -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12602669/HIVE-4331.1.patch Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/701/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/701/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-701/source-prep.txt + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf build hcatalog/build hcatalog/core/build hcatalog/storage-handlers/hbase/build hcatalog/server-extensions/build hcatalog/webhcat/svr/build hcatalog/webhcat/java-client/build hcatalog/hcatalog-pig-adapter/build common/src/gen + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1522097. At revision 1522097. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0 to p2 + exit 1 ' This message is automatically generated.
          Hide
          Viraj Bhat added a comment -

          Attaching consolidated patch for PRE-COMMIT

          Show
          Viraj Bhat added a comment - Attaching consolidated patch for PRE-COMMIT
          Hide
          Sushanth Sowmyan added a comment -

          As an update, after a bit of thinking about how backward compatibility should, I have no further issues or comments on the HCatalog side of the patch, I'm +1 on it. There is work to be done there, but not relevant directly to this patch. It belongs more into an addendum for HIVE-5233 than it does over here. I'm +1 on the HCat side (barring some minor spacing issues, which I can fix before committing)

          Show
          Sushanth Sowmyan added a comment - As an update, after a bit of thinking about how backward compatibility should, I have no further issues or comments on the HCatalog side of the patch, I'm +1 on it. There is work to be done there, but not relevant directly to this patch. It belongs more into an addendum for HIVE-5233 than it does over here. I'm +1 on the HCat side (barring some minor spacing issues, which I can fix before committing)
          Hide
          Sushanth Sowmyan added a comment -

          Yup. I do have a primary motivation in getting this patch into 0.12 because we've already gone ahead and deprecated the HBaseHCatStorageHandler under the assumption that we'll make this in. I will work with Viraj to make sure this makes it in.

          Show
          Sushanth Sowmyan added a comment - Yup. I do have a primary motivation in getting this patch into 0.12 because we've already gone ahead and deprecated the HBaseHCatStorageHandler under the assumption that we'll make this in. I will work with Viraj to make sure this makes it in.
          Hide
          Olga Natkovich added a comment -

          Hi Sushanth,

          Thanks for your feedback. Can we agree to commit the patches as is with the understanding that we will address the remaining concerns after wards. The patches are large and we already spend a lot of time adjusting andre-basing them. Given that things change pretty rapidly, we just want to make sure we do not need to rebase yet again.

          Show
          Olga Natkovich added a comment - Hi Sushanth, Thanks for your feedback. Can we agree to commit the patches as is with the understanding that we will address the remaining concerns after wards. The patches are large and we already spend a lot of time adjusting andre-basing them. Given that things change pretty rapidly, we just want to make sure we do not need to rebase yet again.
          Hide
          Sushanth Sowmyan added a comment -

          Viraj, I'm okay with the hive-side. The HCatalog side, I do have some comments to go over after the recent refactor.

          For now, I'm going to go ahead and make this bug easier to commit by splitting it up into two sub-tasks. That way, we can at least get a commit of the PTOF without further delay.

          Show
          Sushanth Sowmyan added a comment - Viraj, I'm okay with the hive-side. The HCatalog side, I do have some comments to go over after the recent refactor. For now, I'm going to go ahead and make this bug easier to commit by splitting it up into two sub-tasks. That way, we can at least get a commit of the PTOF without further delay.
          Hide
          Viraj Bhat added a comment -

          Ashutosh and Sushanth, I would really appreciate if you we could move forward with this patch and commit it to trunk and branch 0.12
          Viraj

          Show
          Viraj Bhat added a comment - Ashutosh and Sushanth, I would really appreciate if you we could move forward with this patch and commit it to trunk and branch 0.12 Viraj
          Hide
          Viraj Bhat added a comment -

          Hi Brock, thanks, I was looking for a blue build, seems that it will not happen.

          Show
          Viraj Bhat added a comment - Hi Brock, thanks, I was looking for a blue build, seems that it will not happen.
          Hide
          Brock Noland added a comment -

          It's because the HCatalog tests are flaky.

          Show
          Brock Noland added a comment - It's because the HCatalog tests are flaky.
          Hide
          Viraj Bhat added a comment -

          Hi Ashutosh and Sushanth,
          All unit tests corresponding to the Hive's HBaseCliDriver and the (new added tests and old ones) storage handler of HCat pass.
          I am not sure why the PRE-COMMIT build is showing the following 7 failure/skipped tests for HCat.
          In fact on my desktop I get no failures for HCat. Most of them are originating in the TestSequenceFileReadWrite

          TestSequenceFileReadWrite.testSequenceTableWriteRead - test case passes
          TestSequenceFileReadWrite.testTextTableWriteRead - test cases passes
          TestSequenceFileReadWrite.testTextTableWriteReadMR - test cases passes
          TestSequenceFileReadWrite.testSequenceTableWriteReadMR - test case passes

          TestHCatExternalHCatNonPartitioned.testHCatNonPartitionedTable - test cases passes on my desktop
          TestHCatStorerMulti.testStoreBasicTable - test case passes
          TestOrcDynamicPartitioned.testHCatDynamicPartitionedTable - test case passes on my desktop

          On the Hive there are 2 unit test failures, TestMinimrCliDriver.testCliDriver_smb_mapjoin_8 and TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1 which might not be related to this patch.

          Viraj

          Show
          Viraj Bhat added a comment - Hi Ashutosh and Sushanth, All unit tests corresponding to the Hive's HBaseCliDriver and the (new added tests and old ones) storage handler of HCat pass. I am not sure why the PRE-COMMIT build is showing the following 7 failure/skipped tests for HCat. In fact on my desktop I get no failures for HCat. Most of them are originating in the TestSequenceFileReadWrite TestSequenceFileReadWrite.testSequenceTableWriteRead - test case passes TestSequenceFileReadWrite.testTextTableWriteRead - test cases passes TestSequenceFileReadWrite.testTextTableWriteReadMR - test cases passes TestSequenceFileReadWrite.testSequenceTableWriteReadMR - test case passes TestHCatExternalHCatNonPartitioned.testHCatNonPartitionedTable - test cases passes on my desktop TestHCatStorerMulti.testStoreBasicTable - test case passes TestOrcDynamicPartitioned.testHCatDynamicPartitionedTable - test case passes on my desktop On the Hive there are 2 unit test failures, TestMinimrCliDriver.testCliDriver_smb_mapjoin_8 and TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1 which might not be related to this patch. Viraj
          Hide
          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12602102/HIVE-4331.patch

          ERROR: -1 due to 9 failed/errored test(s), 3098 tests executed
          Failed tests:

          org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTable
          org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testSequenceTableWriteRead
          org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testTextTableWriteRead
          org.apache.hive.hcatalog.mapreduce.TestHCatExternalHCatNonPartitioned.testHCatNonPartitionedTable
          org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testSequenceTableWriteReadMR
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1
          org.apache.hive.hcatalog.pig.TestHCatStorerMulti.testStoreBasicTable
          org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testTextTableWriteReadMR
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_smb_mapjoin_8
          

          Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/669/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/669/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests failed with: TestsFailedException: 9 tests failed
          

          This message is automatically generated.

          Show
          Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12602102/HIVE-4331.patch ERROR: -1 due to 9 failed/errored test(s), 3098 tests executed Failed tests: org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTable org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testSequenceTableWriteRead org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testTextTableWriteRead org.apache.hive.hcatalog.mapreduce.TestHCatExternalHCatNonPartitioned.testHCatNonPartitionedTable org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testSequenceTableWriteReadMR org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1 org.apache.hive.hcatalog.pig.TestHCatStorerMulti.testStoreBasicTable org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testTextTableWriteReadMR org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_smb_mapjoin_8 Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/669/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/669/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 9 tests failed This message is automatically generated.
          Hide
          Viraj Bhat added a comment -

          Rebase due to HIVE-5233 and HIVE-5236

          Show
          Viraj Bhat added a comment - Rebase due to HIVE-5233 and HIVE-5236
          Hide
          Viraj Bhat added a comment -

          Rebase due to HIVE-5233 and HIVE-5236

          Show
          Viraj Bhat added a comment - Rebase due to HIVE-5233 and HIVE-5236
          Hide
          Viraj Bhat added a comment -

          Hi Sushanth,
          Thanks for your offer for help. I went over the trunk code of HCat and modified the HCat's storage handler so that they reflect in the "org.apache.hcatalog" set. There are something else I observed in the trunk, such as:
          1) Most of the pom files have 0.12.0-SNAPSHOT, so I changed it to reflect 0.13.0-SNAPSHOT
          2) Had to modify the HCatStorageHandler.java in the "org.apache.hcatalog.mapreduce" package as it had an annotation: "* @deprecated Use/modify

          {@link org.apache.hive.hcatalog.mapreduce.HCatStorageHandler}

          instead" to get past the compile as we do not have this class in the "org.apache.hive.hcatalog"

          Let me post the new patch and wait for the pre-commit build status.
          Viraj

          Show
          Viraj Bhat added a comment - Hi Sushanth, Thanks for your offer for help. I went over the trunk code of HCat and modified the HCat's storage handler so that they reflect in the "org.apache.hcatalog" set. There are something else I observed in the trunk, such as: 1) Most of the pom files have 0.12.0-SNAPSHOT, so I changed it to reflect 0.13.0-SNAPSHOT 2) Had to modify the HCatStorageHandler.java in the "org.apache.hcatalog.mapreduce" package as it had an annotation: "* @deprecated Use/modify {@link org.apache.hive.hcatalog.mapreduce.HCatStorageHandler} instead" to get past the compile as we do not have this class in the "org.apache.hive.hcatalog" Let me post the new patch and wait for the pre-commit build status. Viraj
          Hide
          Sushanth Sowmyan added a comment -

          Ooh, ouch, yes. I see the problem.

          Your patch has two parts - the hive part and the hcatalog part. The hive part is straightforward, so I'll ignore that. The hcatalog part is more complicated, because you have two goals :

          • Make the hive hbase handler work from under hcatalog
          • Make sure that the existing hbase hcat storage handler still works from within hcat.

          HCatalog now has two sections to it:

          • org.apache.hive.hcatalog is the new package for all new work.
          • org.apache.hcatalog , which is is now deprecated, and maintained only for purposes of backward compatibility, and will be removed after about 2 more releases. What's more, the code available here is the 0.11 released version of code - i.e., for eg., HCatUtil might be different between the two versions because any patches since 0.11 release are going only to the new codebase.

          So, any changes you make that allow general hive storage handlers to work from hcatalog must happen in the new packages.

          However, the hbase hcat storage handler is part of the deprecated set, and is retained as part of the old org.apache.hcatalog.* set, and there is only one version of that. Any changes you make to that, nust be made in the old package.

          If it would help, I'm willing to come over to your workplace on Monday and help sort the changes out.

          Show
          Sushanth Sowmyan added a comment - Ooh, ouch, yes. I see the problem. Your patch has two parts - the hive part and the hcatalog part. The hive part is straightforward, so I'll ignore that. The hcatalog part is more complicated, because you have two goals : Make the hive hbase handler work from under hcatalog Make sure that the existing hbase hcat storage handler still works from within hcat. HCatalog now has two sections to it: org.apache.hive.hcatalog is the new package for all new work. org.apache.hcatalog , which is is now deprecated, and maintained only for purposes of backward compatibility, and will be removed after about 2 more releases. What's more, the code available here is the 0.11 released version of code - i.e., for eg., HCatUtil might be different between the two versions because any patches since 0.11 release are going only to the new codebase. So, any changes you make that allow general hive storage handlers to work from hcatalog must happen in the new packages. However, the hbase hcat storage handler is part of the deprecated set, and is retained as part of the old org.apache.hcatalog.* set, and there is only one version of that. Any changes you make to that, nust be made in the old package. If it would help, I'm willing to come over to your workplace on Monday and help sort the changes out.
          Hide
          Viraj Bhat added a comment -

          Sushanth I see that there are 2 directories under hcatalog trunk:
          https://github.com/apache/hive/tree/trunk/hcatalog/core/src/main/java/org/apache/hive/hcatalog
          https://github.com/apache/hive/tree/trunk/hcatalog/core/src/main/java/org/apache/hcatalog

          Should my patch apply to both these directories?

          Viraj

          Show
          Viraj Bhat added a comment - Sushanth I see that there are 2 directories under hcatalog trunk: https://github.com/apache/hive/tree/trunk/hcatalog/core/src/main/java/org/apache/hive/hcatalog https://github.com/apache/hive/tree/trunk/hcatalog/core/src/main/java/org/apache/hcatalog Should my patch apply to both these directories? Viraj
          Hide
          Viraj Bhat added a comment -

          Sushanth, it seems that: HIVE-5233 and HIVE-5236 were applied around a few hours back I created the patch. So it might be possible the patch did not apply. So I will try applying these patches on my trunk and try recreating again.
          Viraj

          Show
          Viraj Bhat added a comment - Sushanth, it seems that: HIVE-5233 and HIVE-5236 were applied around a few hours back I created the patch. So it might be possible the patch did not apply. So I will try applying these patches on my trunk and try recreating again. Viraj
          Hide
          Hive QA added a comment -

          Overall: -1 no tests executed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12601928/HIVE-4331.patch

          Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/652/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/652/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]]
          + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + cd /data/hive-ptest/working/
          + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-652/source-prep.txt
          + mkdir -p maven ivy
          + [[ svn = \s\v\n ]]
          + [[ -n '' ]]
          + [[ -d apache-svn-trunk-source ]]
          + [[ ! -d apache-svn-trunk-source/.svn ]]
          + [[ ! -d apache-svn-trunk-source ]]
          + cd apache-svn-trunk-source
          + svn revert -R .
          ++ egrep -v '^X|^Performing status on external'
          ++ awk '{print $2}'
          ++ svn status --no-ignore
          + rm -rf build hcatalog/build hcatalog/core/build hcatalog/storage-handlers/hbase/build hcatalog/server-extensions/build hcatalog/webhcat/svr/build hcatalog/webhcat/java-client/build hcatalog/hcatalog-pig-adapter/build common/src/gen
          + svn update
          U    build.properties
          
          Fetching external item into 'hcatalog/src/test/e2e/harness'
          Updated external to revision 1520721.
          
          Updated to revision 1520721.
          + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
          + patchFilePath=/data/hive-ptest/working/scratch/build.patch
          + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
          + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
          + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
          The patch does not appear to apply with p0 to p2
          + exit 1
          '
          

          This message is automatically generated.

          Show
          Hive QA added a comment - Overall : -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12601928/HIVE-4331.patch Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/652/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/652/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-652/source-prep.txt + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf build hcatalog/build hcatalog/core/build hcatalog/storage-handlers/hbase/build hcatalog/server-extensions/build hcatalog/webhcat/svr/build hcatalog/webhcat/java-client/build hcatalog/hcatalog-pig-adapter/build common/src/gen + svn update U build.properties Fetching external item into 'hcatalog/src/test/e2e/harness' Updated external to revision 1520721. Updated to revision 1520721. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0 to p2 + exit 1 ' This message is automatically generated.
          Hide
          Viraj Bhat added a comment -

          Attaching a unified patch as HIVE-JIRA.patch for the precommit build to run.
          Regards
          Viraj

          Show
          Viraj Bhat added a comment - Attaching a unified patch as HIVE-JIRA.patch for the precommit build to run. Regards Viraj
          Hide
          Viraj Bhat added a comment -

          Revised after HIVE-4869

          Show
          Viraj Bhat added a comment - Revised after HIVE-4869
          Hide
          Viraj Bhat added a comment -

          Sushanth and Ashutosh updated the patch to reflect the comments. Also attaching a copy of the integrated patch on the Jira.

          Show
          Viraj Bhat added a comment - Sushanth and Ashutosh updated the patch to reflect the comments. Also attaching a copy of the integrated patch on the Jira.
          Hide
          Viraj Bhat added a comment -

          Sushanth, thanks for informing. I am now regenerating the patch and running the unit tests.

          Show
          Viraj Bhat added a comment - Sushanth, thanks for informing. I am now regenerating the patch and running the unit tests.
          Hide
          Sushanth Sowmyan added a comment -

          Viraj Bhat : The HIVE-4869 freeze is over, and you can generate the patch for the hcatalog side from trunk now.

          Show
          Sushanth Sowmyan added a comment - Viraj Bhat : The HIVE-4869 freeze is over, and you can generate the patch for the hcatalog side from trunk now.
          Hide
          Viraj Bhat added a comment -

          Sushanth, most of the issues stated in phabricator have been addressed in the patch I will reattach if HIVE-4869 goes through. Meanwhile I will wait for your go as to when HIVE-4869 gets committed.

          Show
          Viraj Bhat added a comment - Sushanth, most of the issues stated in phabricator have been addressed in the patch I will reattach if HIVE-4869 goes through. Meanwhile I will wait for your go as to when HIVE-4869 gets committed.
          Hide
          Sushanth Sowmyan added a comment -

          One quick note - the HCatalog directory will be frozen tonight, to implement the changes with HIVE-4869. Since this patch touches every single file in hcatalog (package name change), you will have to regenerate the hcatalog patch after that.

          Show
          Sushanth Sowmyan added a comment - One quick note - the HCatalog directory will be frozen tonight, to implement the changes with HIVE-4869 . Since this patch touches every single file in hcatalog (package name change), you will have to regenerate the hcatalog patch after that.
          Hide
          Viraj Bhat added a comment -

          Sushanth thanks for going through the HCat patch. I am currently implementing the changes in Hive and rebasing it.
          I will post it as soon as I am done.
          Viraj

          Show
          Viraj Bhat added a comment - Sushanth thanks for going through the HCat patch. I am currently implementing the changes in Hive and rebasing it. I will post it as soon as I am done. Viraj
          Hide
          Sushanth Sowmyan added a comment -

          I've gone through the HCat section of the patch as well, and I'm okay with it. I give that my +1 as well, pending the commit of the hive side of the patch, and making sure that checkstyle passes the changes (My guess, right now, is that with word-wrapping the Apache license headers, it won't)

          The only changes needed, on my end, are on the hive side, and are minor code-style and implementation issues, such as using raw strings, and using a map where there wasn't a need for one. If those are addressed, this patch is good to go. (Also, the whole patch will need a rebase to be applicable to trunk right now)

          Show
          Sushanth Sowmyan added a comment - I've gone through the HCat section of the patch as well, and I'm okay with it. I give that my +1 as well, pending the commit of the hive side of the patch, and making sure that checkstyle passes the changes (My guess, right now, is that with word-wrapping the Apache license headers, it won't) – The only changes needed, on my end, are on the hive side, and are minor code-style and implementation issues, such as using raw strings, and using a map where there wasn't a need for one. If those are addressed, this patch is good to go. (Also, the whole patch will need a rebase to be applicable to trunk right now)
          Hide
          Sushanth Sowmyan added a comment -

          Ashutosh Chauhan : The mapreduce->mapred transition is not a tough one by itself(apart from losing abort/commit semantics, which need to be emulated). The Outputcommitter concept is quite a bit more involved though.

          For now, I'm still +1 on this patch(the hive side) because it moves us one step closer to that goal. I don't think we should require the OC concept being brought in as well for it to be committed, I brought that up because I wanted to highlight that we aren't done with this patch, and more work is required past it, and this patch alone does not make generic OFs workable from within Hive.

          Show
          Sushanth Sowmyan added a comment - Ashutosh Chauhan : The mapreduce->mapred transition is not a tough one by itself(apart from losing abort/commit semantics, which need to be emulated). The Outputcommitter concept is quite a bit more involved though. For now, I'm still +1 on this patch(the hive side) because it moves us one step closer to that goal. I don't think we should require the OC concept being brought in as well for it to be committed, I brought that up because I wanted to highlight that we aren't done with this patch, and more work is required past it, and this patch alone does not make generic OFs workable from within Hive.
          Hide
          Francis Liu added a comment -

          No HCat and Hive dont treat OFs in same way. This difference of OF handling is a reason why HCatOF couldn't be used from Hive, another being HCat uses mapreduce api while Hive uses mapred api. If we can make Hive use HCatOF that will be a win, but thats yet another topic.

          Currently they don't mainly because of HOF but they behave in almost the same way else this whole interoperability story is broken. With this patch they'll at least be closer when it comes to dealing with OFs that don't use HOF. Instead of having to mirror that behavior.

          Actually AFAIK only the HCatOF wrapper classes uses mapreduce and the underlying stuff deals with mapred which we did as part of the StorageDriver->SerDe migration. So it'd be relatively easy to support a mapred version of HCatOF.

          Show
          Francis Liu added a comment - No HCat and Hive dont treat OFs in same way. This difference of OF handling is a reason why HCatOF couldn't be used from Hive, another being HCat uses mapreduce api while Hive uses mapred api. If we can make Hive use HCatOF that will be a win, but thats yet another topic. Currently they don't mainly because of HOF but they behave in almost the same way else this whole interoperability story is broken. With this patch they'll at least be closer when it comes to dealing with OFs that don't use HOF. Instead of having to mirror that behavior. Actually AFAIK only the HCatOF wrapper classes uses mapreduce and the underlying stuff deals with mapred which we did as part of the StorageDriver->SerDe migration. So it'd be relatively easy to support a mapred version of HCatOF.
          Hide
          Ashutosh Chauhan added a comment -

          No HCat and Hive dont treat OFs in same way. This difference of OF handling is a reason why HCatOF couldn't be used from Hive, another being HCat uses mapreduce api while Hive uses mapred api. If we can make Hive use HCatOF that will be a win, but thats yet another topic.

          Show
          Ashutosh Chauhan added a comment - No HCat and Hive dont treat OFs in same way. This difference of OF handling is a reason why HCatOF couldn't be used from Hive, another being HCat uses mapreduce api while Hive uses mapred api. If we can make Hive use HCatOF that will be a win, but thats yet another topic.
          Hide
          Francis Liu added a comment -

          For folks who has OF which has OC, it will be easier to integrate that in Hive, instead of understanding Hive innards and handling of OC. Wondering if you have given a thought on this? I just want to make sure if and when we go that route these current changes won't get in the way.

          For HCat we already do it this way. It's not really just the OC but the OF,OC,RR in general. HOF essentially is doing the Hive specific stuff that the plain OC, RR, etc can do as well. So I don't think we changed the complexity of the work needed to support new formats? Is that what you meant by get in the way?

          In the long run it'd be better since HCat and Hive treat OFs the same way. Though it'd be great to document what that contract (beyond the typical OF) is.

          Show
          Francis Liu added a comment - For folks who has OF which has OC, it will be easier to integrate that in Hive, instead of understanding Hive innards and handling of OC. Wondering if you have given a thought on this? I just want to make sure if and when we go that route these current changes won't get in the way. For HCat we already do it this way. It's not really just the OC but the OF,OC,RR in general. HOF essentially is doing the Hive specific stuff that the plain OC, RR, etc can do as well. So I don't think we changed the complexity of the work needed to support new formats? Is that what you meant by get in the way? In the long run it'd be better since HCat and Hive treat OFs the same way. Though it'd be great to document what that contract (beyond the typical OF) is.
          Hide
          Ashutosh Chauhan added a comment -

          Yeah, HIVE-3489

          Show
          Ashutosh Chauhan added a comment - Yeah, HIVE-3489
          Hide
          Francis Liu added a comment -

          Sigh there's the rub. Are there Jira's for this? Would be great to keep track of this in case someone would like to do it.

          Show
          Francis Liu added a comment - Sigh there's the rub. Are there Jira's for this? Would be great to keep track of this in case someone would like to do it.
          Hide
          Ashutosh Chauhan added a comment -

          Francis Liu One more thing popped up in my discussion with Sushanth. Another issue which is in this general area of Hive usage of OF is w.r.t OutputCommitter. Hive currently explicitly disables OC and performs lot of logic (which folks usually write in OC) from client side. Architecturally, its cleaner for Hive to migrate these things from client to OC and make OC first class citizen. For folks who has OF which has OC, it will be easier to integrate that in Hive, instead of understanding Hive innards and handling of OC. Wondering if you have given a thought on this? I just want to make sure if and when we go that route these current changes won't get in the way.

          Show
          Ashutosh Chauhan added a comment - Francis Liu One more thing popped up in my discussion with Sushanth. Another issue which is in this general area of Hive usage of OF is w.r.t OutputCommitter. Hive currently explicitly disables OC and performs lot of logic (which folks usually write in OC) from client side. Architecturally, its cleaner for Hive to migrate these things from client to OC and make OC first class citizen. For folks who has OF which has OC, it will be easier to integrate that in Hive, instead of understanding Hive innards and handling of OC. Wondering if you have given a thought on this? I just want to make sure if and when we go that route these current changes won't get in the way.
          Hide
          Sushanth Sowmyan added a comment -

          As it currently stands, SHes need , as a prerequisite, to be able to work at a partition level, and have outputcommitter concepts before it can be used more widely.

          Show
          Sushanth Sowmyan added a comment - As it currently stands, SHes need , as a prerequisite, to be able to work at a partition level, and have outputcommitter concepts before it can be used more widely.
          Hide
          Francis Liu added a comment -

          most of the usecases of traditional M/R OFs are already covered by hive, or for newer formats being developed, the OF writer winds up making changes so that it is hive compatible, such as with orc, or with the HBase SH

          Yes but ideally they don't really need HOF to do that.

          So unless there were a major push to see a BlahOutputFormat that is widely used, but was not already usable from within Hive, I don't see there being a necessity case for a change in hive that I want.

          Yep, which is why we want to do it incrementally. Letting it leak into SH and hcat code would make the idea of cleaning things up less appealing. I think if we just started using SH for new OFs and not use HOF, these pieces would naturally go into this state. Having said that it'd be nice if Orc could be moved to using storage handlers. It would also help SH mature.

          Show
          Francis Liu added a comment - most of the usecases of traditional M/R OFs are already covered by hive, or for newer formats being developed, the OF writer winds up making changes so that it is hive compatible, such as with orc, or with the HBase SH Yes but ideally they don't really need HOF to do that. So unless there were a major push to see a BlahOutputFormat that is widely used, but was not already usable from within Hive, I don't see there being a necessity case for a change in hive that I want. Yep, which is why we want to do it incrementally. Letting it leak into SH and hcat code would make the idea of cleaning things up less appealing. I think if we just started using SH for new OFs and not use HOF, these pieces would naturally go into this state. Having said that it'd be nice if Orc could be moved to using storage handlers. It would also help SH mature.
          Hide
          Sushanth Sowmyan added a comment -

          Oh, and one more thing:

          With this StorageHandlers can use generic OFs.

          This assertion is incorrect. A more precise assertion would be that with this patch, Hive can use generic OFs that do not do anything useful or necessary in their outputcommitters(i.e. do not need calls on them to be made). If an OF is designed with an outputcommitter in mind, chances are that it will need some retrofitting before it will work from within hive.

          Show
          Sushanth Sowmyan added a comment - Oh, and one more thing: With this StorageHandlers can use generic OFs. This assertion is incorrect. A more precise assertion would be that with this patch, Hive can use generic OFs that do not do anything useful or necessary in their outputcommitters(i.e. do not need calls on them to be made). If an OF is designed with an outputcommitter in mind, chances are that it will need some retrofitting before it will work from within hive.
          Hide
          Sushanth Sowmyan added a comment -

          I would love to see SH become a first class entry in hive, and HOF be a kind of SH, leading to its eventual removal. That's precisely what my long-term goal for this is.

          > and I'm not completely certain people have a need for that

          By this bit, I meant that I wasn't sure people had a need for doing away with HOF (other than for code-cleanliness, which is why I would like to see it gone) being able to use any generic OF with hive - most of the usecases of traditional M/R OFs are already covered by hive, or for newer formats being developed, the OF writer winds up making changes so that it is hive compatible, such as with orc, or with the HBase SH. So unless there were a major push to see a BlahOutputFormat that is widely used, but was not already usable from within Hive, I don't see there being a necessity case for a change in hive that I want.

          Show
          Sushanth Sowmyan added a comment - I would love to see SH become a first class entry in hive, and HOF be a kind of SH, leading to its eventual removal. That's precisely what my long-term goal for this is. > and I'm not completely certain people have a need for that By this bit, I meant that I wasn't sure people had a need for doing away with HOF (other than for code-cleanliness, which is why I would like to see it gone) being able to use any generic OF with hive - most of the usecases of traditional M/R OFs are already covered by hive, or for newer formats being developed, the OF writer winds up making changes so that it is hive compatible, such as with orc, or with the HBase SH. So unless there were a major push to see a BlahOutputFormat that is widely used, but was not already usable from within Hive, I don't see there being a necessity case for a change in hive that I want.
          Hide
          Francis Liu added a comment -

          I think that it does not go far enough along to address being able to use any generic MR OutputFormat with hive

          With this StorageHandlers can use generic OFs. The next step would be to make SH first class in Hive and migrate all suppported formats to use storage handlers. And finally remove HiveOF class.

          and I'm not completely certain people have a need for that

          Have a need for HOF you mean?

          Show
          Francis Liu added a comment - I think that it does not go far enough along to address being able to use any generic MR OutputFormat with hive With this StorageHandlers can use generic OFs. The next step would be to make SH first class in Hive and migrate all suppported formats to use storage handlers. And finally remove HiveOF class. and I'm not completely certain people have a need for that Have a need for HOF you mean?
          Hide
          Sushanth Sowmyan added a comment -

          I left some more comments on the hive patch phabricator review. Apart from a couple of minor code-style comments, the majority of my feeling about this is that I'm okay with what HivePTOF is doing, and it's a good first step, and I will +1 it, but I think that it does not go far enough along to address being able to use any generic MR OutputFormat with hive. That said, I agree that that would not really be possible unless we plumb out how HiveOutputFormat is used itself, and I'm not completely certain people have a need for that.

          Show
          Sushanth Sowmyan added a comment - I left some more comments on the hive patch phabricator review. Apart from a couple of minor code-style comments, the majority of my feeling about this is that I'm okay with what HivePTOF is doing, and it's a good first step, and I will +1 it, but I think that it does not go far enough along to address being able to use any generic MR OutputFormat with hive. That said, I agree that that would not really be possible unless we plumb out how HiveOutputFormat is used itself, and I'm not completely certain people have a need for that.
          Hide
          Olga Natkovich added a comment -

          Hi Sushanth, Were you able to review this patch? Thanks!

          Show
          Olga Natkovich added a comment - Hi Sushanth, Were you able to review this patch? Thanks!
          Hide
          Francis Liu added a comment -

          Francis, you're welcome to go ahead and review it as well. I have been looking at it as well, along with testing, although I waited on commenting till I'd finished with the hive-side review. I'll comment on both of them before Monday.

          Cool. It'd be good to have a reviewer that looks at both pieces.

          Show
          Francis Liu added a comment - Francis, you're welcome to go ahead and review it as well. I have been looking at it as well, along with testing, although I waited on commenting till I'd finished with the hive-side review. I'll comment on both of them before Monday. Cool. It'd be good to have a reviewer that looks at both pieces.
          Hide
          Viraj Bhat added a comment -

          Sushanth, thanks for your help. I will finish modifying the Hive patch till then. Viraj

          Show
          Viraj Bhat added a comment - Sushanth, thanks for your help. I will finish modifying the Hive patch till then. Viraj
          Hide
          Sushanth Sowmyan added a comment -

          Francis, you're welcome to go ahead and review it as well. I have been looking at it as well, along with testing, although I waited on commenting till I'd finished with the hive-side review. I'll comment on both of them before Monday.

          Show
          Sushanth Sowmyan added a comment - Francis, you're welcome to go ahead and review it as well. I have been looking at it as well, along with testing, although I waited on commenting till I'd finished with the hive-side review. I'll comment on both of them before Monday.
          Hide
          Viraj Bhat added a comment -

          Hi Francis,
          Thanks for your help.
          Regards
          Viraj

          Show
          Viraj Bhat added a comment - Hi Francis, Thanks for your help. Regards Viraj
          Hide
          Francis Liu added a comment -

          It seems no one has reviewed the HCat patch yet. I can go ahead and do the review if everyone is ok with it given that I worked on the design and reviewed the code internally?

          Show
          Francis Liu added a comment - It seems no one has reviewed the HCat patch yet. I can go ahead and do the review if everyone is ok with it given that I worked on the design and reviewed the code internally?
          Hide
          Viraj Bhat added a comment -

          Hi Ashutosh,
          Thanks for all your help. Let me start addressing the comments in Hive.
          Viraj

          Show
          Viraj Bhat added a comment - Hi Ashutosh, Thanks for all your help. Let me start addressing the comments in Hive. Viraj
          Hide
          Ashutosh Chauhan added a comment -

          Francis Liu I see what you are saying. I have also asked Sushanth Sowmyan to review Hive pieces to get an alternate opinion.
          Viraj Bhat I left some implementation comments on phabricator. Assuming Sushanth Sowmyan is OK with patch, those are all the comments I have.

          Show
          Ashutosh Chauhan added a comment - Francis Liu I see what you are saying. I have also asked Sushanth Sowmyan to review Hive pieces to get an alternate opinion. Viraj Bhat I left some implementation comments on phabricator. Assuming Sushanth Sowmyan is OK with patch, those are all the comments I have.
          Hide
          Francis Liu added a comment -

          How about we make HiveOFImpl implement HiveOF and than add implementation for getHiveRR and getRR in it in similar fashion in whats in proposed HivePTOF is?

          I don't think that's possible since HiveOFImpl is not used the same way as HiveOF is. HiveOFImpl is the actual OF that's passed to the job and is essentially a wrapper for the different concrete FileSinkOperators that's part of a given stage. The FileSinkOperators in turn contain a concrete instance of HiveOF. Also it would be meaningless to have HiveOFImpl implement HiveOF given that the consumer of HiveOFImpl is the MR framework which has no notion of HiveOF.

          Maybe I'm missing something here? I'd be happy to discuss this with you offline to help this patch get moving.

          Show
          Francis Liu added a comment - How about we make HiveOFImpl implement HiveOF and than add implementation for getHiveRR and getRR in it in similar fashion in whats in proposed HivePTOF is? I don't think that's possible since HiveOFImpl is not used the same way as HiveOF is. HiveOFImpl is the actual OF that's passed to the job and is essentially a wrapper for the different concrete FileSinkOperators that's part of a given stage. The FileSinkOperators in turn contain a concrete instance of HiveOF. Also it would be meaningless to have HiveOFImpl implement HiveOF given that the consumer of HiveOFImpl is the MR framework which has no notion of HiveOF. Maybe I'm missing something here? I'd be happy to discuss this with you offline to help this patch get moving.
          Hide
          Ashutosh Chauhan added a comment -

          I see what you are saying and I agree HiveOF interface is burdensome. What I am wondering is if we can morph HiveOFImpl into something which can serve this requirement, so that we don't have plethora of classes trying to wrap OFs in Hive at various point in lifecycle. How about we make HiveOFImpl implement HiveOF and than add implementation for getHiveRR and getRR in it in similar fashion in whats in proposed HivePTOF is?

          Show
          Ashutosh Chauhan added a comment - I see what you are saying and I agree HiveOF interface is burdensome. What I am wondering is if we can morph HiveOFImpl into something which can serve this requirement, so that we don't have plethora of classes trying to wrap OFs in Hive at various point in lifecycle. How about we make HiveOFImpl implement HiveOF and than add implementation for getHiveRR and getRR in it in similar fashion in whats in proposed HivePTOF is?
          Hide
          Francis Liu added a comment -

          Ashutosh, I don't think what you're suggesting will work. Since HiveOFImpl is merely a container for initializing real OFs that are contained in defined FileSinkOps. Looking at HIVE-2764, HiveOFImpl was introduced to guarantee checkOutputSpecs was called on each real OF so that OFs such as HBase has a chance of adding a delegation token. It seems the HiveOFImpl name is misleading since it doesn't extend HiveOF and is not used in the same way. Let me know if I missed anything.

          The requirement that a StorageHandler's OF has to extend HiveOF pervasive all over the hive code. ie A lot of classes have HiveOutputFormat (tableDesc, FileSinkOP, etc) as a member variable instead of OF. So it's not possible to just change this expectation at the "end" point of the code path. With storage handlers I don't see why we still need HiveOF. We should migrate all supported OFs to use storage handlers and remove HiveOF. HivePassThru is an step in that direction. StorageHandlers will no longer have to extend HiveOF and HiveOF does not have to leak into HCat code to get interoperability working.

          Show
          Francis Liu added a comment - Ashutosh, I don't think what you're suggesting will work. Since HiveOFImpl is merely a container for initializing real OFs that are contained in defined FileSinkOps. Looking at HIVE-2764 , HiveOFImpl was introduced to guarantee checkOutputSpecs was called on each real OF so that OFs such as HBase has a chance of adding a delegation token. It seems the HiveOFImpl name is misleading since it doesn't extend HiveOF and is not used in the same way. Let me know if I missed anything. The requirement that a StorageHandler's OF has to extend HiveOF pervasive all over the hive code. ie A lot of classes have HiveOutputFormat (tableDesc, FileSinkOP, etc) as a member variable instead of OF. So it's not possible to just change this expectation at the "end" point of the code path. With storage handlers I don't see why we still need HiveOF. We should migrate all supported OFs to use storage handlers and remove HiveOF. HivePassThru is an step in that direction. StorageHandlers will no longer have to extend HiveOF and HiveOF does not have to leak into HCat code to get interoperability working.
          Hide
          Ashutosh Chauhan added a comment -

          Thinking more about this, I think you can make HiveOFimpl work for this requirement which is not to require storage handler writers needing to implement HiveOF in following manner. Currently HiveOFImpl::getRecordWriter() throws Exception, instead you can do something similar what you are doing in HivePassThroughOF that is look up in the config to get underlying OF from storage handler and invoke getRR on it and return that. Make sense?

          Show
          Ashutosh Chauhan added a comment - Thinking more about this, I think you can make HiveOFimpl work for this requirement which is not to require storage handler writers needing to implement HiveOF in following manner. Currently HiveOFImpl::getRecordWriter() throws Exception, instead you can do something similar what you are doing in HivePassThroughOF that is look up in the config to get underlying OF from storage handler and invoke getRR on it and return that. Make sense?
          Hide
          Ashutosh Chauhan added a comment -

          HIVE-2764 introduced HiveOutputFormatImpl for similar requirements for HBaseStorgagehandler. I wonder if we can reuse that here instead of introducing new HivePassthroughFormat. Did you consider reusing that class (and/or enhancing it) to meet new requirements.

          Show
          Ashutosh Chauhan added a comment - HIVE-2764 introduced HiveOutputFormatImpl for similar requirements for HBaseStorgagehandler. I wonder if we can reuse that here instead of introducing new HivePassthroughFormat. Did you consider reusing that class (and/or enhancing it) to meet new requirements.
          Hide
          Viraj Bhat added a comment -

          Hi Ashutosh,
          I have created 2 review requests one which changes files in the HCatalog contrib and the other in Hive. Hope this helps in the review process.
          Hive: https://reviews.facebook.net/D12063
          HCatalog: https://reviews.facebook.net/D12069

          Viraj

          Show
          Viraj Bhat added a comment - Hi Ashutosh, I have created 2 review requests one which changes files in the HCatalog contrib and the other in Hive. Hope this helps in the review process. Hive: https://reviews.facebook.net/D12063 HCatalog: https://reviews.facebook.net/D12069 Viraj
          Hide
          Ashutosh Chauhan added a comment -

          Yeah.. it will make it easier for review. Division in terms of core Hive and HCatalog sounds good.

          Show
          Ashutosh Chauhan added a comment - Yeah.. it will make it easier for review. Division in terms of core Hive and HCatalog sounds good.
          Hide
          Viraj Bhat added a comment -

          Hi Ashutosh,
          That is right,. They do not need to write their CustomOF to implement HiveOF.
          If it makes it easier to review, I can split the patch based on HCat (contrib) and core Hive.
          Viraj

          Show
          Viraj Bhat added a comment - Hi Ashutosh, That is right,. They do not need to write their CustomOF to implement HiveOF. If it makes it easier to review, I can split the patch based on HCat (contrib) and core Hive. Viraj
          Hide
          Ashutosh Chauhan added a comment -

          If I am reading this right, is the motivation of HivePassThroughFormat is that storage handler writers need not to write their custom OF (to implement Hive OF) anymore and thus can use their existing OF unmodified and all the necessary plumbing will be done in Storage handler ?

          Show
          Ashutosh Chauhan added a comment - If I am reading this right, is the motivation of HivePassThroughFormat is that storage handler writers need not to write their custom OF (to implement Hive OF) anymore and thus can use their existing OF unmodified and all the necessary plumbing will be done in Storage handler ?
          Hide
          Ashutosh Chauhan added a comment -

          Viraj Bhat If its not too hard can you separate current patch into two issues : one dealing with HivePassThroughFormat and second about merging storage handlers. Seems like first is a pre-requisite for second. I want to understand that change little better since that may have implications for other storage handler writers and output format writers for Hive.

          Show
          Ashutosh Chauhan added a comment - Viraj Bhat If its not too hard can you separate current patch into two issues : one dealing with HivePassThroughFormat and second about merging storage handlers. Seems like first is a pre-requisite for second. I want to understand that change little better since that may have implications for other storage handler writers and output format writers for Hive.
          Hide
          Viraj Bhat added a comment -

          Hi Nick,
          Here is the link to the RB: https://reviews.facebook.net/D11721
          Viraj

          Show
          Viraj Bhat added a comment - Hi Nick, Here is the link to the RB: https://reviews.facebook.net/D11721 Viraj
          Hide
          Nick Dimiduk added a comment -

          Any chance of posting that RB link?

          Show
          Nick Dimiduk added a comment - Any chance of posting that RB link?
          Hide
          Viraj Bhat added a comment -

          Initial patch will put it on review board

          Show
          Viraj Bhat added a comment - Initial patch will put it on review board
          Hide
          Rajesh Balamohan added a comment -

          This will be extremely beneficial for lots of usecases involving Hive, HBase, HCatalog and Pig. Especially one can think of hosting frequently changed data in HBase and access it in Hive/Pig/MapReduce via HCatalog.

          Show
          Rajesh Balamohan added a comment - This will be extremely beneficial for lots of usecases involving Hive, HBase, HCatalog and Pig. Especially one can think of hosting frequently changed data in HBase and access it in Hive/Pig/MapReduce via HCatalog.

            People

            • Assignee:
              Viraj Bhat
              Reporter:
              Ashutosh Chauhan
            • Votes:
              0 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development