HIVE-2599

Support Composite/Compound Keys with HBaseStorageHandler

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.13.0
    • Component/s: HBase Handler
    • Labels:

      Description

      It would be really nice for Hive to be able to understand composite keys from an underlying HBase schema. Currently we have to store key fields twice, once in the row key and once as column values, so that the data is both keyed and queryable. I noticed John Sichi mentioned in HIVE-1228 that this would be a separate issue, but I can't find any follow-up. How feasible is this in the HBaseStorageHandler?

      1. HIVE-2599.1.patch.txt
        19 kB
        Swarnim Kulkarni
      2. HIVE-2599.2.patch.txt
        19 kB
        Swarnim Kulkarni
      3. HIVE-2599.2.patch.txt
        19 kB
        Swarnim Kulkarni
      4. HIVE-2599.3.patch.txt
        19 kB
        Swarnim Kulkarni
      5. HIVE-2599.4.patch.txt
        21 kB
        Swarnim Kulkarni

          Activity

          Swarnim Kulkarni added a comment -

          If your composite keys are delimited by a separator, here is a possible way to query them in Hive:

          CREATE EXTERNAL TABLE hbase_table_1(key struct<a:string,b:string,c:string>, value string) 
          ROW FORMAT DELIMITED
          COLLECTION ITEMS TERMINATED BY '~'
          STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
          WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,test-family:test-qual")
          TBLPROPERTIES ("hbase.table.name" = "SIMPLE_TABLE");
          

          Basically, this maps the composite key to a struct and specifies that the parts of the composite key are separated by "~". After doing this, querying the individual parts of the composite key should be as simple as:

          select key.a,key.b,key.c from hbase_table_1;
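
As a rough illustration of what the struct mapping above amounts to, the following plain-Java sketch splits a "~"-delimited row key into the named parts a, b and c. It does not use any Hive or HBase API; the class name DelimitedKeySketch, the method splitKey and the sample key "us~2014~42" are made up for this example, and Hive's own struct parsing may treat edge cases (extra or missing parts) differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Hive or of the patch.
public class DelimitedKeySketch {

    // Splits a delimited row key into named parts, in the order the names are given.
    static Map<String, String> splitKey(String rowKey, String delimiter, String... fieldNames) {
        // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty parts.
        String[] parts = rowKey.split(Pattern.quote(delimiter), -1);
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            // A part that is missing from the key is recorded as null (an assumption of this sketch).
            fields.put(fieldNames[i], i < parts.length ? parts[i] : null);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Mirrors "select key.a, key.b, key.c" for a single row key.
        Map<String, String> key = splitKey("us~2014~42", "~", "a", "b", "c");
        System.out.println(key); // prints {a=us, b=2014, c=42}
    }
}
```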
          
          Hans Uhlig added a comment -

          Do I need to do this for all fields then? Any suggestions for binary?

          Swarnim Kulkarni added a comment -

          For all parts of your key? Yes. For binary, using ":key#b" in the columns mapping should work.

          Hans Uhlig added a comment -

          Is this something that can work with the Avro Key/Value Schema?

          Swarnim Kulkarni added a comment -

          The option mentioned in the first comment should work for cases where the parts of the composite key are separated by a separator.

          If this is not the case, the attached patch adds a new class, "HBaseCompositeKey extends LazyStruct". Consumers can provide their own implementation to control exactly how Hive parses their composite key. See "HBaseTestCompositeKey" for an example implementation.
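
To give a feel for the kind of logic a custom key class might carry when there is no separator, here is a minimal plain-Java sketch that cuts a row key into fixed-width parts. It is only an assumption about what such an implementation could do: it does not extend HBaseCompositeKey or call any Hive API, and the names FixedWidthKeySketch and parse, the widths 7 and 6, and the sample key are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; the real extension point is the
// HBaseCompositeKey class added by the patch, which is not used here.
public class FixedWidthKeySketch {

    // Cuts the raw row key into consecutive parts of the given byte widths.
    static List<String> parse(byte[] rowKey, int... widths) {
        List<String> fields = new ArrayList<>();
        int offset = 0;
        for (int width : widths) {
            if (offset + width > rowKey.length) {
                throw new IllegalArgumentException("row key is shorter than the declared widths");
            }
            // Each part is decoded as UTF-8 text (an assumption of this sketch).
            fields.add(new String(rowKey, offset, width, StandardCharsets.UTF_8));
            offset += width;
        }
        return fields;
    }

    public static void main(String[] args) {
        // A 13-byte key made of a 7-byte person id followed by a 6-byte value.
        byte[] rowKey = "person1value1".getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(rowKey, 7, 6)); // prints [person1, value1]
    }
}
```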

          Swarnim Kulkarni added a comment -

          Rebased with master to get a clean patch. If a committer gets a chance to review this, that would be awesome! Thanks!

          Swarnim Kulkarni added a comment -

          Review request: https://reviews.apache.org/r/13007/

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12594616/HIVE-2599.2.patch.txt

          ERROR: -1 due to 1 failed/errored test(s), 2737 tests executed
          Failed tests:

          org.apache.hcatalog.pig.TestHCatStorer.testMultiPartColsInData
          

          Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/219/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/219/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests failed with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          Swarnim Kulkarni added a comment -

          Looking at the test failure, it doesn't look like it is related to this. The failed test is neither related to HBase nor is using structs in its DDL. Please let me know if I am missing something and I can dig deeper.

          Swarnim Kulkarni added a comment -

          This should be ready for review. If someone has a chance to take a look, that would be great!

          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12597571/HIVE-2599.2.patch.txt

          SUCCESS: +1 2850 tests passed

          Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/408/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/408/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          Swarnim Kulkarni added a comment -

          This patch has been available for quite some time and passes all tests on Hive QA. If someone gets a chance to review it, I would really appreciate it.

          Nick Dimiduk added a comment -

          Hi Swarnim,

          Would you mind taking a look at HBASE-8693? Specifically, HBase now provides a Struct for defining compound keys, with support for composition using both field-width and delimiter-based separation. It would be great to get these type encoders supported as first-class citizens for bridging the gap between Hive and HBase.

          Thanks,
          Nick

          Brock Noland added a comment -

          Hi Swarnim,

          This looks pretty good! Am I correct that the patch takes care of both selects and inserts?

          Hi Nick,

          Do you have a simple example?

          Brock

          Swarnim Kulkarni added a comment -

          Am I correct that the patch takes care of both selects and inserts?

          Unfortunately, no. This patch allows querying custom composite keys, but it currently doesn't support writing them back to HBase. Do you want me to include that support as part of this patch, or open a separate issue for it?

          Brock Noland added a comment -

          What happens if an insert is tried? We can address that in a follow on JIRA as long as the results of an insert aren't data corruption or a terrible error message.

          Nick Dimiduk added a comment -

          Do you have a simple example?

          The best documentation available is still in the unit tests. I will do a proper writeup of using this feature, it's just not a priority for me as of late.

          Brock Noland added a comment -

          Cool, sounds good. I think we can address this in a follow on JIRA since Swarnim has a working patch here for a common use case.

          Hive QA added a comment -

          Overall: -1 no tests executed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12597571/HIVE-2599.2.patch.txt

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/800/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/800/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Tests exited with: NonZeroExitCodeException
          Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]]
          + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
          + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
          + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + cd /data/hive-ptest/working/
          + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-800/source-prep.txt
          + [[ false == \t\r\u\e ]]
          + mkdir -p maven ivy
          + [[ svn = \s\v\n ]]
          + [[ -n '' ]]
          + [[ -d apache-svn-trunk-source ]]
          + [[ ! -d apache-svn-trunk-source/.svn ]]
          + [[ ! -d apache-svn-trunk-source ]]
          + cd apache-svn-trunk-source
          + svn revert -R .
          Reverted 'ql/src/test/results/clientnegative/exchange_partition_neg_partition_exists2.q.out'
          Reverted 'ql/src/test/results/clientnegative/exchange_partition_neg_partition_exists.q.out'
          Reverted 'ql/src/test/results/clientnegative/exchange_partition_neg_partition_exists3.q.out'
          Reverted 'ql/src/test/results/clientnegative/exchange_partition_neg_incomplete_partition.q.out'
          Reverted 'ql/src/test/results/clientpositive/exchange_partition3.q.out'
          Reverted 'ql/src/test/results/clientpositive/exchange_partition.q.out'
          Reverted 'ql/src/test/results/clientpositive/exchange_partition2.q.out'
          Reverted 'ql/src/test/queries/clientnegative/exchange_partition_neg_incomplete_partition.q'
          Reverted 'ql/src/test/queries/clientnegative/exchange_partition_neg_partition_exists2.q'
          Reverted 'ql/src/test/queries/clientnegative/exchange_partition_neg_partition_exists3.q'
          Reverted 'ql/src/test/queries/clientnegative/exchange_partition_neg_partition_missing.q'
          Reverted 'ql/src/test/queries/clientnegative/exchange_partition_neg_partition_exists.q'
          Reverted 'ql/src/test/queries/clientpositive/exchange_partition.q'
          Reverted 'ql/src/test/queries/clientpositive/exchange_partition2.q'
          Reverted 'ql/src/test/queries/clientpositive/exchange_partition3.q'
          Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java'
          ++ egrep -v '^X|^Performing status on external'
          ++ awk '{print $2}'
          ++ svn status --no-ignore
          + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target hcatalog/server-extensions/target hcatalog/core/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
          + svn update
          
          Fetching external item into 'hcatalog/src/test/e2e/harness'
          External at revision 1555274.
          
          At revision 1555274.
          + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
          + patchFilePath=/data/hive-ptest/working/scratch/build.patch
          + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
          + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
          + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
          The patch does not appear to apply with p0, p1, or p2
          + exit 1
          '
          

          This message is automatically generated.

          ATTACHMENT ID: 12597571

          Swarnim Kulkarni added a comment -

          Attached is the latest patch rebased with the master. The patch should apply cleanly now.

          Brock Noland, on your question about inserts: I think I might have misunderstood you a little bit. I ran the following queries to test inserts on composite keys, and they succeeded.

          CREATE EXTERNAL TABLE test_table_1(key struct<personId:string,value:string>, data string) 
          ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
          STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
          WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:value") 
          TBLPROPERTIES ("hbase.table.name" = "hbase_test_table_1","hbase.composite.key.class"="com.test.hive.TestHBaseCompositeKey");
          
          select * from test_table_1;
          
          {"personid":"person1","value":"value1"}	1385435417948
          {"personid":"person2","value":"value2"}	1386691798261
          {"personid":"person3","value":"value3"}	1387481795304
          {"personid":"person4","value":"value4"}	1386705359123
          {"personid":"person5","value":"value5"}	1386972894836
          ......
          
          CREATE TABLE test_table_2(key struct<personId:string,value:string>, value string) 
          ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
          STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
          WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
          TBLPROPERTIES ("hbase.table.name" = "hbase_test_table_2");
          
          INSERT OVERWRITE TABLE test_table_2 select key,data from test_table_1;
          14/01/04 00:32:33 INFO ql.Driver: Launching Job 1 out of 1
          14/01/04 00:32:58 INFO exec.Task: 2014-01-04 00:32:58,720 Stage-0 map = 0%,  reduce = 0%
          ....
          2014-01-04 00:33:29,930 Stage-0 map = 100%,  reduce = 100%, Cumulative CPU 5.48 sec
          
          select * from test_table_2;
          
          {"personid":"person1","value":"value1"}	1385435417948
          {"personid":"person2","value":"value2"}	1386691798261
          {"personid":"person3","value":"value3"}	1387481795304
          {"personid":"person4","value":"value4"}	1386705359123
          {"personid":"person5","value":"value5"}	1386972894836
          ......
          

          If this is what you meant, then yes, the patch handles both selects and inserts. If not, please let me know so that I can log a new bug and tackle it accordingly.

          Swarnim Kulkarni added a comment -

          As a side note, I also ran into HIVE-4515 and HIVE-5680 while testing the master. Tackling those next...

          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12621458/HIVE-2599.3.patch.txt

          SUCCESS: +1 4877 tests passed

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/806/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/806/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12621458

          Brock Noland added a comment -

          Great to hear, it sounds like the composite key is working as desired – both inserts and selects work as expected!

          The patch looks good to me as well! The only required change I noticed is that the two new classes related to the composite keys need Apache license headers. I can commit this after that change!

          Nick Dimiduk added a comment -

          After a cursory review, patch v3 looks good to me also. It should be trivial to extend these concepts to provide a LazyStructObjectInspector over the Struct/StructIterator types in HBase. Let me see about finding time to do so this week.

          Swarnim Kulkarni added a comment -

          Updated patch with apache license headers.

          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12621637/HIVE-2599.4.patch.txt

          SUCCESS: +1 4877 tests passed

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/812/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/812/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12621637

          Brock Noland added a comment -

          +1

          Brock Noland added a comment -

          I created "HIVE-6150 - Take advantage of Native HBase Compound keys" for the changes to take advantage of HBASE-8693.

          Brock Noland added a comment -

          I committed this to trunk! Thank you so much Swarnim for your contribution!

          Navis added a comment -

          Swarnim Kulkarni Brock Noland I'm really late on this issue. Could I ask why LazySimpleStructObjectInspector started to make a new List for getStructFieldsDataAsList()?

          Navis added a comment -

          I cannot understand the intention of this issue. Without this patch,

          hive> CREATE TABLE hbase_struct(key struct<col1:int,col2:int>, value string)
              > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
              > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string");
          ...
          hive> insert into table hbase_struct select struct(1000,2000),'value' from src tablesample (1 rows);
          ...
          hive> select * from hbase_struct;      
          {"col1":1000,"col2":2000}	value
          hive> select key.col1,key.col2 from hbase_struct;
          1000	2000
          

          I can do anything with the default LazyStruct, and now it's making a new Struct and using it. Is it for controlling the wire format on HBase rather than the default JSON serialization?

          Swarnim Kulkarni added a comment -

          Could I ask why LazySimpleStructObjectInspector started to make a new List for getStructFieldsDataAsList()?

          Basically a bit of testing convenience on my side. I also tried to make the implementation here similar to some of the existing ObjectInspectors [1][2]. I tried my best to ensure that this passed all existing tests and wasn't violating any obvious contracts on the class. Did I miss something?

          I cannot understand the intention of this issue.

          You are correct in the sense that a basic struct doesn't need a custom HBaseCompositeKey implementation (also refer to my first comment on this JIRA for another example). But there is an increasing number of cases where custom, more complicated serializers are written to serialize the keys more efficiently (salting is one such example). For such serializers, a custom HBaseCompositeKey implementation would help Hive understand such complicated keys.

          [1] https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ReflectionStructObjectInspector.java#L174
          [2] https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/UnionStructObjectInspector.java#L153
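
To make the salting case concrete, here is a minimal, self-contained sketch of the kind of key layout a plain delimited struct cannot decode. Everything in it is an assumption made for illustration: the class name SaltedKeySketch, the one-byte salt prefix, the '~' separator, and the hash used for bucketing are hypothetical and are not taken from the patch or from any Hive or HBase API. The point is only that the salt byte is fused onto the front of the first field, so the key has to be unwrapped by custom logic before the fields can be split; that unwrapping is the sort of logic a custom HBaseCompositeKey implementation would carry.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not part of Hive, HBase, or the HIVE-2599 patch.
public class SaltedKeySketch {
    // Assumed field separator inside the logical key.
    static final String SEP = "~";

    // Builds a row key of the assumed form: [1 salt byte][field1~field2~...].
    public static byte[] encode(int buckets, String... fields) {
        if (buckets < 1 || buckets > 128) {
            throw new IllegalArgumentException("buckets must be in 1..128 to fit one byte");
        }
        byte[] body = String.join(SEP, fields).getBytes(StandardCharsets.UTF_8);
        // The salt spreads rows across buckets; any stable hash would do here.
        byte salt = (byte) Math.floorMod(Arrays.hashCode(body), buckets);
        byte[] rowKey = new byte[body.length + 1];
        rowKey[0] = salt;
        System.arraycopy(body, 0, rowKey, 1, body.length);
        return rowKey;
    }

    // Drops the salt byte, then splits the remainder into the logical fields.
    public static List<String> decode(byte[] rowKey) {
        String body = new String(rowKey, 1, rowKey.length - 1, StandardCharsets.UTF_8);
        // Limit -1 keeps trailing empty fields instead of silently dropping them.
        return new ArrayList<>(Arrays.asList(body.split(SEP, -1)));
    }

    public static void main(String[] args) {
        byte[] rowKey = encode(16, "a1", "b2", "c3");
        System.out.println(decode(rowKey)); // prints [a1, b2, c3]
    }
}
```

With a layout like this, splitting the raw row key on '~' alone would return a first field that still carries the salt byte, which is why the struct-plus-delimiter mapping is not enough on its own.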

          Brock Noland added a comment -

          But there is an increasing number of cases where custom, more complicated serializers are written to serialize the keys more efficiently (salting is one such example)

          I've seen this as well. That is where a simple struct cannot deserialize a key.


            People

            • Assignee:
              Swarnim Kulkarni
              Reporter:
              Hans Uhlig
            • Votes:
              1
              Watchers:
              16
