HIVE-4765: Improve HBase bulk loading facility

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels: None

      Description

      With some patches, the bulk loading process for HBase could be simplified a lot.

      CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:key,cf2:value")
      STORED AS
        INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
      LOCATION '/tmp/export';
      
      SET mapred.reduce.tasks=4;
      set hive.optimize.sampling.orderby=true;
      
      INSERT OVERWRITE TABLE hbase_export
      SELECT * from (SELECT union_kv(key,key,value,":key,cf1:key,cf2:value") as (rowkey,union) FROM src) A ORDER BY rowkey,union;
      
      hive> !hadoop fs -lsr /tmp/export;                                                                                          
      drwxr-xr-x   - navis supergroup          0 2013-06-20 11:05 /tmp/export/cf1
      -rw-r--r--   1 navis supergroup       4317 2013-06-20 11:05 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
      -rw-r--r--   1 navis supergroup       5868 2013-06-20 11:05 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
      -rw-r--r--   1 navis supergroup       5214 2013-06-20 11:05 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
      -rw-r--r--   1 navis supergroup       4290 2013-06-20 11:05 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
      drwxr-xr-x   - navis supergroup          0 2013-06-20 11:05 /tmp/export/cf2
      -rw-r--r--   1 navis supergroup       6744 2013-06-20 11:05 /tmp/export/cf2/409673b517d94e16920e445d07710f52
      -rw-r--r--   1 navis supergroup       4975 2013-06-20 11:05 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
      -rw-r--r--   1 navis supergroup       6096 2013-06-20 11:05 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
      -rw-r--r--   1 navis supergroup       4890 2013-06-20 11:05 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba
      
      hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
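
      For reference, the final LoadIncrementalHFiles step can also be driven from Java. The sketch below is illustrative only (it assumes the 0.94/0.98-era HBase client API and that the target table "test" with families cf1 and cf2 already exists); it is not part of this patch.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.HTable;
      import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

      public class BulkLoadExportedHFiles {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HTable table = new HTable(conf, "test");
          try {
            // Move the HFiles written under /tmp/export into the table's regions.
            new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/tmp/export"), table);
          } finally {
            table.close();
          }
        }
      }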
      
      Attachments

      1. HIVE-4765.D11463.1.patch (44 kB, Phabricator)
      2. HIVE-4765.2.patch.txt (51 kB, Navis)
      3. HIVE-4765.3.patch.txt (46 kB, Navis)

          Activity

          Phabricator added a comment -

          navis requested code review of "HIVE-4765 [jira] Improve HBase bulk loading facility".

          Reviewers: JIRA

          HIVE-4765 Improve HBase bulk loading facility

          With some patches, the bulk loading process for HBase could be simplified a lot.

          CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
          ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
          WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:key,cf2:value")
          STORED AS
          INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
          OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
          LOCATION '/tmp/export';

          SET mapred.reduce.tasks=4;
          set hive.optimize.sampling.orderby=true;

          INSERT OVERWRITE TABLE hbase_export
          SELECT * from (SELECT union_kv(key,key,value,":key,cf1:key,cf2:value") as (rowkey,union) FROM src) A ORDER BY rowkey,union;

          hive> !hadoop fs -lsr /tmp/export;
          drwxr-xr-x   - navis supergroup          0 2013-06-20 11:05 /tmp/export/cf1
          -rw-r--r--   1 navis supergroup       4317 2013-06-20 11:05 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
          -rw-r--r--   1 navis supergroup       5868 2013-06-20 11:05 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
          -rw-r--r--   1 navis supergroup       5214 2013-06-20 11:05 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
          -rw-r--r--   1 navis supergroup       4290 2013-06-20 11:05 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
          drwxr-xr-x   - navis supergroup          0 2013-06-20 11:05 /tmp/export/cf2
          -rw-r--r--   1 navis supergroup       6744 2013-06-20 11:05 /tmp/export/cf2/409673b517d94e16920e445d07710f52
          -rw-r--r--   1 navis supergroup       4975 2013-06-20 11:05 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
          -rw-r--r--   1 navis supergroup       6096 2013-06-20 11:05 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
          -rw-r--r--   1 navis supergroup       4890 2013-06-20 11:05 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba

          hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test

          TEST PLAN
          EMPTY

          REVISION DETAIL
          https://reviews.facebook.net/D11463

          AFFECTED FILES
          hbase-handler/build.xml
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseExportSerDe.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileExporter.java
          hbase-handler/src/test/queries/positive/hbase_bulk2.m
          hbase-handler/src/test/results/positive/hbase_bulk2.m.out
          ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
          ql/src/java/org/apache/hadoop/hive/ql/io/HiveOutputCommitter.java
          ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
          ql/src/java/org/apache/hadoop/hive/ql/udf/generic/HFileKeyValue.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUnion.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUnionObjectInspector.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/objectinspector/LazyBinaryObjectInspectorFactory.java
          serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java


          Hive QA added a comment -

          Overall: -1 no tests executed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12655359/HIVE-4765.2.patch.txt

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/755/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/755/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-755/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Tests exited with: NonZeroExitCodeException
          Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
          + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
          + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
          + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
          + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
          + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
          + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
          + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + cd /data/hive-ptest/working/
          + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-755/source-prep.txt
          + [[ false == \t\r\u\e ]]
          + mkdir -p maven ivy
          + [[ svn = \s\v\n ]]
          + [[ -n '' ]]
          + [[ -d apache-svn-trunk-source ]]
          + [[ ! -d apache-svn-trunk-source/.svn ]]
          + [[ ! -d apache-svn-trunk-source ]]
          + cd apache-svn-trunk-source
          + svn revert -R .
          Reverted 'hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/CreateTableHook.java'
          Reverted 'common/src/java/org/apache/hadoop/hive/conf/HiveConf.java'
          Reverted 'ql/src/test/results/clientnegative/genericFileFormat.q.out'
          Reverted 'ql/src/test/results/clientnegative/fileformat_bad_class.q.out'
          Reverted 'ql/src/test/results/clientpositive/parallel_orderby.q.out'
          Reverted 'ql/src/test/results/clientpositive/union25.q.out'
          Reverted 'ql/src/test/results/clientpositive/smb_mapjoin9.q.out'
          Reverted 'ql/src/test/results/clientpositive/inputddl1.q.out'
          Reverted 'ql/src/test/results/clientpositive/tez/tez_dml.q.out'
          Reverted 'ql/src/test/results/clientpositive/tez/ctas.q.out'
          Reverted 'ql/src/test/results/clientpositive/create_union_table.q.out'
          Reverted 'ql/src/test/results/clientpositive/merge3.q.out'
          Reverted 'ql/src/test/results/clientpositive/ctas_uses_database_location.q.out'
          Reverted 'ql/src/test/results/clientpositive/nullformat.q.out'
          Reverted 'ql/src/test/results/clientpositive/inputddl3.q.out'
          Reverted 'ql/src/test/results/clientpositive/skewjoin_noskew.q.out'
          Reverted 'ql/src/test/results/clientpositive/input15.q.out'
          Reverted 'ql/src/test/results/clientpositive/nonmr_fetch.q.out'
          Reverted 'ql/src/test/results/clientpositive/nullformatCTAS.q.out'
          Reverted 'ql/src/test/results/clientpositive/groupby_duplicate_key.q.out'
          Reverted 'ql/src/test/results/clientpositive/ctas.q.out'
          Reverted 'ql/src/test/results/clientpositive/union_top_level.q.out'
          Reverted 'ql/src/test/results/clientpositive/inputddl2.q.out'
          Reverted 'ql/src/test/results/clientpositive/temp_table.q.out'
          Reverted 'ql/src/test/results/clientpositive/ctas_colname.q.out'
          Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g'
          Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java'
          Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g'
          Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g'
          Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java'
          Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java'
          Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java'
          Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/IOConstants.java'
          ++ awk '{print $2}'
          ++ egrep -v '^X|^Performing status on external'
          ++ svn status --no-ignore
          + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/custom-serde/src/main/resources itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextSerDe.java itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextStorageFormatDescriptor.java itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target ql/src/test/results/clientpositive/storage_format_descriptor.q.out ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java ql/src/test/queries/clientpositive/storage_format_descriptor.q ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java ql/src/java/org/apache/hadoop/hive/ql/io/StorageFormatFactory.java ql/src/java/org/apache/hadoop/hive/ql/io/ParquetFileStorageFormatDescriptor.java ql/src/java/org/apache/hadoop/hive/ql/io/AbstractStorageFormatDescriptor.java ql/src/java/org/apache/hadoop/hive/ql/io/RCFileStorageFormatDescriptor.java ql/src/java/org/apache/hadoop/hive/ql/io/ORCFileStorageFormatDescriptor.java ql/src/java/org/apache/hadoop/hive/ql/io/TextFileStorageFormatDescriptor.java ql/src/java/org/apache/hadoop/hive/ql/io/SequenceFileStorageFormatDescriptor.java ql/src/java/org/apache/hadoop/hive/ql/io/StorageFormatDescriptor.java ql/src/main/resources/META-INF
          + svn update
          
          Fetching external item into 'hcatalog/src/test/e2e/harness'
          External at revision 1609903.
          
          At revision 1609903.
          + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
          + patchFilePath=/data/hive-ptest/working/scratch/build.patch
          + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
          + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
          + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
          The patch does not appear to apply with p0, p1, or p2
          + exit 1
          '
          

          This message is automatically generated.

          ATTACHMENT ID: 12655359

          Hive QA added a comment -

          Overall: -1 at least one test failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12655486/HIVE-4765.3.patch.txt

          ERROR: -1 due to 2 failed/errored test(s), 5731 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_functions
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/773/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/773/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-773/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 2 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12655486

          Nick Dimiduk added a comment -

          This looks like a nice improvement, Navis!

          Navis added a comment -

          Nick Dimiduk Thanks. I really hope this gets into trunk.

          Sushanth Sowmyan added a comment -

          Navis, this patch is an exciting one for me, because I've long wanted to work on introducing OutputCommitter semantics into Hive. And given that we've wanted to revamp the HBase bulk load as well for a while, this is a double win for me.

          That said, I do have a few thoughts on the introduction of the HiveOutputCommitter.

          a) I like that you added a completed() alongside the commit(), which allows signalling the end of the commit process. This is a good addition. I would also have liked some way to add a failed() or equivalent, to make sure we can signal that something on our end failed, say while moving files or some such.

          b) One of my pet peeves with HiveOutputFormat in general is the impedance mismatch between RecordWriter and HiveRecordWriter, and the lack of an OutputCommitter has meant that generic OutputFormats need to be ported over to Hive, or developed completely within Hive, rather than being usable as-is. Thus, one of my major goals for introducing an OutputCommitter semantic would be to reduce that mismatch and move Hive towards being able to consume a generic M/R IF/OF with no additional work. To this end, I'm a little wary of introducing a HiveOutputCommitter that will similarly have a mismatch that needs to be "fixed" in the way that HiveRecordWriter needs to be, in case people implement the interface currently being introduced and we then have to break them to clean up the interface.

          c) I would prefer HiveOutputFormat to have a method that creates/returns an output committer (with a default impl returning null), rather than extending HiveOutputCommitter. This matches the M/R form more closely and will make it easier to bridge that gap, I think (a rough sketch of this shape follows below).

          Also, if there was any particular reason you intentionally avoided the M/R Committer idiom, I'd be happy to hear that as well, and we can think on how to create a generic M/R storage handler to wrap generic M/R IF/OFs if need be.
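
          To make point (c) above concrete, here is a rough sketch of that shape. The names and signatures are hypothetical, chosen only to illustrate the idea; they are not the interfaces in the attached patch.

          import java.io.IOException;
          import org.apache.hadoop.mapred.JobConf;

          // Hypothetical committer contract combining points (a) and (c):
          // explicit success/failure hooks, obtained from the output format
          // rather than inherited by it.
          interface BulkLoadCommitter {
            void commit(JobConf job) throws IOException;     // finalize the query's output
            void completed(JobConf job) throws IOException;  // signal the end of the commit phase
            void failed(JobConf job) throws IOException;     // signal a Hive-side failure, e.g. while moving files
          }

          // Hypothetical hook on (or alongside) HiveOutputFormat; returning
          // null keeps existing output formats working unchanged.
          interface CommitterAware {
            BulkLoadCommitter getOutputCommitter(JobConf job);
          }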

          Nick Dimiduk added a comment -

          Bump. Patch still applies to master, with a little fuzz.

          Is the new SerDe and Union business necessary? It would be really great to integrate this into the StorageHandler as an online switch, as I was aiming for in HIVE-2365. Swapping out the output format at runtime seems to work all right, and it saves the user from having to define another table, repeat the column mapping, etc.
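
          A runtime switch of that kind might, very roughly, live in the HBase storage handler and pick the output format from a session property. In the sketch below, the property name and the wiring are hypothetical and not defined by this patch or HIVE-2365; only the two output format classes are existing hbase-handler classes.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat;
          import org.apache.hadoop.hive.hbase.HiveHFileOutputFormat;

          class OutputFormatSwitch {
            // Hypothetical: choose HFile output for bulk loads, the normal
            // Put-based output otherwise, based on a per-session flag.
            static Class<?> select(Configuration conf) {
              return conf.getBoolean("hive.hbase.generate.hfiles", false)
                  ? HiveHFileOutputFormat.class
                  : HiveHBaseTableOutputFormat.class;
            }
          }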

          Nick Dimiduk added a comment -

          Hi Navis. Have you had time to look at this lately? It would sure be better than the mostly-broken instructions on https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad

          Nick Dimiduk added a comment -

          Ping Navis, Sushanth Sowmyan.

          Any chance we can get some action on this one for the 0.14 release? It's definitely better than what's available.

          Sushanth Sowmyan added a comment -

          Since this is a significant usability bump, I'm willing to retract my reservations about the introduction of HiveOutputCommitter and go ahead with looking at this patch, if there's an in-principle understanding that the HiveOutputCommitter semantic is something we shall revisit, and that we may limit the @InterfaceAudience on the commit() and completed() methods in HiveOutputFormat to Private and the @InterfaceStability to Evolving. I do not want to wind up in a situation where we are unable to roll that back, without further discussion/agreement, because external (outside the Hive codebase) storage handlers have started implementing those methods and it becomes part of Hive.
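
          For readers less familiar with the markers mentioned above, the audience/stability annotations (Hadoop's are shown here; Hive carries a similar set) are applied like this. The class and method below are purely illustrative, not part of the patch.

          import org.apache.hadoop.classification.InterfaceAudience;
          import org.apache.hadoop.classification.InterfaceStability;
          import org.apache.hadoop.mapred.JobConf;

          class CommitHookExample {
            @InterfaceAudience.Private     // internal to the Hive codebase for now
            @InterfaceStability.Evolving   // may change incompatibly between releases
            public void commit(JobConf job) {
              // bulk-load commit logic would go here
            }
          }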

          Nick Dimiduk added a comment -

          Planting a stake to get this in for 1.2.0.

          Swarnim Kulkarni added a comment -

          +1. Looks good, though adding more Javadoc on some of the newly added public classes would be great.

          Sushanth Sowmyan added a comment -

          Removing the fix version of 1.2.0 in preparation for the release, since this is not a blocker for 1.2.0.


            People

            • Assignee: Navis
            • Reporter: Navis
            • Votes: 0
            • Watchers: 9
