HIVE-2035

Use block-level merge for RCFile if merging intermediate results is needed

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      For tables stored as RCFile, intermediate results that have too many small files will be merged with a block-level merge that does not deserialize and re-serialize the contents of each block.

      Description

      Currently, if hive.merge.mapredfiles and/or hive.merge.mapfiles is set to true, the intermediate data may be merged by an additional MapReduce job. This can be quite expensive when the data is large. With HIVE-1950, merging can be done at the RCFile block level, which bypasses the (de-)compression and (de-)serialization phases. This could speed up the merge process significantly.
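
      To make the difference concrete, here is a minimal sketch of the two merge styles. It is not Hive code and does not use the real RCFile layout: the toy "file" is just a list of independently zlib-compressed blocks, and every function name below is hypothetical. It only illustrates why copying compressed blocks as opaque bytes avoids the decompress and recompress work.

```python
import zlib


def write_file(blocks_of_rows):
    """Build a toy file: one zlib-compressed byte string per block of rows.

    Assumption: this stands in for a block-structured format; it is not
    the actual RCFile on-disk layout.
    """
    return [zlib.compress("\n".join(rows).encode("utf-8")) for rows in blocks_of_rows]


def read_rows(toy_file):
    """Decompress every block and return all rows in order."""
    rows = []
    for block in toy_file:
        rows.extend(zlib.decompress(block).decode("utf-8").split("\n"))
    return rows


def row_level_merge(files):
    """Decompress and decode every row, then recompress them into one new block."""
    all_rows = [row for f in files for row in read_rows(f)]
    return write_file([all_rows])


def block_level_merge(files):
    """Copy the compressed blocks unchanged; nothing is decompressed or re-encoded."""
    return [block for f in files for block in f]


# Three small intermediate files, one block each.
small_files = [
    write_file([["a,1", "b,2"]]),
    write_file([["c,3"]]),
    write_file([["d,4", "e,5"]]),
]

merged_by_block = block_level_merge(small_files)
merged_by_row = row_level_merge(small_files)
print(read_rows(merged_by_block) == read_rows(merged_by_row))
```

      Both merges yield the same rows; the block-level variant keeps the original blocks byte for byte, while the row-level variant pays for a full decode and re-encode.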

      This JIRA should handle the case where the input table is not stored as RCFile but the destination table is, which requires that the intermediate data be stored in the same format as the destination table.
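
      The rule described above might be sketched as follows. This is an illustration only, not the logic in the patch: the function name, the returned dictionary, and the format labels "RCFILE" and "TEXTFILE" are hypothetical, and the small-file size check that decides whether a merge is actually worthwhile is omitted. The sketch assumes hive.merge.mapfiles applies to map-only jobs and hive.merge.mapredfiles to map-reduce jobs.

```python
def plan_merge(conf, map_only_job, input_format, destination_format):
    """Pick a merge strategy for the intermediate results of one query.

    Hypothetical helper: the real planner also checks file sizes, which
    this sketch leaves out.
    """
    # Assumed mapping of flags to job types.
    flag = "hive.merge.mapfiles" if map_only_job else "hive.merge.mapredfiles"

    # input_format is deliberately unused: the intermediate data follows the
    # destination table's format, whatever the input table is stored as.
    plan = {"intermediate_format": destination_format, "merge": "none"}

    if conf.get(flag, False):
        # Block-level merge is only possible when the data is RCFile.
        plan["merge"] = "block-level" if destination_format == "RCFILE" else "row-level"
    return plan


example_conf = {"hive.merge.mapfiles": True, "hive.merge.mapredfiles": False}

# Text input, RCFile destination: intermediate data is RCFile, so blocks can be merged.
print(plan_merge(example_conf, True, "TEXTFILE", "RCFILE"))
# RCFile input, text destination: the input format does not help here.
print(plan_merge(example_conf, True, "RCFILE", "TEXTFILE"))
```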

      1. hive-2035.1.patch
        470 kB
        Franklin Hu
      2. hive-2035.3.patch
        458 kB
        Franklin Hu

        Issue Links

          Activity

          Transition                   Time In Source Status  Execution Times  Last Executer   Last Execution Date
          Open -> Patch Available      111d 1h 15m            1                Siying Dong     28/Jun/11 00:43
          Patch Available -> Resolved  20d 19h 58m            1                Franklin Hu     18/Jul/11 20:41
          Resolved -> Closed           151d 4h 15m            1                Carl Steinbach  16/Dec/11 23:56
          Prasanth Jayachandran made changes -
          Link This issue relates to HIVE-7509 [ HIVE-7509 ]
          Gavin made changes -
          Link This issue relates to HIVE-1950 [ HIVE-1950 ]
          Gavin made changes -
          Link This issue relates to HIVE-1950 [ HIVE-1950 ]
          Carl Steinbach made changes -
          Link This issue relates to HIVE-1950 [ HIVE-1950 ]
          Carl Steinbach made changes -
          Status Resolved [ 5 ] -> Closed [ 6 ]
          Franklin Hu made changes -
          Release Note For tables stored as RCFile, intermediate results that have too many small files will be merged with a block-level merge that does not deserialize and re-serialized the contents of each block.
          Franklin Hu made changes -
          Fix Version/s 0.8.0 [ 12316178 ]
          Resolution Fixed [ 1 ]
          Status Patch Available [ 10002 ] -> Resolved [ 5 ]
          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #798 (See https://builds.apache.org/job/Hive-trunk-h0.21/798/)

          Siying Dong added a comment -

          committed

          Siying Dong made changes -
          Status Open [ 1 ] -> Patch Available [ 10002 ]
          Siying Dong added a comment -

          +1, will run regression tests

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/935/
          -----------------------------------------------------------

          (Updated 2011-06-23 18:56:14.903379)

          Review request for hive.

          Changes
          -------

          Add max and min split size configs to unit tests

          Summary
          -------

          For a table stored as RCFile, intermediate results are sometimes merged if those files are below a certain threshold. For RCFiles, we can do a block level merge that does not deserialize the blocks and is more efficient. This patch leverages the existing code used to merge for ALTER TABLE ... CONCATENATE.

          This addresses bug HIVE-2035.
          https://issues.apache.org/jira/browse/HIVE-2035

          Diffs (updated)


          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1139014
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 1139014
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java 1139014
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java 1139014
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 1139014
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java 1139014
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 1139014
          trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1139014
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 1139014
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1139014
          trunk/ql/src/test/queries/clientpositive/rcfile_createas1.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge2.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge3.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge4.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge1.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge2.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge3.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge4.q.out PRE-CREATION

          Diff: https://reviews.apache.org/r/935/diff

          Testing
          -------

          Thanks,

          Franklin

          Franklin Hu made changes -
          Attachment hive-2035.3.patch [ 12483632 ]
          Franklin Hu added a comment -

          Add min/max split size settings to unit tests

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/935/#review875
          -----------------------------------------------------------

          Can you make sure that, in the test cases, the query needs the merge step?

          - Siying

          On 2011-06-20 19:20:53, Franklin Hu wrote:

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/935/
          -----------------------------------------------------------

          (Updated 2011-06-20 19:20:53)

          Review request for hive.

          Summary
          -------

          For a table stored as RCFile, intermediate results are sometimes merged if those files are below a certain threshold. For RCFiles, we can do a block level merge that does not deserialize the blocks and is more efficient. This patch leverages the existing code used to merge for ALTER TABLE ... CONCATENATE.

          This addresses bug HIVE-2035.
          https://issues.apache.org/jira/browse/HIVE-2035

          Diffs
          -----

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1136090
          trunk/ql/src/test/queries/clientpositive/rcfile_createas1.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge2.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge3.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge4.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge1.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge2.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge3.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge4.q.out PRE-CREATION

          Diff: https://reviews.apache.org/r/935/diff

          Testing
          -------

          Thanks,

          Franklin

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/935/
          -----------------------------------------------------------

          (Updated 2011-06-20 19:20:53.263299)

          Review request for hive.

          Changes
          -------

          Throw error at compile time for bad rcfile merge input format class rather than at runtime, remove bad test, stylistic fixes

          Summary
          -------

          For a table stored as RCFile, intermediate results are sometimes merged if those files are below a certain threshold. For RCFiles, we can do a block level merge that does not deserialize the blocks and is more efficient. This patch leverages the existing code used to merge for ALTER TABLE ... CONCATENATE.

          This addresses bug HIVE-2035.
          https://issues.apache.org/jira/browse/HIVE-2035

          Diffs (updated)


          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 1136090
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1136090
          trunk/ql/src/test/queries/clientpositive/rcfile_createas1.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge2.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge3.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge4.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge1.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge2.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge3.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge4.q.out PRE-CREATION

          Diff: https://reviews.apache.org/r/935/diff

          Testing
          -------

          Thanks,

          Franklin

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/935/#review864
          -----------------------------------------------------------

          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java
          <https://reviews.apache.org/r/935/#comment1889>

          It doesn't seem to be a RuntimeException

          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
          <https://reviews.apache.org/r/935/#comment1890>

          why not "inputDepth--"?

          trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
          <https://reviews.apache.org/r/935/#comment1891>

          should we just throw an exception instead of returning a magic null?

          trunk/ql/src/test/queries/clientpositive/rcfile_insert.q
          <https://reviews.apache.org/r/935/#comment1893>

          Will it launch a merge job? If it does, it seems to be a bug in Hive that CombineHiveInputFormat doesn't span multiple partitions when it needs to.

          trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q
          <https://reviews.apache.org/r/935/#comment1892>

          It doesn't seem to launch merge jobs. If it does, that seems to be a bug.

          - Siying

          On 2011-06-17 20:45:46, Franklin Hu wrote:

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/935/
          -----------------------------------------------------------

          (Updated 2011-06-17 20:45:46)

          Review request for hive.

          Summary
          -------

          For a table stored as RCFile, intermediate results are sometimes merged if those files are below a certain threshold. For RCFiles, we can do a block level merge that does not deserialize the blocks and is more efficient. This patch leverages the existing code used to merge for ALTER TABLE ... CONCATENATE.

          This addresses bug HIVE-2035.
          https://issues.apache.org/jira/browse/HIVE-2035

          Diffs
          -----

          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1134415
          trunk/ql/src/test/queries/clientpositive/rcfile_createas1.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_insert.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge2.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge3.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge4.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_insert.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge1.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge2.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge3.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge4.q.out PRE-CREATION

          Diff: https://reviews.apache.org/r/935/diff

          Testing
          -------

          Thanks,

          Franklin

          Siying Dong added a comment -

          will take a look.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/935/
          -----------------------------------------------------------

          Review request for hive.

          Summary
          -------

          For a table stored as RCFile, intermediate results are sometimes merged if those files are below a certain threshold. For RCFiles, we can do a block level merge that does not deserialize the blocks and is more efficient. This patch leverages the existing code used to merge for ALTER TABLE ... CONCATENATE.

          This addresses bug HIVE-2035.
          https://issues.apache.org/jira/browse/HIVE-2035

          Diffs


          trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 1134415
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1134415
          trunk/ql/src/test/queries/clientpositive/rcfile_createas1.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_insert.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge2.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge3.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/rcfile_merge4.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_insert.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge1.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge2.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge3.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/rcfile_merge4.q.out PRE-CREATION

          Diff: https://reviews.apache.org/r/935/diff

          Testing
          -------

          Thanks,

          Franklin

          Franklin Hu made changes -
          Attachment hive-2035.1.patch [ 12482096 ]
          Franklin Hu added a comment -

          Implements block-level merge of intermediate results into a table or partition stored as RCFile.
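
          For illustration only, here is a minimal HiveQL sketch of the scenario this change targets. It is not taken from the patch: the table names `src` and `dst_rc` and their columns are hypothetical, and the settings shown are the standard `hive.merge.mapfiles` and `hive.merge.mapredfiles` options referred to in the issue description. Whether a merge stage actually runs depends on the file-size thresholds configured on the cluster.

```sql
-- Hypothetical tables: src is any existing table with (key, value) columns;
-- dst_rc is the RCFile-backed destination.
CREATE TABLE dst_rc (key INT, value STRING) STORED AS RCFILE;

-- Ask Hive to merge small output files from map-only and map-reduce jobs.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;

-- If this insert leaves many small files, Hive appends a merge stage;
-- with this change that stage can merge RCFile blocks without decoding them.
INSERT OVERWRITE TABLE dst_rc SELECT key, value FROM src;

-- The explicit block-level merge (HIVE-1950) whose code the patch reuses.
ALTER TABLE dst_rc CONCATENATE;
```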

          Franklin Hu made changes -
          Assignee Franklin Hu [ franklinhu ]
          He Yongqiang made changes -
          Field Original Value New Value
          Assignee He Yongqiang [ he yongqiang ]
          Ashutosh Chauhan added a comment -

          Yeah, correct, Ning. I missed the JIRA number by 100 and have edited the review request.
          Btw, it would be great if you could take a look at that.

          Ning Zhang added a comment -

          Ashutosh, it seems review request #669 has the wrong HIVE JIRA reference?

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/669/
          -----------------------------------------------------------

          Review request for hive, Carl Steinbach, John Sichi, and Paul Yang.

          Summary
          -------

          See HIVE-2135

          This addresses bug HIVE-2035.
          https://issues.apache.org/jira/browse/HIVE-2035

          Diffs


          trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1096976
          trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreCommand.java PRE-CREATION
          trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1096976
          trunk/metastore/src/java/org/apache/hadoop/hive/metastore/URLConnectionUpdater.java PRE-CREATION

          Diff: https://reviews.apache.org/r/669/diff

          Testing
          -------

          Since this is a refactoring patch, no new tests are required. Ran all the tests in metastore. All of them passed.

          Thanks,

          Ashutosh

          Ning Zhang created issue -

            People

            • Assignee:
              Franklin Hu
              Reporter:
              Ning Zhang
            • Votes:
              0
              Watchers:
              0
