Hive / HIVE-2417

Merging of compressed rcfiles fails to write the valuebuffer part correctly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      The blockmerge task does not create proper rc files when merging compressed rc files as the valuebuffer writing is incorrect.

      1. HIVE-2417.v0.patch
        14 kB
        Krishna Kumar
      2. HIVE-2417.v1.patch
        14 kB
        Krishna Kumar

        Activity

        Krishna Kumar created issue -
        Krishna Kumar added a comment -

        Test added

        Krishna Kumar made changes -
        Field Original Value New Value
        Attachment HIVE-2417.v0.patch [ 12492052 ]
        Krishna Kumar made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        He Yongqiang added a comment -

        Good catch, this is a regression introduced in HIVE-2396.
        Can you make the test case reproduce the problem more easily? That is, without the change in this diff, running the test case should produce an error or incorrect results.

        1. Remove this line: "+set mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;".
        2. tgt_rc_merge_test only contains one file, so the 'alter table tgt_rc_merge_test concatenate;' will basically do nothing. Can you make sure this table contains at least 2 files? You can upload 2 gzip-compressed rcfiles if it does not.

        Krishna Kumar added a comment -

        Yes, the test is designed to produce the error when run without the change. Are you finding that that's not the case? I get an EOFException while running the same steps in my development environment (i.e., not as a unit test).

        1. This is needed so that the rcfiles in the target table are compressed with Bzip2. Do you mean that we should be using Default compression codec instead? Fine with me but why is that important?

        2. tgt does contain more than one file.

        [before alter]
        +POSTHOOK: query: show table extended like `tgt_rc_merge_test`
        ...
        +totalNumberFiles:2
        ...
        [after alter]
        +POSTHOOK: query: show table extended like `tgt_rc_merge_test`
        ...
        +totalNumberFiles:1

        The 'create' adds one file, and the insert adds another file. [OT: Does it make sense to append a block merge task after a non-overwrite insert? Dunno...]

        He Yongqiang added a comment -

        > The 'create' adds one file, and the insert adds another file.

        Sorry, I thought you were doing an "insert overwrite". Can you do 2 inserts?

        > This is needed so that the rcfiles in the target table are compressed with Bzip2. Do you mean that we should be using Default compression codec instead? Fine with me but why is that important?

        Yes. I mean that if you remove this line and keep the line "set hive.exec.compress.output = true;", the output will be compressed using DefaultCodec. The reason is that BZip2 may not be installed for all Hive users/devs.

        He Yongqiang added a comment -

        By "2 inserts", I mean remove the "load" command and use 2 inserts to populate the data.

        Krishna Kumar added a comment -

        Test changed after review comments

        • default codec instead of bzip2
        • Create + 2 inserts instead of CTAS + 1 insert
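
For readers following the review, the revised test shape described above might look roughly like the HiveQL sketch below. This is not the committed create_merge_compressed.q; it is a reconstruction from the comments in this thread. The source table name src_rc_merge_test and the column layout (key int, value string) are assumptions made for illustration, and the source table is assumed to already exist and contain rows. Only tgt_rc_merge_test, the "set hive.exec.compress.output" line, the "show table extended" query, and the "alter table ... concatenate" statement come from the discussion.

```sql
-- Compress job output. No codec is set explicitly, so per the review
-- comment the output should be compressed with DefaultCodec.
set hive.exec.compress.output = true;

-- Target table stored as RCFile (column layout is an assumption).
create table tgt_rc_merge_test(key int, value string) stored as rcfile;

-- Two non-overwrite inserts, so the table should hold more than one file.
-- src_rc_merge_test is a hypothetical, pre-populated source table.
insert into table tgt_rc_merge_test select * from src_rc_merge_test;
insert into table tgt_rc_merge_test select * from src_rc_merge_test;

-- Inspect the file count and read the data before the merge.
show table extended like `tgt_rc_merge_test`;
select count(1) from tgt_rc_merge_test;

-- Run the block merge that this issue is about.
alter table tgt_rc_merge_test concatenate;

-- Inspect the file count and read the data again after the merge.
show table extended like `tgt_rc_merge_test`;
select count(1) from tgt_rc_merge_test;
```

The idea is that the row count read back after the concatenate should match the count before it, while totalNumberFiles drops. In the earlier run quoted in this thread, totalNumberFiles went from 2 to 1, and without the fix the read after the merge was reported to fail with an EOFException.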
        Krishna Kumar made changes -
        Attachment HIVE-2417.v1.patch [ 12492414 ]
        He Yongqiang added a comment -

        +1, will commit after tests pass

        He Yongqiang added a comment -

        Committed, thanks Krishna Kumar!

        He Yongqiang made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #928 (See https://builds.apache.org/job/Hive-trunk-h0.21/928/)
        HIVE-2417: Merging of compressed rcfiles fails to write the valuebuffer part correctly (Krishna Kumar via He Yongqiang)

        heyongqiang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164278
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
        • /hive/trunk/ql/src/test/queries/clientpositive/create_merge_compressed.q
        • /hive/trunk/ql/src/test/results/clientpositive/create_merge_compressed.q.out
        Carl Steinbach made changes -
        Fix Version/s 0.9.0 [ 12317742 ]
        Carl Steinbach made changes -
        Fix Version/s 0.8.0 [ 12316178 ]
        Carl Steinbach made changes -
        Fix Version/s 0.9.0 [ 12317742 ]
        Carl Steinbach made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                   Time In Source Status  Execution Times  Last Executer   Last Execution Date
        Open -> Patch Available      2m 11s                 1                Krishna Kumar   29/Aug/11 11:23
        Patch Available -> Resolved  3d 10h 54m             1                He Yongqiang    01/Sep/11 22:18
        Resolved -> Closed           106d 2h 37m            1                Carl Steinbach  16/Dec/11 23:55

          People

          • Assignee:
            Krishna Kumar
            Reporter:
            Krishna Kumar
          • Votes:
            0
            Watchers:
            0
