Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Tags:
      rcfile

      Description

      Some potential issues with RCFile

      1. Remove unwanted synchronized modifiers on the methods of RCFile. As per yongqiang he, the class is not meant to be thread-safe (and it is not). Might as well get rid of the confusing and performance-impacting lock acquisitions.

      2. Record Length overstated for compressed files. IIUC, the key compression happens after we have written the record length.

            int keyLength = key.getSize();
            if (keyLength < 0) {
              throw new IOException("negative length keys not allowed: " + key);
            }
      
            out.writeInt(keyLength + valueLength); // total record length
            out.writeInt(keyLength); // key portion length
            if (!isCompressed()) {
              out.writeInt(keyLength);
              key.write(out); // key
            } else {
              keyCompressionBuffer.reset();
              keyDeflateFilter.resetState();
              key.write(keyDeflateOut);
              keyDeflateOut.flush();
              keyDeflateFilter.finish();
              int compressedKeyLen = keyCompressionBuffer.getLength();
              out.writeInt(compressedKeyLen);
              out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
            }
      

      3. For sequence file compatibility, the compressed key length should be the next field to record length, not the uncompressed key length.

      1. HIVE.2065.patch.1.txt
        90 kB
        Krishna Kumar
      2. HIVE.2065.patch.0.txt
        113 kB
        Krishna Kumar
      3. proposal.png
        152 kB
        Krishna Kumar
      4. Slide1.png
        121 kB
        Krishna Kumar

        Activity

        Hide
        Krishna Kumar added a comment -

        Compressed RCFile Layout

        Show
        Krishna Kumar added a comment - Compressed RCFile Layout
        Hide
        He Yongqiang added a comment -

        Good catch about point 2. It is for uncompressed length, and then is used as compressed length in the code when seeking to next block. Fortunately we have not triggered this bug in Hive so far .

        Show
        He Yongqiang added a comment - Good catch about point 2. It is for uncompressed length, and then is used as compressed length in the code when seeking to next block. Fortunately we have not triggered this bug in Hive so far .
        Hide
        Krishna Kumar added a comment -

        So should I go ahead and fix #2 and #3 as well? Note that these are non-compatible changes, so the version number will need to be bumped up.

        My proposal:

        Fix the issues in the new format

        • up the version number to 7.
        • compute and store record length as (compressed key length = 4 + compressed key contents length) + compressed value length
        • store compressed key length as the next 4-byte field
        • key contains 4-byte uncompressed key contents length + compressed key contents

        Provide backward compatibility

        • while reading version 6,
        • interpret fields as now but recalculate the recordlength from the next two fields (as record length = record length - uncompressed key length + compressed key length)
        Show
        Krishna Kumar added a comment - So should I go ahead and fix #2 and #3 as well? Note that these are non-compatible changes, so the version number will need to be bumped up. My proposal: Fix the issues in the new format up the version number to 7. compute and store record length as (compressed key length = 4 + compressed key contents length) + compressed value length store compressed key length as the next 4-byte field key contains 4-byte uncompressed key contents length + compressed key contents Provide backward compatibility while reading version 6, interpret fields as now but recalculate the recordlength from the next two fields (as record length = record length - uncompressed key length + compressed key length)
        Hide
        Krishna Kumar added a comment -

        Notation : Length bracket inside the dashed box means it is the uncompressed length. Length bracket outside the dashed box means it is the compressed length.

        Show
        Krishna Kumar added a comment - Notation : Length bracket inside the dashed box means it is the uncompressed length. Length bracket outside the dashed box means it is the compressed length.
        Hide
        He Yongqiang added a comment -

        Yes. This should be fixed. It should be ok to increase the version number. Also need to fix seekToNextKeyBuffer() to minus 4 if you want to add 4 to recordlength.

        For issue 3, do you think it can still be compatible with today's sequencefile. (I haven't looked at the recent of impl of seqfile.) If not, maybe we can just leave it as today is, so we do not have any backward compatibility issue.

        Show
        He Yongqiang added a comment - Yes. This should be fixed. It should be ok to increase the version number. Also need to fix seekToNextKeyBuffer() to minus 4 if you want to add 4 to recordlength. For issue 3, do you think it can still be compatible with today's sequencefile. (I haven't looked at the recent of impl of seqfile.) If not, maybe we can just leave it as today is, so we do not have any backward compatibility issue.
        Hide
        Krishna Kumar added a comment -

        Hmm. #3 is taking me a bit too far than I originally thought. I assume being able to read an RCFile as SequenceFile is required, while being able to write an RCFile via the SequenceFile interface is desirable.

        Having made changes so that record length is correctly set, in order to be able to make sure that the rcfile is handled correctly as a sequence file, the following changes are also required, IIUC.

        • the second field should be the key length (4 + compressed/plain key contents)
        • the key class (KeyBuffer) must be made responsible for reading/writing the next field - plain key contents length - as well as compression/decompression of the key contents
        • the value class (ValueBuffer) related changes will be trickier. Since the value is not compressed as a unit, we can not use record-compressed format. We need to mark the records as plain records, and move the codec to a metadata entry. Then the valueBuffer class will work correctly with sequencefile implementation.

        Thoughts? worth it?

        Show
        Krishna Kumar added a comment - Hmm. #3 is taking me a bit too far than I originally thought. I assume being able to read an RCFile as SequenceFile is required, while being able to write an RCFile via the SequenceFile interface is desirable. Having made changes so that record length is correctly set, in order to be able to make sure that the rcfile is handled correctly as a sequence file, the following changes are also required, IIUC. the second field should be the key length (4 + compressed/plain key contents) the key class (KeyBuffer) must be made responsible for reading/writing the next field - plain key contents length - as well as compression/decompression of the key contents the value class (ValueBuffer) related changes will be trickier. Since the value is not compressed as a unit, we can not use record-compressed format. We need to mark the records as plain records, and move the codec to a metadata entry. Then the valueBuffer class will work correctly with sequencefile implementation. Thoughts? worth it?
        Hide
        He Yongqiang added a comment -

        if being compatible with sequencefile does not break the rcfile's backward compatibility, it should be ok. But even after that, hive still won't be able to process it as a sequence file because of hive's serde layer.

        Show
        He Yongqiang added a comment - if being compatible with sequencefile does not break the rcfile's backward compatibility, it should be ok. But even after that, hive still won't be able to process it as a sequence file because of hive's serde layer.
        Hide
        Krishna Kumar added a comment -

        The layout as implemented in the attached patch

        Show
        Krishna Kumar added a comment - The layout as implemented in the attached patch
        Hide
        Krishna Kumar added a comment -

        Patch. Since it contains binary files, it was generated with "--binary", and needs to be applied as "git apply HIVE.2065.patch.0.txt"

        Show
        Krishna Kumar added a comment - Patch. Since it contains binary files, it was generated with "--binary", and needs to be applied as "git apply HIVE.2065.patch.0.txt"
        Hide
        Krishna Kumar added a comment -

        Review Board Entry at https://reviews.apache.org/r/529/

        Show
        Krishna Kumar added a comment - Review Board Entry at https://reviews.apache.org/r/529/
        Hide
        He Yongqiang added a comment -

        will take a look. but Krishna, I still do not get the benefit of letting RCFile being read by SequenceFile. can you explain more about the benefit?

        Show
        He Yongqiang added a comment - will take a look. but Krishna, I still do not get the benefit of letting RCFile being read by SequenceFile. can you explain more about the benefit?
        Hide
        Krishna Kumar added a comment -

        The RCFile layout seems to have been designed initially to be compatible to SequenceFile but over a period of time (esp. due to key compression enhancement?), it seems to have drifted away. The compatibility intent goes as far as to have boolean values always false etc (blockCompression), but couple of bugs have been introduced later whereby the recordlength is no longer the ondisk record length, and the keylength field is no longer the ondisk key length. Once I started writing a unit test for ensuring that the rcfile layout does stay in sync with sequence file layout, I also found that the classes designated as the keyclass/valueclass are no longer able to read themselves in or write themselves out, even if properly 'primed'. That is the primary aim of the changes due to #3.

        [PS. The reason I am looking into this now is experiment with column-specific compression ('use this codec for this sorted, numeric column') or type-specific compression ('use this codec for all enumerations types of this table'). Presumably, if successful, this information will be put into metadata as I am doing with the generic codec in the changes above.]

        Show
        Krishna Kumar added a comment - The RCFile layout seems to have been designed initially to be compatible to SequenceFile but over a period of time (esp. due to key compression enhancement?), it seems to have drifted away. The compatibility intent goes as far as to have boolean values always false etc (blockCompression), but couple of bugs have been introduced later whereby the recordlength is no longer the ondisk record length, and the keylength field is no longer the ondisk key length. Once I started writing a unit test for ensuring that the rcfile layout does stay in sync with sequence file layout, I also found that the classes designated as the keyclass/valueclass are no longer able to read themselves in or write themselves out, even if properly 'primed'. That is the primary aim of the changes due to #3. [PS. The reason I am looking into this now is experiment with column-specific compression ('use this codec for this sorted, numeric column') or type-specific compression ('use this codec for all enumerations types of this table'). Presumably, if successful, this information will be put into metadata as I am doing with the generic codec in the changes above.]
        Hide
        He Yongqiang added a comment -

        The column-specific compression is very interesting, but it is not directly related to make RCFile compatible with Seqfile. We can still do that without this compatibility.

        Some inputs maybe useful to you:
        we examined column groups, and sort the data internally based on one column in one column group. (But we did not try different compressions across column groups.) Tried this with 3-4 tables, and we see ~20% storage savings on one table compared the previous RCFile. The main problems for this approach is that it is hard to find out the correct/most efficient column group definitions.
        One example, table tbl_1 has 20 columns, and user can define:

        col_1,col_2,col_11,col_13:0;col_3,col_4,col_15,col_16:1;

        This will put col_1, col_2,col_11, col_13 into one column group, and reorder that column group based on sorting col_1 (0 is the first column in this column group), and put col_3, col_4, col_15,col_16 into another column group, and reorder this column group based on sorting col_4, and finally put all other columns into the default column group with original order.
        And should be easy to allow different compression codec for different column groups.

        The main block issue for this approach is have a full set of utils to find out the best column group definition.

        Instead of doing that in the existing RCFile, do you think it would be better if we can explore it in the new one that i just mentioned. If you think interesting, we can share you the existing code that we have for things i mentioned. And you can work on the compression codec based on the new one, and provide a util tool to find out the best column group definition.

        what do you think?

        Show
        He Yongqiang added a comment - The column-specific compression is very interesting, but it is not directly related to make RCFile compatible with Seqfile. We can still do that without this compatibility. Some inputs maybe useful to you: we examined column groups, and sort the data internally based on one column in one column group. (But we did not try different compressions across column groups.) Tried this with 3-4 tables, and we see ~20% storage savings on one table compared the previous RCFile. The main problems for this approach is that it is hard to find out the correct/most efficient column group definitions. One example, table tbl_1 has 20 columns, and user can define: col_1,col_2,col_11,col_13:0;col_3,col_4,col_15,col_16:1; This will put col_1, col_2,col_11, col_13 into one column group, and reorder that column group based on sorting col_1 (0 is the first column in this column group), and put col_3, col_4, col_15,col_16 into another column group, and reorder this column group based on sorting col_4, and finally put all other columns into the default column group with original order. And should be easy to allow different compression codec for different column groups. The main block issue for this approach is have a full set of utils to find out the best column group definition. Instead of doing that in the existing RCFile, do you think it would be better if we can explore it in the new one that i just mentioned. If you think interesting, we can share you the existing code that we have for things i mentioned. And you can work on the compression codec based on the new one, and provide a util tool to find out the best column group definition. what do you think?
        Hide
        Krishna Kumar added a comment -

        As I indicated, the reason I made changes for sequence file compatibility is that the original design was more or less compatible, but the compatibility is now currently broken. If we decide that the compatibility is not a requirement, I am fine with that - a few documentation changes will be all that is necessary to indicate that situation. The current layout itself, with the SEQ prefix, incorrect key length and keyclass/valueclass header etc, will not make much sense, but we can designate all that 'legacy'

        I'd like to move the discussions re potential approaches for better compression to another thread - any existing bug#? or should I open a new one?.

        Show
        Krishna Kumar added a comment - As I indicated, the reason I made changes for sequence file compatibility is that the original design was more or less compatible, but the compatibility is now currently broken. If we decide that the compatibility is not a requirement, I am fine with that - a few documentation changes will be all that is necessary to indicate that situation. The current layout itself, with the SEQ prefix, incorrect key length and keyclass/valueclass header etc, will not make much sense, but we can designate all that 'legacy' I'd like to move the discussions re potential approaches for better compression to another thread - any existing bug#? or should I open a new one?.
        Hide
        He Yongqiang added a comment -

        Let's leave the compatibility issue, and fix the incorrect length issue in this jira.

        And feel free to open a new jira for the discussion of better compression.

        Show
        He Yongqiang added a comment - Let's leave the compatibility issue, and fix the incorrect length issue in this jira. And feel free to open a new jira for the discussion of better compression.
        Hide
        Krishna Kumar added a comment -

        Updated patch where sequence file compliance is not addressed but the other two issues are.

        Show
        Krishna Kumar added a comment - Updated patch where sequence file compliance is not addressed but the other two issues are.
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/529/
        -----------------------------------------------------------

        (Updated 2011-04-06 17:13:30.910168)

        Review request for hive and Yongqiang He.

        Changes
        -------

        Updated patch where sequence file compliance is not addressed but the other two issues are.

        Summary
        -------

        Patch for HIVE-2065

        This addresses bug HIVE-2065.
        https://issues.apache.org/jira/browse/HIVE-2065

        Diffs (updated)


        build-common.xml 9f21a69
        data/files/test_v6dot0_compressed.rc PRE-CREATION
        data/files/test_v6dot0_uncompressed.rc PRE-CREATION
        ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java eb5305b
        ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 20d1f4e
        ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java f7eacdc
        ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java bb1e3c9
        ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 8bb6f3a
        ql/src/test/results/clientpositive/alter_merge.q.out 25f36c0
        ql/src/test/results/clientpositive/alter_merge_stats.q.out 243f7cc
        ql/src/test/results/clientpositive/partition_wise_fileformat.q.out cee2e72
        ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out 067ab43
        ql/src/test/results/clientpositive/sample10.q.out 50406c3

        Diff: https://reviews.apache.org/r/529/diff

        Testing
        -------

        Tests added, existing tests updated

        Thanks,

        Krishna

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/529/ ----------------------------------------------------------- (Updated 2011-04-06 17:13:30.910168) Review request for hive and Yongqiang He. Changes ------- Updated patch where sequence file compliance is not addressed but the other two issues are. Summary ------- Patch for HIVE-2065 This addresses bug HIVE-2065 . https://issues.apache.org/jira/browse/HIVE-2065 Diffs (updated) build-common.xml 9f21a69 data/files/test_v6dot0_compressed.rc PRE-CREATION data/files/test_v6dot0_uncompressed.rc PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java eb5305b ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 20d1f4e ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java f7eacdc ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java bb1e3c9 ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 8bb6f3a ql/src/test/results/clientpositive/alter_merge.q.out 25f36c0 ql/src/test/results/clientpositive/alter_merge_stats.q.out 243f7cc ql/src/test/results/clientpositive/partition_wise_fileformat.q.out cee2e72 ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out 067ab43 ql/src/test/results/clientpositive/sample10.q.out 50406c3 Diff: https://reviews.apache.org/r/529/diff Testing ------- Tests added, existing tests updated Thanks, Krishna
        Hide
        Krishna Kumar added a comment -

        Updated patch where sequence file compliance is not addressed but the other two issues are. Review board also updated : https://reviews.apache.org/r/529/

        Show
        Krishna Kumar added a comment - Updated patch where sequence file compliance is not addressed but the other two issues are. Review board also updated : https://reviews.apache.org/r/529/
        Hide
        He Yongqiang added a comment -

        why is there a minor.version needed in metadata?
        Mostly looks good, but i prefer not to change the code too much as some other external things are depending on it.

        Show
        He Yongqiang added a comment - why is there a minor.version needed in metadata? Mostly looks good, but i prefer not to change the code too much as some other external things are depending on it.
        Hide
        He Yongqiang added a comment -

        sorry, maybe not clear, i am not discouraging changing the code, i am just saying the risk worths the benefits.

        Show
        He Yongqiang added a comment - sorry, maybe not clear, i am not discouraging changing the code, i am just saying the risk worths the benefits.
        Hide
        Krishna Kumar added a comment -

        The minor version is needed so that we can still read 6.0 files correctly. To recap, 6.0 files have incorrect record length and while reading, we make the necessary recalculations to fix it up, while 6.1 onwards have the correct record length stored on disk.

        [PS. I had suggested bumping up the sequence file version to 7 in a comment above, but I think a minor version is a better idea. The layout itself is still 'kinda sorta' version-6-compatible. For all we know, there may be a sequence file version 7, and then sequence file version 7 and rc file version 7 would be divergent.]

        [PPS. For the sake of completeness of documentation, here are the reason why the layout, even after the current patch, is still short of complete version-6 compatibility : [a] The KeyBuffer, denoted as the key class, is unable to read or write itself from/to the disk stream as the reading/writing the 4-byte key contents length field and the compression/decompression are being done by the reader/writer and not the KeyBuffer class and [b] The ValueBuffer, the value class, must be compressed as a unit to be compatible to sequence file reader/writer, but it is actually compressed as several units.]

        Show
        Krishna Kumar added a comment - The minor version is needed so that we can still read 6.0 files correctly. To recap, 6.0 files have incorrect record length and while reading, we make the necessary recalculations to fix it up, while 6.1 onwards have the correct record length stored on disk. [PS. I had suggested bumping up the sequence file version to 7 in a comment above, but I think a minor version is a better idea. The layout itself is still 'kinda sorta' version-6-compatible. For all we know, there may be a sequence file version 7, and then sequence file version 7 and rc file version 7 would be divergent.] [PPS. For the sake of completeness of documentation, here are the reason why the layout, even after the current patch, is still short of complete version-6 compatibility : [a] The KeyBuffer, denoted as the key class, is unable to read or write itself from/to the disk stream as the reading/writing the 4-byte key contents length field and the compression/decompression are being done by the reader/writer and not the KeyBuffer class and [b] The ValueBuffer, the value class, must be compressed as a unit to be compatible to sequence file reader/writer, but it is actually compressed as several units.]
        Hide
        John Sichi added a comment -

        This one has been sitting in Patch Available queue for a while...are there issues that still need to be resolved?

        Show
        John Sichi added a comment - This one has been sitting in Patch Available queue for a while...are there issues that still need to be resolved?
        Hide
        He Yongqiang added a comment -

        okay, i did a 'cancel patch'. The current patch is still not very clean,

        Show
        He Yongqiang added a comment - okay, i did a 'cancel patch'. The current patch is still not very clean,

          People

          • Assignee:
            Krishna Kumar
            Reporter:
            Krishna Kumar
          • Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:

              Development