[HIVE-2065] RCFile issues - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Serializers/Deserializers
Labels: None
Tags: rcfile

Description

Some potential issues with RCFile

1. Remove the unneeded synchronized modifiers on the methods of RCFile. According to Yongqiang He, the class is not meant to be thread-safe (and it is not), so we might as well get rid of the confusing, performance-impacting lock acquisitions.

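2. The record length is overstated for compressed files. IIUC, key compression happens after the record length has already been written, so the recorded total is computed from the uncompressed key length rather than the number of key bytes actually written.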

      int keyLength = key.getSize();
      if (keyLength < 0) {
        throw new IOException("negative length keys not allowed: " + key);
      }

      out.writeInt(keyLength + valueLength); // total record length
      out.writeInt(keyLength); // key portion length
      if (!isCompressed()) {
        out.writeInt(keyLength);
        key.write(out); // key
      } else {
        keyCompressionBuffer.reset();
        keyDeflateFilter.resetState();
        key.write(keyDeflateOut);
        keyDeflateOut.flush();
        keyDeflateFilter.finish();
        int compressedKeyLen = keyCompressionBuffer.getLength();
        out.writeInt(compressedKeyLen);
        out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
      }

3. For SequenceFile compatibility, the field immediately following the record length should be the compressed key length, not the uncompressed key length.
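
To make items 2 and 3 concrete, here is a minimal sketch of the ordering they suggest: compress the key first, then write a record length based on the compressed size, followed directly by the compressed key length. This is not the RCFile writer and not the attached patch. The class name RecordHeaderSketch and both helper methods are hypothetical, it uses plain java.util.zip instead of a Hadoop codec, and it leaves open where the uncompressed key length would be stored.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;

// Hypothetical illustration only; names and layout are assumptions.
final class RecordHeaderSketch {

    /** Deflates the serialized key bytes entirely in memory. */
    static byte[] compressKey(byte[] keyBytes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // close() finishes the deflater, so the buffer holds the full stream
        try (DeflaterOutputStream deflateOut = new DeflaterOutputStream(buffer)) {
            deflateOut.write(keyBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /**
     * Encodes: record length, compressed key length, compressed key bytes.
     * The key is compressed BEFORE any length is written, so the record
     * length reflects the bytes that actually follow.
     */
    static byte[] encodeRecordHeader(byte[] keyBytes, int valueLength) {
        byte[] compressedKey = compressKey(keyBytes);
        int compressedKeyLen = compressedKey.length;

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(compressedKeyLen + valueLength); // total record length
            out.writeInt(compressedKeyLen);               // key portion length
            out.write(compressedKey, 0, compressedKeyLen); // compressed key
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Calling RecordHeaderSketch.encodeRecordHeader(keyBytes, valueLength) returns the encoded bytes; the first int read back equals the second int plus valueLength, which is the invariant item 2 says the current writer breaks for compressed files.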

Attachments

Slide1.png, 19/Mar/11 02:05, 121 kB, Krishna Kumar
proposal.png, 28/Mar/11 16:19, 152 kB, Krishna Kumar
HIVE.2065.patch.0.txt, 28/Mar/11 16:21, 113 kB, Krishna Kumar
HIVE.2065.patch.1.txt, 06/Apr/11 17:14, 90 kB, Krishna Kumar

People

Assignee: Krishna Kumar

Reporter: Krishna Kumar

Votes: 1

Watchers: 4

Dates

Created: 19/Mar/11 02:04

Updated: 01/Dec/11 07:12