Hadoop Common

HADOOP-8258: Add interfaces for compression codecs to use direct byte buffers

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: io, native, performance
    • Labels: None

      Description

      Currently, the codec interface only provides input/output functions based on byte arrays. Given that most of the codecs are implemented in native code, this necessitates two extra copies - one to copy the input data to a direct buffer, and one to copy the output data back to a byte array. We should add interfaces to Decompressor/Compressor that can work directly with direct byte buffers to avoid these copies.
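
      A minimal sketch of what such an interface could look like follows. The name DirectDecompressor and the decompress(ByteBuffer, ByteBuffer) signature are illustrative assumptions for the sake of discussion, not a settled API; a DirectCompressor counterpart would mirror it.

      import java.io.IOException;
      import java.nio.ByteBuffer;

      /**
       * Hypothetical ByteBuffer-based companion to
       * org.apache.hadoop.io.compress.Decompressor. A native codec could read
       * the input and write the output through JNI without copying to or from
       * intermediate byte arrays.
       */
      public interface DirectDecompressor {
        /**
         * Decompresses the remaining bytes of src into dst. Both buffers are
         * expected to be direct; their positions are advanced by the amounts
         * consumed and produced.
         */
        void decompress(ByteBuffer src, ByteBuffer dst) throws IOException;
      }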

        Issue Links

          relates to HADOOP-8148

          Activity

          Todd Lipcon created issue -
          Todd Lipcon added a comment -

          In current versions of Hadoop, the read path for applications like HBase often looks like the following (a simplified code sketch appears after the list):

          • allocate a byte array for an HFile block (~64kb)
          • call read() into that byte array:
            • copy 1: read() packets from the socket into a direct buffer provided by the DirectBufferPool
            • copy 2: copy from the direct buffer pool into the provided byte[]
          • call setInput on a decompressor
            • copy 3: copy from the byte[] back to a direct buffer inside the codec implementation
          • call decompress:
            • JNI code accesses the input buffer and writes to the output buffer
            • copy 4: from the output buffer back into the byte[] for the uncompressed hfile block
            • inefficiency: HBase now does its own checksumming. Since it has to checksum the byte[], it can't easily use the SSE-enabled checksum path.
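
          Roughly, the byte[]-only Decompressor API forces the pattern sketched below. This is a simplified illustration, not HBase's actual code: the HFile block handling is elided and a single decompress() call stands in for the usual needsInput()/decompress() loop.

          import java.io.IOException;

          import org.apache.hadoop.fs.FSDataInputStream;
          import org.apache.hadoop.io.IOUtils;
          import org.apache.hadoop.io.compress.Decompressor;

          class ByteArrayReadPath {
            /** Simplified sketch of the copy-heavy path described in the list above. */
            static void readBlock(FSDataInputStream in, Decompressor decompressor,
                                  byte[] compressed, byte[] uncompressed) throws IOException {
              // copies 1 + 2: socket -> DirectBufferPool buffer -> this byte[]
              IOUtils.readFully(in, compressed, 0, compressed.length);

              // copy 3: the native codec copies the byte[] into its own direct
              // buffer before the JNI code can see it.
              decompressor.setInput(compressed, 0, compressed.length);

              // copy 4: the JNI output buffer is copied back out into a byte[].
              decompressor.decompress(uncompressed, 0, uncompressed.length);
            }
          }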

          Given the new direct-buffer read support introduced by HDFS-2834, we can remove copies #2 and #3 (a sketch of the resulting path follows the list):

          • allocate a DirectBuffer for the compressed hfile block, and one for the uncompressed block (we know the size from the hfile block header)
          • call read() into the direct buffer using the HDFS-2834 API
            • copy 1: read() packets from the socket into that buffer
          • call setInput() with that buffer; no copies necessary
          • call decompress:
            • JNI code accesses the input buffer and writes directly to the output buffer, with no copies
          • HBase now has the uncompressed block as a direct buffer. It can use the SSE-enabled checksum for better efficiency
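
          A sketch of this single-copy path, assuming the HDFS-2834 read(ByteBuffer) call on FSDataInputStream plus the hypothetical DirectDecompressor interface sketched in the description above (neither the name nor the signature is settled):

          import java.io.IOException;
          import java.nio.ByteBuffer;

          import org.apache.hadoop.fs.FSDataInputStream;

          class DirectBufferReadPath {
            /** Sketch of the proposed path: one copy off the wire, none afterwards. */
            static void readBlock(FSDataInputStream in, DirectDecompressor decompressor,
                                  ByteBuffer compressed, ByteBuffer uncompressed) throws IOException {
              // copy 1: HDFS-2834 fills the direct buffer straight from the read
              // path; loop until the known compressed block length has been read.
              while (compressed.hasRemaining() && in.read(compressed) > 0) {
              }
              compressed.flip();

              // JNI reads 'compressed' and writes 'uncompressed' in place; no copies.
              decompressor.decompress(compressed, uncompressed);
              uncompressed.flip();

              // HBase can now checksum 'uncompressed' with the SSE-enabled native path.
            }
          }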

          This should improve the performance of HBase significantly. We may also be able to use the new API from within SequenceFile and other compressible file formats to avoid two copies from the read path.

          The same applies to the write path, but in my experience the write path is less often CPU-constrained, so I'd prefer to concentrate on the read path first.

          Todd Lipcon made changes -
          Component/s: performance
          Todd Lipcon added a comment -

          Another application here is doing our own compression of RPCs and DFS read/write path data. HBASE-5355 is working on RPC compression, but right now I think the copies make it a bit too expensive for general purpose usage.

          Tim Broberg added a comment -

          Please see HADOOP-8148 for an attempt at this for compressors and decompressors. Compression streams could still use a similar treatment.

          Todd Lipcon made changes -
          Link: This issue relates to HADOOP-8148
          Todd Lipcon added a comment -

          Ah, thanks, sorry I missed that. Do you think this JIRA should just be marked as duplicate? I can reproduce the comments into the other one.

          Tim Broberg added a comment -

          It's certainly trying to address the same thing, but then I haven't addressed the (arguably more important) stream layer at all yet.

          I'm not proud, feel free to kill whichever one you think is the weakling, but do please review the proposed interface. I'm actually writing some code to it now, so any weaknesses you can find in the interface sooner rather than later would be appreciated.

          Todd Lipcon added a comment -

          Resolving as dup - please see HADOOP-8148.

          Todd Lipcon made changes -
          Status: Open → Resolved
          Resolution: Duplicate

            People

            • Assignee: Unassigned
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 10
