Hadoop Common
HADOOP-8148

Zero-copy ByteBuffer-based compressor / decompressor API

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: io, performance
    • Labels: None

      Description

      Per Todd Lipcon's comment in HDFS-2834, "
      Whenever a native decompression codec is being used, ... we generally have the following copies:

      1) Socket -> DirectByteBuffer (in SocketChannel implementation)
      2) DirectByteBuffer -> byte[] (in SocketInputStream)
      3) byte[] -> Native buffer (set up for decompression)
      4*) decompression to a different native buffer (not really a copy - decompression necessarily rewrites)
      5) native buffer -> byte[]

      with the proposed improvement we can hopefully eliminate #2 and #3 for all applications, and #2, #3, and #5 for libhdfs.
      "

      The interfaces in the attached patch attempt to address:
      A - Compression and decompression based on ByteBuffers (HDFS-2834)
      B - Zero-copy compression and decompression (HDFS-3051)
      C - Provide the caller a way to know the maximum space required to hold the compressed output (see the sketch below).
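
      As a hedged illustration only (hypothetical names, not the interfaces in the attached patch), a ByteBuffer-based compressor covering A, B, and C might look like:

          import java.io.IOException;
          import java.nio.ByteBuffer;

          // Hypothetical sketch; the attached patch defines the real interfaces.
          public interface ByteBufferCompressor {
              // Compress from src into dst, advancing both buffers' positions;
              // direct buffers let native code work without byte[] copies (A, B).
              void compress(ByteBuffer src, ByteBuffer dst) throws IOException;

              // Worst-case compressed size for a given input size, so the
              // caller can size dst up front (C).
              int maxCompressedLength(int uncompressedLength);
          }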

      Attachments

      1. zerocopyifc.tgz
        2 kB
        Tim Broberg
      2. hadoop-8148.patch
        6 kB
        Owen O'Malley
      3. hadoop8148.patch
        8 kB
        Tim Broberg


          Activity

          Tim Broberg added a comment -

          Proposed ZeroCopyCompressor and ZeroCopyDecompressor interfaces for review.

          Tim Broberg added a comment -

          Proposed interfaces ZeroCopyCompressor, ZeroCopyDecompressor.

          Tim Broberg added a comment -

          Here are my thoughts from the distance of a month:
          1 - Question: Do we define this as a new interface, or revise the existing one?

          IMO, there is probably too much code to switch over all at once.

          2 - We also need to do something with the Compression(Input/Output)Stream.

          2a - This would seem to be just an addition of the ByteBufferReadable interface to the existing stream. Call it ZeroCopyCompressionInputStream? Too long?

          2b - For this, I'm looking for the most common consumer of this code as well as the input stream interface. I'm thinking it's LineReader. The input stream passed to a compression stream would be read buffer to buffer without modification to LineReader. LineReader could take advantage of ZeroCopyCompressionInputStream by passing a direct buffer, or just use it as is and require a copy into his byte array.

          2c - For symmetry, any reason not to define a ByteBufferWriteable interface and make the corresponding output stream class?

          3 - The List<ByteBuffer> interface of HDFS-3051 doesn't seem to be taking off, and it complicates the native code.

          Kill it?

          Todd is suggesting adapting a codec (Snappy seems a likely candidate) to the new interface.

          I'll wait a week or so for any dust to settle and then generate a patch to trunk's Snappy codec for consideration.
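
          For 2a, the shape under discussion might look like this hedged sketch (ByteBufferReadable is the real HDFS-2834 interface; the class name and everything else here are assumptions):

              import java.io.IOException;
              import java.io.InputStream;
              import java.nio.ByteBuffer;
              import org.apache.hadoop.fs.ByteBufferReadable;
              import org.apache.hadoop.io.compress.CompressionInputStream;

              // Sketch for 2a: the existing compression stream plus ByteBufferReadable.
              abstract class ZeroCopyCompressionInputStream
                      extends CompressionInputStream implements ByteBufferReadable {

                  protected ZeroCopyCompressionInputStream(InputStream in)
                          throws IOException {
                      super(in);
                  }

                  // LineReader could pass a direct buffer here to skip the byte[]
                  // copy, or keep using read(byte[]) as before.
                  @Override
                  public abstract int read(ByteBuffer buf) throws IOException;
              }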

          Tim Broberg added a comment -

          I've been thinking about this, and here's some dust: this interface seems to work well enough for software codecs, but for multicore hardware codecs, it is necessary to process multiple records in parallel. To do this, the decompression needs to start reading and decompressing data before the caller shows up with his direct buffer.

          This suggests that the stream should return a prefilled ByteBuffer instead of filling one provided by the caller.

          ...but then the caller needs to have an efficient way to recycle it as direct buffers are (supposedly) costly to build. So now we need to add a call to release the buffer.

          ByteBuffer read();
          void releaseBuffer(ByteBuffer buffer);

          This has one added benefit: the compression stream now has control over the sizes of all buffers, so there is no problem keeping the source and destination sizes appropriate for each other.

          On the output side we have

          void write(ByteBuffer buffer);
          ByteBuffer getBuffer();

          This saves a copy for HW / multithreaded codecs, which are especially sensitive to copies.

          So, is this interface too much complexity / difference from the ByteBufferReadable interface to make upcoming fast compressors faster?
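
          Written out as a hedged sketch (hypothetical names), the lending model is:

              import java.io.IOException;
              import java.nio.ByteBuffer;

              // Sketch of the buffer-lending model: the stream owns and sizes the
              // buffers; the caller hands each one back for recycling.
              interface BufferLendingInputStream {
                  ByteBuffer read() throws IOException;  // prefilled by the stream
                  void releaseBuffer(ByteBuffer buffer); // return for reuse
              }

              interface BufferLendingOutputStream {
                  ByteBuffer getBuffer();                // allocated/sized by the stream
                  void write(ByteBuffer buffer) throws IOException;
              }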

          Todd Lipcon added a comment -

          I've been thinking about this, and here's some dust: this interface seems to work well enough for software codecs, but for multicore hardware codecs, it is necessary to process multiple records in parallel. To do this, the decompression needs to start reading and decompressing data before the caller shows up with his direct buffer.

          Sorry for my ignorance in this area, but: this implies that hardware codecs are pipelined? In the LineReader use case, you're saying you would provide a much larger block to the codec ahead of what is being read out of the decompression side?

          Todd Lipcon added a comment -

          Duplicating my comment from HADOOP-8258:

          In current versions of Hadoop, the read path for applications like HBase often looks like:

          allocate a byte array for an HFile block (~64kb)
          call read() into that byte array:
          copy 1: read() packets from the socket into a direct buffer provided by the DirectBufferPool
          copy 2: copy from the direct buffer pool into the provided byte[]
          call setInput on a decompressor
          copy 3: copy from the byte[] back to a direct buffer inside the codec implementation
          call decompress:
          JNI code accesses the input buffer and writes to the output buffer
          copy 4: from the output buffer back into the byte[] for the uncompressed hfile block
          inefficiency: HBase now does its own checksumming. Since it has to checksum the byte[], it can't easily use the SSE-enabled checksum path.
          Given the new direct-buffer read support introduced by HDFS-2834, we can remove copy #2 and #3

          allocate a DirectBuffer for the compressed hfile block, and one for the uncompressed block (we know the size from the hfile block header)
          call read() into the direct buffer using the HDFS-2834 API
          copy 1: read() packets from the socket into that buffer
          call setInput() with that buffer. no copies necessary
          call decompress:
          JNI code accesses the input buffer and writes directly to the output buffer, with no copies
          HBase now has the uncompressed block as a direct buffer. It can use the SSE-enabled checksum for better efficiency
          This should improve the performance of HBase significantly. We may also be able to use the new API from within SequenceFile and other compressible file formats to avoid two copies from the read path.

          Similar applies to the write path, but in my experience the write path is less often CPU-constrained, so I'd prefer to concentrate on the read path first.
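
          As a hedged sketch of the improved path (the decompressor interface here is a hypothetical stand-in, not a committed Hadoop API):

              import java.io.IOException;
              import java.nio.ByteBuffer;
              import org.apache.hadoop.fs.ByteBufferReadable;

              class DirectReadPathSketch {
                  // Hypothetical buffer-to-buffer decompressor.
                  interface ByteBufferDecompressor {
                      void decompress(ByteBuffer src, ByteBuffer dst) throws IOException;
                  }

                  void readBlock(ByteBufferReadable in, ByteBufferDecompressor codec,
                                 int compressedLen, int uncompressedLen)
                          throws IOException {
                      // Direct buffers so JNI code can address them without copies.
                      ByteBuffer compressed = ByteBuffer.allocateDirect(compressedLen);
                      ByteBuffer uncompressed = ByteBuffer.allocateDirect(uncompressedLen);

                      // copy 1 (the only one): socket -> compressed buffer,
                      // via the HDFS-2834 direct-buffer read API.
                      while (compressed.hasRemaining() && in.read(compressed) >= 0) {
                      }
                      compressed.flip();

                      // Buffer-to-buffer decompression in native code; no byte[] hops.
                      codec.decompress(compressed, uncompressed);
                      uncompressed.flip();
                      // The uncompressed block is a direct buffer, so the
                      // SSE-enabled checksum path can run over it.
                  }
              }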

          Tim Broberg added a comment -

          Sorry for my ignorance in this area, but: this implies that hardware codecs are pipelined? In the LineReader use case, you're saying you would provide a much larger block to the codec ahead of what is being read out of the decompression side?

          Yes, command/result handling, DMA transfer, and processing happen in parallel, which is crucial for small-packet performance. Given what I expect will be 32kB - 128kB blocks here, this isn't as huge an issue, but it's still non-trivial. The important pipelining will be performing stream I/O in parallel with (de)compression.

          In addition to pipelining, there is parallelism. Non-low-end processors are multicore such that we will want to read several blocks ahead and process them in parallel. (One could also perform multithreaded software (de)compression in much the same fashion, but I have no plans to implement it.)

          So, yes, the HW version of the CompressionInputStream would have a separate thread which preemptively reads records in and drops them in an input queue, plugging when that queue is full. The HW processes blocks from the input queue, dumping results into an output queue which feeds the read() call.

          So, the HW is dumping stuff into buffers before the read() gets a chance to provide one.
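
          A minimal sketch of that read-ahead pipeline (all names hypothetical):

              import java.nio.ByteBuffer;
              import java.util.concurrent.ArrayBlockingQueue;
              import java.util.concurrent.BlockingQueue;

              // Sketch: a feeder thread keeps the HW busy; read() just takes
              // finished buffers off a bounded queue.
              abstract class ReadAheadDecompressorStream {
                  private final BlockingQueue<ByteBuffer> ready =
                      new ArrayBlockingQueue<ByteBuffer>(8); // "plugs" when full

                  protected void startFeeder() {
                      Thread feeder = new Thread(new Runnable() {
                          public void run() {
                              while (!Thread.currentThread().isInterrupted()) {
                                  ByteBuffer block = readNextBlockFromStream();
                                  ByteBuffer out = decompressOnDevice(block);
                                  try {
                                      ready.put(out); // back-pressure when full
                                  } catch (InterruptedException e) {
                                      return;
                                  }
                              }
                          }
                      });
                      feeder.setDaemon(true);
                      feeder.start();
                  }

                  // The caller gets a prefilled buffer; decompression already
                  // happened (or was overlapped) before this call.
                  public ByteBuffer read() throws InterruptedException {
                      return ready.take();
                  }

                  // Device- and stream-specific; left abstract in this sketch.
                  protected abstract ByteBuffer readNextBlockFromStream();
                  protected abstract ByteBuffer decompressOnDevice(ByteBuffer compressed);
              }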

          Tim Broberg added a comment -

          More dust:
          1 - block-based non-scatter-gather libraries (basically everything software except gzip) won't readily support the scatter-gather List<ByteBuffer> interface. I think we should dump it and just pass ByteBuffers.
          2 - Direct buffers have a reputation for being costly to create. As I understand it, the reason the codec pool class exists is to allow compressors with direct buffers to be reused without having to create a new direct buffer each time a record is read. The interface proposed does not address ownership or recycling of the buffers. We could add calls to each interface that passes these buffers to manage the buffers, or the buffers themselves could have a call to return them to a pool from which they can be reused. Managing the number of elements in the pool and the size of the buffers is a nontrivial task.
          3 - If we do address buffer recycling, the codec pool approach would appear to be obsolete. Note that, outside of compression streams, codec pool is the only customer that cares about the compression interface any longer - an extreme statement, but witness that bzip doesn't implement a compressor interface at all except for dummy stubs to show to codec pool.
          4 - The interfaces of the existing compressor / decompressor classes pack a lot of baggage from the gzip interface, which decouples the input from the output for a streaming compressor class. setInput, needsInput, finished, finish, reset, and reinit all manage state between the input and output, where a simple compress(ByteBuffer src, ByteBuffer dst) could replace the existing call and all the rest. (Full disclosure: I want all those other calls dead personally, because all that state makes asynchronous compression a nightmare.)

          So, I'm highly tempted to sweep away the compressor interface and replace it with a much simpler one (sketched below) -

          • compress(src, dst) to process data
          • finish() to allow cleaning up open streams
          • getBytesRead(), getBytesWritten() for statistics

          Replace the codec pool with a pool of buffers extending ByteBuffer which have a callback method to recycle them.

          Too radical? What would be a better way to solve the problems? Any problems this doesn't solve?
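
          In hedged Java form (method names from the list above, everything else assumed):

              import java.io.IOException;
              import java.nio.ByteBuffer;

              // Sketch of the much simpler interface proposed above.
              public interface SimpleCompressor {
                  // One stateless call replaces setInput / needsInput / finished /
                  // reset / reinit: consume from src, produce into dst.
                  void compress(ByteBuffer src, ByteBuffer dst) throws IOException;

                  // Clean up any open (native or hardware) stream state.
                  void finish() throws IOException;

                  // Statistics, as in the existing Compressor interface.
                  long getBytesRead();
                  long getBytesWritten();
              }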

          Tim Broberg added a comment -

          Considering my previous comment of 11/Apr/12 23:58, which suggested that the read() function should return a buffer rather than fill a buffer provided by the caller:

          This means that the buffers are owned by the stream layer. The definition suggested also implies that the stream layer picks the buffer size, which can be good as the stream layer knows what buffer sizes are appropriate for the compression algorithms in question.

          Is that ok?

          Owen O'Malley added a comment -

          Sorry for coming into this late.

          I've been working with the compression codecs recently and I have several related observations:
          1. No one seems to use the compressors/decompressors directly. They always use the streams.
          2. The current interface is difficult to implement efficiently. To avoid copies, I always end up implementing the streams directly rather than use a compressor.
          3. As with most of this kind of code, the pure Java version is much less hassle and more performant than a JNI version.
          4. There aren't that many users out there, but the users include all of the important file formats (SequenceFile, TFile, HFile, and RCFile) and the MapReduce framework. (That isn't to say that we can delete the old interfaces, but they aren't user facing to the same level as FileSystem, Mapper, and Reducer.)

          My inclination is that extending Compressor/Decompressor is a mistake. On the other hand, making a sub-class of Codec seems like a good idea so that we can make Codecs that implement both the new and old interfaces.

          Thoughts?

          Tim Broberg added a comment -

          Great to have some discussion on this! I was afraid it would be silence until the cement hardens followed by shouting that it's all wrong.

          Thoughts:

          1 - Agreed. The bzip codec is a good example of this approach. The only real user of the compressors apart from streams is the codec pool, which seems like kind of a hack to me. If we want to pool the direct buffers, why don't we make direct buffer pools instead of compressor / decompressor pools?
          2 - Agreed. It feels very gzippy, and is unfriendly to block-based compressors.
          3 - Less hassle, heck yes. More performant? That's surprising to me. Is that because of the copies? Got benchmarks?
          4 - Agreed.

          I will attach my current idea of what the stream interface should look like. In this model, the stream owns the direct memory buffers and pools them to reduce allocation overhead. This allows the decompress stream to read ahead so that buffers are available to read instantly.
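
          The pooling idea in thought 1, as a hedged sketch (hypothetical, and deliberately simpler than anything in Hadoop):

              import java.nio.ByteBuffer;
              import java.util.concurrent.ConcurrentLinkedQueue;

              // Sketch: pool the direct buffers rather than whole compressors,
              // since allocateDirect() is the expensive part.
              class DirectBufferPoolSketch {
                  private final ConcurrentLinkedQueue<ByteBuffer> free =
                      new ConcurrentLinkedQueue<ByteBuffer>();
                  private final int bufferSize;

                  DirectBufferPoolSketch(int bufferSize) {
                      this.bufferSize = bufferSize;
                  }

                  ByteBuffer acquire() {
                      ByteBuffer buf = free.poll();
                      if (buf == null) {
                          buf = ByteBuffer.allocateDirect(bufferSize); // amortized
                      }
                      buf.clear();
                      return buf;
                  }

                  void release(ByteBuffer buf) {
                      free.offer(buf); // unbounded here; cap it in practice
                  }
              }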

          Tim Broberg added a comment -

          ZeroCopy compressor / stream interfaces

          Owen O'Malley added a comment -

          JNI is very problematic in several dimensions, but in terms of performance take a look at https://issues.apache.org/jira/browse/HADOOP-6148. Also look at the JIRAs that led to the codec pool and reimplementing the default codec.

          Todd Lipcon added a comment -

          JNI is very problematic in several dimensions, but in terms of performance take a look at https://issues.apache.org/jira/browse/HADOOP-6148

          Sure, the pure Java approach was better than the crappy built-in JNI CRC. But then we switched back to JNI again: HDFS-2080. With a good implementation based on direct buffers, it's way faster to go to C for most of this stuff, since you can take advantage of SSE.

          Owen O'Malley added a comment -

          Maybe. I've seen several cases where micro-benchmarks show JNI is great, but under load the performance craps out.

          Owen O'Malley added a comment -

          Tim, your tarball doesn't have any contents.

          I messed around with the API some yesterday and I think this would work. I combined the InputStream and ReadableByteChannel, and the OutputStream and WritableByteChannel.

          We would also need to update HDFS to use channels rather than the custom API for byte buffers.
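
          A hedged guess at the combination Owen describes (not his actual patch):

              import java.io.IOException;
              import java.io.InputStream;
              import java.nio.ByteBuffer;
              import java.nio.channels.ReadableByteChannel;

              // Sketch: one type that is both a stream and a channel, so callers
              // can read into either a byte[] or a (direct) ByteBuffer.
              abstract class ChannelCompressionInputStream extends InputStream
                      implements ReadableByteChannel {
                  // From ReadableByteChannel: fill dst with decompressed bytes.
                  public abstract int read(ByteBuffer dst) throws IOException;
                  // read() and close() come from InputStream; isOpen() is left
                  // to concrete subclasses in this sketch.
              }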

          Tim Broberg added a comment -

          2nd try - proposed zero copy compression interfaces

          Tim Broberg added a comment -

          Ok, Owen. Sorry not to get back sooner. Last week was crazy.

          Non-empty tarball with my proposal is attached.

          Comments on yours are below, keeping in mind that I'm somewhat of a Java neophyte. If I say something stupid, please try to be gentle:
          1 - By declaring an abstract stream class in the codec, you are forcing implementations to extend those classes, which means they can't reuse any of the existing compression stream code. What benefits do you see coming from this that warrant the tradeoff and/or what am I missing?
          2 - What are the benefits of creating a separate codec / factory class vs just returning a compression stream and checking whether it implements ZeroCopy<whatever>?
          3 - The read(ByteBuffer) definition just fundamentally doesn't work for my particular use case where there is a significant latency on reads. If the caller provides the ByteBuffer, then the read() function has to wait for the latency of the entire operation or else read() has to perform a copy from his own pool of completed buffers, which defeats the whole point of the exercise. This is why I felt I needed "ByteBuffer readBuffer()" in my interface.
          4 - Am I seeing compress and decompress functions at the codec level? I guess you're trying to hide the complex setInput / compress interface of the existing compressor? I guess you're saying we want the compressors to still be available in non-stream form, but you want to clean up the interface?
          5 - What about the codec pool? Do you see that disappearing?

          You mentioned channels. One alternative approach would be to use the new Java 7 Asynchronous channels, but making compression require Java 7 would have some pretty broad implications that I don't expect we're ready to deal with. Also, this makes the caller deal with the asynchronicity instead of having the stream just read ahead and return buffers on command.

          Ideally, it would be great to see an interface that doesn't require the caller to deal with pooling the buffers below his level in the stream, but I haven't found a way to do this that doesn't require copies for streams that implement any kind of read ahead.

          I'm sure at present I'm firmly in the minority in caring about multithreaded decompressors. I've tried to keep the overhead innocuous such that it's worth the trouble. Any ideas you have on addressing the problem more efficiently would be most welcome, but read(ByteBuffer) doesn't do it.

          Tim Broberg added a comment -

          One more thought here, we could define an asynchronous decompressor interface with a "Future<ByteBuffer> read(ByteBuffer dest)" method for pipelined decompressor streams.

          This would allow the app to own the buffers at the expense of making him track multiple outstanding requests.

          Likewise, there could be a "Future<ByteBuffer> write(ByteBuffer source)" method on compression streams where the Future<> would return the buffer to the app for reuse.

          The caller would be responsible to ensure that the buffers are large enough also.

          It's cute, but I don't know that I like it any better.
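
          As a hedged sketch (hypothetical names):

              import java.nio.ByteBuffer;
              import java.util.concurrent.Future;

              // Sketch of the Future-based variant: the app owns the buffers but
              // must track its outstanding requests.
              interface AsyncDecompressorStream {
                  // Completes with dest once it has been filled with decompressed data.
                  Future<ByteBuffer> read(ByteBuffer dest);
              }

              interface AsyncCompressorStream {
                  // Completes with source once the stream is done with it, returning
                  // the buffer to the app for reuse.
                  Future<ByteBuffer> write(ByteBuffer source);
              }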

          Tim Broberg added a comment -

          Regrettably, Exar has closed the San Diego office before I could complete this task. My replacement will continue my efforts, but will likely have his hands full coming up to speed. I have implemented the patch for our own codec, but we will not be providing a Snappy version any time soon.

          Apologies for leaving the work half done,

          - Tim.

            People

            • Assignee: Tim Broberg
            • Reporter: Tim Broberg
            • Votes: 0
            • Watchers: 31
