V2 attached based on the review, plus some test-related changes.
Would prefer to have CDIS just implement IS, and let callers wrap in DIS when desired, similar to how we use SnappyInputStream in IncomingTcpConnection, or LZFInputStream in ISR
CDIS now implements InputStream only and renamed to CompressedInputStream.
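The wrap-at-the-caller pattern described above can be sketched as follows. The class names and data here are illustrative; only the idea (expose a plain `InputStream`, let callers add `DataInputStream` when they need primitive reads, as with SnappyInputStream in IncomingTcpConnection) comes from the discussion:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class WrapExample {
    // Stand-in for CompressedInputStream: any plain InputStream works here.
    static int readFirstInt(InputStream raw) throws IOException {
        // Callers that need primitive reads wrap the stream themselves,
        // instead of the stream pre-committing to DataInputStream.
        try (DataInputStream in = new DataInputStream(raw)) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ByteArrayInputStream(new byte[] {0, 0, 0, 42});
        System.out.println(readFirstInt(raw)); // prints 42
    }
}
```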
Why the changes to OutboundTcpConnection?
The changes are needed to obtain a nio SocketChannel: the socket has to be created via SocketChannel.open().
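A minimal sketch of why this matters (not the actual OutboundTcpConnection code): a socket constructed directly with `new Socket(...)` has no associated channel, so `Socket.getChannel()` returns null and channel-based nio transfer is unavailable; a socket obtained from `SocketChannel.open()` exposes its channel.

```java
import java.io.IOException;
import java.net.Socket;
import java.nio.channels.SocketChannel;

public class ChannelSocketExample {
    public static void main(String[] args) throws IOException {
        // A directly-created socket has no associated channel.
        Socket plain = new Socket();
        System.out.println(plain.getChannel()); // null

        // A socket created via SocketChannel.open() exposes its channel,
        // which is what channel-based transfer needs.
        SocketChannel channel = SocketChannel.open();
        Socket socket = channel.socket();
        System.out.println(socket.getChannel() != null); // true
        channel.close();
        plain.close();
    }
}
```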
Re MS changes: when would header.file be null?
When a node requests a range but the target node doesn't have the corresponding data. I reverted the change in MS so that a streaming header is sent even when header.file is null. It seems redundant, but for now it's necessary to terminate the stream session on the requesting node.
Chunk sort can use Guava Longs.compare
I suggest adding a comment to explain why the sort is necessary (because the ranges come from the replication strategy, so they may not be sorted?). Instead of using a Set + copy into an array, why not use an ArrayList + trimToSize()?
The reason I use a Set here is to eliminate duplicate chunks, since two different file sections can map to the same chunk.
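Putting the two points together, the dedupe-then-sort step might look like the sketch below. The `Chunk` shape is illustrative, not the actual class; the review suggests Guava's `Longs.compare`, for which the JDK's `Long.compare` is equivalent:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ChunkDedupeSort {
    // Illustrative stand-in for the compression-metadata Chunk.
    static final class Chunk {
        final long offset;
        final int length;
        Chunk(long offset, int length) { this.offset = offset; this.length = length; }
        @Override public boolean equals(Object o) {
            return o instanceof Chunk && ((Chunk) o).offset == offset;
        }
        @Override public int hashCode() { return Long.hashCode(offset); }
    }

    static Chunk[] dedupeAndSort(Chunk... chunks) {
        // A Set eliminates duplicates: two different file sections
        // can map to the same chunk.
        Set<Chunk> unique = new HashSet<>(Arrays.asList(chunks));
        Chunk[] sorted = unique.toArray(new Chunk[0]);
        // Ranges come from the replication strategy and may be unordered,
        // so sort by offset (Guava's Longs.compare would work the same way).
        Arrays.sort(sorted, (a, b) -> Long.compare(a.offset, b.offset));
        return sorted;
    }

    public static void main(String[] args) {
        Chunk[] result = dedupeAndSort(
            new Chunk(65536, 65536), new Chunk(0, 65536), new Chunk(65536, 65536));
        System.out.println(result.length);    // 2 (duplicate removed)
        System.out.println(result[0].offset); // 0 (sorted)
    }
}
```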
is the FST comment // TODO just use a raw RandomAccessFile since we're managing our own buffer here obsolete? is the CompressedRandomAccessReader path used at all in FST anymore?
I removed CRAR from FST in v2. Even when nio is not available (e.g., with inter-node SSL), streaming uses CompressedFileStreamTask with the socket's InputStream to transfer the file directly.
Nit: avoid double negation in if statements with else clauses
Nit: suggest moving serialization code for Chunk and CompressionParameters into ChunkSerializer and ChunkParametersSerializer classes, respectively, just to make the code discoverable for re-use later
Should we make nio transfer the default for uncompressed sstables as well, and add an option to enable compression? Alternatively, now that compression is the default for new sstables, I'd be okay with removing LZF stream compression entirely
I haven't done any benchmarks, but I think always using LZF compression is fine when transferring uncompressed data.
Does this over-transfer data on chunk boundaries? Put another way, do we stream data that doesn't actually belong on the target node? (I'm okay with this, just want to be clear about what's happening.)
The source node can send unrelated ranges of data inside a chunk, but the receiving node ignores (skips) that part when reading from the socket, so the answer is no.
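The receiver-side skip can be sketched like this (the helper name and offsets are illustrative, not the actual code): since a requested section rarely starts or ends on a chunk boundary, the reader discards the leading and trailing chunk bytes that fall outside the section it asked for.

```java
import java.util.Arrays;

public class ChunkSkipExample {
    // Given a decompressed chunk starting at chunkOffset in the file,
    // keep only the bytes inside the requested [sectionStart, sectionEnd) range.
    static byte[] extractSection(byte[] chunk, long chunkOffset,
                                 long sectionStart, long sectionEnd) {
        // Bytes before the section start and after the section end are skipped.
        int from = (int) Math.max(0, sectionStart - chunkOffset);
        int to = (int) Math.min(chunk.length, sectionEnd - chunkOffset);
        return Arrays.copyOfRange(chunk, from, to);
    }

    public static void main(String[] args) {
        byte[] chunk = {10, 11, 12, 13, 14, 15, 16, 17};
        // The chunk covers file offsets 100..108, but only 103..106 was requested.
        byte[] section = extractSection(chunk, 100, 103, 106);
        System.out.println(section.length); // 3
        System.out.println(section[0]);     // 13
    }
}
```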