Type: New Feature
Affects Version/s: 0.6.0
Fix Version/s: 0.7.0
HDFS input streams should support positional read. A positional read (such as the pread syscall on Linux) reads from a specified offset without affecting the current file offset. Since the underlying file state is not touched, pread can be used efficiently in multi-threaded programs.
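To illustrate the semantics being asked for, here is a minimal sketch using the JDK's FileChannel.read(ByteBuffer, long) on a local file, which behaves like pread. This is not HDFS code; the class name PreadDemo and its helper method are made up for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; illustrates pread-like behavior on a local file only.
class PreadDemo {
    /**
     * Reads up to len bytes starting at the absolute offset `position` and
     * returns them as an ASCII string. The channel's own position is unchanged.
     */
    static String pread(FileChannel ch, long position, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        // Loop because a single read may return fewer bytes than requested.
        while (buf.hasRemaining()) {
            int n = ch.read(buf, position + buf.position());
            if (n < 0) {
                break; // end of file reached
            }
        }
        buf.flip();
        return StandardCharsets.US_ASCII.decode(buf).toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("pread-demo", ".txt");
        try {
            Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                ch.position(2);                     // the "current file offset"
                System.out.println(pread(ch, 5, 3)); // prints 567
                System.out.println(ch.position());   // prints 2: offset untouched
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Because the read carries its own offset, several threads can share one open channel without coordinating on a common file position.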
Here is how I plan to implement it.
Provide PositionedReadable interface, with the following methods:
int read(long position, byte[] buffer, int offset, int length);
void readFully(long position, byte[] buffer, int offset, int length);
void readFully(long position, byte[] buffer);
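The interface above could be sketched roughly as follows. The throws IOException clauses, the -1 return at end of data, and the EOFException from readFully are assumptions modeled on java.io conventions, not part of the proposal text. ByteArrayPositionedReadable is a hypothetical in-memory implementation included only to show the intended contract.

```java
import java.io.EOFException;
import java.io.IOException;

// Sketch of the proposed interface; exception declarations are an assumption.
interface PositionedReadable {
    int read(long position, byte[] buffer, int offset, int length) throws IOException;
    void readFully(long position, byte[] buffer, int offset, int length) throws IOException;
    void readFully(long position, byte[] buffer) throws IOException;
}

// Hypothetical in-memory implementation, used only to illustrate the contract.
class ByteArrayPositionedReadable implements PositionedReadable {
    private final byte[] data;

    ByteArrayPositionedReadable(byte[] data) {
        this.data = data.clone();
    }

    /** Reads up to length bytes at position; returns -1 at end of data (assumed). */
    @Override
    public int read(long position, byte[] buffer, int offset, int length) {
        if (position < 0 || offset < 0 || length < 0 || length > buffer.length - offset) {
            throw new IndexOutOfBoundsException();
        }
        if (length == 0) {
            return 0;
        }
        if (position >= data.length) {
            return -1;
        }
        int n = (int) Math.min(length, data.length - position);
        System.arraycopy(data, (int) position, buffer, offset, n);
        return n;
    }

    /** Reads exactly length bytes or throws EOFException (assumed behavior). */
    @Override
    public void readFully(long position, byte[] buffer, int offset, int length) throws IOException {
        int done = 0;
        while (done < length) {
            int n = read(position + done, buffer, offset + done, length - done);
            if (n < 0) {
                throw new EOFException("End of data before " + length + " bytes were read");
            }
            done += n;
        }
    }

    @Override
    public void readFully(long position, byte[] buffer) throws IOException {
        readFully(position, buffer, 0, buffer.length);
    }
}
```

Note that no method touches any per-stream offset, which is what makes the calls safe to issue concurrently.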
The abstract class FSInputStream would provide a default implementation of the above methods using the getPos(), seek(), and read() methods. The default implementation is inefficient in multi-threaded programs, since it holds the object's lock while seeking, reading, and restoring the old position.
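A minimal sketch of such a default implementation, assuming the save/seek/read/restore sequence described above. SeekableInput is a hypothetical stand-in for FSInputStream, and InMemoryInput is a made-up subclass used only to exercise it; neither is the actual patch.

```java
import java.io.IOException;

// Hypothetical stand-in for FSInputStream: a seekable stream with a
// default positional read built from getPos(), seek() and read().
abstract class SeekableInput {
    abstract long getPos() throws IOException;
    abstract void seek(long pos) throws IOException;
    abstract int read(byte[] buffer, int offset, int length) throws IOException;

    /**
     * Default positional read: remember the current offset, seek, read,
     * then restore the offset. The object lock is held for the whole
     * sequence, so concurrent callers are serialized.
     */
    public synchronized int read(long position, byte[] buffer, int offset, int length)
            throws IOException {
        long oldPos = getPos();
        try {
            seek(position);
            return read(buffer, offset, length);
        } finally {
            seek(oldPos); // restore even if the read fails
        }
    }
}

// Made-up in-memory subclass, only for demonstrating the default method.
class InMemoryInput extends SeekableInput {
    private final byte[] data;
    private long pos;

    InMemoryInput(byte[] data) {
        this.data = data.clone();
    }

    @Override
    synchronized long getPos() {
        return pos;
    }

    @Override
    synchronized void seek(long p) throws IOException {
        if (p < 0 || p > data.length) {
            throw new IOException("Cannot seek to " + p);
        }
        pos = p;
    }

    @Override
    synchronized int read(byte[] buffer, int offset, int length) {
        if (length == 0) {
            return 0;
        }
        if (pos >= data.length) {
            return -1; // end of data
        }
        int n = (int) Math.min(length, data.length - pos);
        System.arraycopy(data, (int) pos, buffer, offset, n);
        pos += n;
        return n;
    }
}
```

The lock is what makes this version correct but slow under contention: two threads reading disjoint regions still wait on each other, which is the cost a non-synchronized implementation would avoid.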
DFSClient.DFSInputStream, which extends FSInputStream, will provide an efficient non-synchronized implementation of the above calls.
In addition, FSDataInputStream, which is a wrapper around FSInputStream, will provide wrapper methods for the above read methods as well.
Patch forthcoming early next week.