I agree with 2.1 (a positional read and a non-positional read can run concurrently) and 2.2 (two or more positional reads can run concurrently).
2.3 seems both too strict and too loose at the same time, if that makes any sense. It is too strict because it talks about some internal details of HDFS (a file's length will not change if lastBlock is complete). When using a POSIX filesystem like Ceph or Lustre, a file's length can change at any time. We should try to accommodate the existence of those systems, even though we don't plan on adding random write support to the HCFS. Others may modify, from outside of Hadoop, the files we're reading. This problem exists with LocalFileSystem as well, of course.
2.3 is too loose because it doesn't specify HOW getFileLength interacts with read, pread, and other calls. If we are using the new HDFS-6633 feature (HDFS tail) and new data is coming in, does getFileLength return that new length all the time? Or does it keep returning the old length? Can getFileLength run concurrently with any other functions?
I would argue that getFileLength should be able to run concurrently with read and pread. I would also argue that its return value should be allowed to change over time, and even get smaller. (Of course the length will never shrink in the specific case of HDFS, but for LocalFileSystem it can.) For HDFS, getFileLength should be able to return the last known file length without blocking or waiting for anything, e.g. by checking an AtomicLong or by taking a mutex on something smaller than the whole stream.
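To make the AtomicLong idea concrete, here is a minimal sketch. The class LengthTrackingStream and the method updateLength are hypothetical names made up for illustration. They are not part of DFSInputStream or any existing Hadoop API, and the real code would need to decide who calls the update and when.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not the real DFSInputStream.
class LengthTrackingStream {
    // Last length we learned about. Readable without taking the stream lock.
    private final AtomicLong lastKnownLength = new AtomicLong();

    LengthTrackingStream(long initialLength) {
        lastKnownLength.set(initialLength);
    }

    // Assumed to be called by whatever code refreshes file metadata.
    // The new value may be larger or smaller than the old one.
    void updateLength(long newLength) {
        lastKnownLength.set(newLength);
    }

    // Never blocks on I/O and never waits for read, seek, or pread.
    long getFileLength() {
        return lastKnownLength.get();
    }
}
```

With this shape, a caller can poll getFileLength while another thread is in the middle of a long read, and it simply sees whichever length was most recently published.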
Also, it would be nice to add a section specifying that when we do two non-positional reads at the same time, they may wait for each other to complete before proceeding. And getPos, seek, and skip may wait for non-positional reads to complete before running.
Basically, what this looks like is grouping the functions into two sets:
Group P: read, getPos, seek, skip, zero-copy read, releaseBuffer
Group N: pread, getFileLength, setReadahead, setDropBehind, getReadStatistics
Functions in group P can all block each other (probably they grab the same mutex, although this isn't guaranteed).
Functions in group N do not ever block each other or functions in group P for a long time. (They may take a mutex or two for a very short amount of time, but it's not the same mutex as for group P, and they don't hang on to it while doing I/O.)
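As a rough sketch of the two groups, assuming an in-memory byte array stands in for the file: the class TwoGroupStream, its two locks, and getBytesRead (a stand-in for getReadStatistics) are all hypothetical and only meant to show the locking shape, not how any existing stream is implemented.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of the group P / group N split.
class TwoGroupStream {
    private final byte[] data;                                     // stands in for the file
    private final ReentrantLock streamLock = new ReentrantLock();  // group P only
    private final Object infoLock = new Object();                  // group N, held briefly
    private long pos;        // guarded by streamLock
    private long bytesRead;  // guarded by infoLock

    TwoGroupStream(byte[] data) {
        this.data = data.clone();
    }

    // Group P: holds streamLock for the whole call, including the "I/O".
    int read(byte[] buf, int off, int len) {
        streamLock.lock();
        try {
            int n = copy(pos, buf, off, len);
            if (n > 0) {
                pos += n;
            }
            return n;
        } finally {
            streamLock.unlock();
        }
    }

    // Group P: may wait for an in-progress read.
    void seek(long newPos) {
        if (newPos < 0) {
            throw new IllegalArgumentException("negative position");
        }
        streamLock.lock();
        try {
            pos = newPos;
        } finally {
            streamLock.unlock();
        }
    }

    // Group P: may wait for an in-progress read.
    long getPos() {
        streamLock.lock();
        try {
            return pos;
        } finally {
            streamLock.unlock();
        }
    }

    // Group N: never touches streamLock or pos.
    int pread(long position, byte[] buf, int off, int len) {
        return copy(position, buf, off, len);
    }

    // Group N: takes only the small lock, and only for a field read.
    long getBytesRead() {
        synchronized (infoLock) {
            return bytesRead;
        }
    }

    // Shared helper: the copy happens outside infoLock; the lock covers only the counter.
    private int copy(long position, byte[] buf, int off, int len) {
        if (position >= data.length) {
            return -1; // end of file
        }
        int n = (int) Math.min(len, data.length - position);
        System.arraycopy(data, (int) position, buf, off, n);
        synchronized (infoLock) {
            bytesRead += n;
        }
        return n;
    }
}
```

In this sketch a pread never changes what getPos returns, and the statistics lock is never held across the copy, which is the property group N is supposed to have.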