Hadoop Common
  HADOOP-1470

Rework FSInputChecker and FSOutputSummer to support checksum code sharing between ChecksumFileSystem and block level crc dfs

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.3
    • Fix Version/s: 0.14.0
    • Component/s: fs
    • Labels:
      None

      Description

      Comment from Doug in HADOOP-1134:
      I'd prefer it if the CRC code could be shared with ChecksumFileSystem. In particular, it seems to me that FSInputChecker and FSOutputSummer could be extended to support pluggable sources and sinks for checksums, respectively, and DFSDataInputStream and DFSDataOutputStream could use these. Advantages of this are: (a) a single implementation of checksum logic to debug and maintain; (b) it keeps checksumming as close as possible to data generation and use. This patch computes checksums after data has been buffered, and validates them before it is buffered. We sometimes use large buffers and would like to guard against in-memory errors. The current checksum code catches a lot of such errors. So we should compute checksums after minimal buffering (just bytesPerChecksum, ideally) and validate them at the last possible moment (e.g., through the use of a small final buffer with a larger buffer behind it). I do not think this will significantly affect performance, and data integrity is a high priority.
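
Doug's "checksum after minimal buffering" point can be illustrated with a small sketch (hypothetical code in the spirit of FSOutputSummer — the class and method names here are invented, not the actual Hadoop classes): each bytesPerChecksum-sized run of user data is folded into a CRC32 the moment it arrives, before any larger buffer gets a chance to corrupt it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch (names invented): checksum each bytesPerChecksum-sized
// chunk as the data arrives, before any larger buffering, so later in-memory
// corruption is still detectable on read.
public class ChunkedSummer {
    private final int bytesPerChecksum;
    private final CRC32 crc = new CRC32();
    private int inChunk = 0;                     // bytes folded into current chunk
    public final List<Long> checksums = new ArrayList<>();

    public ChunkedSummer(int bytesPerChecksum) {
        this.bytesPerChecksum = bytesPerChecksum;
    }

    /** Feed user data; one checksum is recorded per completed chunk. */
    public void write(byte[] b, int off, int len) {
        while (len > 0) {
            int n = Math.min(len, bytesPerChecksum - inChunk);
            crc.update(b, off, n);               // checksum before any buffering
            off += n; len -= n; inChunk += n;
            if (inChunk == bytesPerChecksum) {   // chunk complete: record its sum
                checksums.add(crc.getValue());
                crc.reset();
                inChunk = 0;
            }
        }
    }
}
```

Because the CRC is folded in incrementally, writing the same bytes one at a time or in one bulk call yields identical per-chunk checksums — which is what allows a small checksummed buffer to sit in front of a larger one.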

      1. genericChecksum.patch
        7 kB
        Hairong Kuang
      2. GenericChecksum.patch
        64 kB
        Hairong Kuang
      3. GenericChecksum1.patch
        63 kB
        Hairong Kuang
      4. GenericChecksum2.patch
        64 kB
        Hairong Kuang
      5. GenericChecksum3.patch
        69 kB
        Hairong Kuang
      6. GenericChecksum4.patch
        69 kB
        Hairong Kuang
      7. InputChecker-01.java
        3 kB
        Raghu Angadi

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Nightly #152 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/152/ )

          Doug Cutting added a comment -

          I just committed this. Thanks, Hairong!

          Raghu Angadi added a comment -

          minor: Can we change the name of 'static protected int readFully()'? It is not a traditional readFully() that returns void and throws an IOException when it cannot read the requested number of bytes. The one in FSInputChecker is more of a 'read as much as possible' and returns an int. It might be better to rename it to avoid confusion with the conventional readFully().

          Raghu Angadi added a comment -

          Please ignore the last comment. It was a misread of the code.

          Raghu Angadi added a comment -

          I am trying to use OutputSummer. FSOutputSummer seems to be writing all of the user's buffer as one chunk. It needs to invoke writeChunk() with a max of bytesPerChecksum, right? How are the unit tests passing?

          Raghu Angadi added a comment -

          > To be consistent with readChunk(), writeChunk() can also pass in checksum instead of writeChecksum() call.
          This is already included in the latest patch (patch #4). Thanks Hairong.

          Raghu Angadi added a comment -

          This does not have to be implemented in the current patch:
          To be consistent with readChunk(), writeChunk() can also pass in the checksum instead of a writeChecksum() call.

          Hadoop QA added a comment -

          +1 http://issues.apache.org/jira/secure/attachment/12361330/GenericChecksum4.patch applied and successfully tested against trunk revision r553623.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/371/testReport/
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/371/console

          Hairong Kuang added a comment -

          This patch addresses most of Raghu's comments.

          Raghu Angadi added a comment -

          > InputChecker probably needs a way for a subclass to set chunkPos.
          This is certainly not required now. I will maintain the initial offset (mainly for error checking). For me it's simple, since I don't allow seeks.

          Raghu Angadi added a comment -

          InputChecker probably needs a way for a subclass to set chunkPos. If no seek() is called, it always seems to start at 0. A subclass can work around this by storing an offset to the initial chunkPos, though.

          Raghu Angadi added a comment -

          Also, should public methods like read() be synchronized?

          Raghu Angadi added a comment -

          Regarding write: it checksums after writeChunk(). Should we checksum as soon as possible? Otherwise, this won't catch errors if writeChunk() corrupted the data. Also, when fewer bytes than bytesPerChecksum are written, the data is not checksummed until the rest of the bytes are written.

          Raghu Angadi added a comment -

          In the output summer:

              if( off<0 || off>b.length ) {
                throw new IndexOutOfBoundsException();
              }

          ... and write() writes fewer bytes.

          I wonder if checking array bounds is really necessary, since if we don't check, it is going to result in the same exception anyway.
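One observable difference the explicit check does make (a toy illustration, not the patch code): the JVM's own exception fires part-way through the loop, after some bytes have already been consumed, while an upfront check fails before any side effects occur.

```java
// Toy illustration (not the patch code): both versions throw
// IndexOutOfBoundsException, but only the explicit check fails before
// any bytes have been consumed, i.e. it makes the failure atomic.
class BoundsDemo {
    static int consumed = 0;   // stands in for "bytes already written"

    static void writeUnchecked(byte[] b, int off, int len) {
        for (int i = 0; i < len; i++) {
            byte unused = b[off + i];   // throws part-way through the loop
            consumed++;
        }
    }

    static void writeChecked(byte[] b, int off, int len) {
        if (off < 0 || len < 0 || off + len > b.length)
            throw new IndexOutOfBoundsException();   // fail before side effects
        for (int i = 0; i < len; i++) {
            byte unused = b[off + i];
            consumed++;
        }
    }
}
```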

          Raghu Angadi added a comment -

          > The above succeeds only if BYTES_PER_CHECKSUM is 1..
          Wrong reason. The exception is mostly due to the fact that FILE_SIZE might not be a multiple of BYTES_PER_CHECKSUM.

          Raghu Angadi added a comment -

          While testing this with my patch, I got an 'ArrayOutOfBounds' exception in TestFSOutputSummer.java:

              for( int i=0; i<FILE_SIZE; i+=BYTES_PER_CHECKSUM) {
                stm.write(expected, i, BYTES_PER_CHECKSUM);
              }

          The above succeeds only if BYTES_PER_CHECKSUM is 1.. I think it is set to 10. Not sure why this does not show up on trunk. Note that I didn't modify the test, which implies it is actually writing to RawDFS (which does not use Generic Checksum yet) instead of a ChecksumFileSystem.

          Hadoop QA added a comment -

          +1 http://issues.apache.org/jira/secure/attachment/12361149/GenericChecksum3.patch applied and successfully tested against trunk revision r553623.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/366/testReport/
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/366/console

          Hairong Kuang added a comment -

          The patch incorporates Doug's comments.

          > RawLocalFileSystem#open() and #create() ignore the bufferSize parameter. This means that local file access will end up using only a buffer of bytesPerChecksum, which will hurt performance.

          This is not a simple fix. I created a class, BufferedFSInputStream. It is a very general class, so I made it public, although it is used only by RawLocalFileSystem. I adjusted the implementation of FSDataInputStream and the Seekable interface so that FSDataInputStream works well with a BufferedInputStream. I also made RawLocalFileSystem's output stream buffered.
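The buffering-plus-seek interaction described here can be sketched roughly as follows (an invented, array-backed stand-in, not the committed BufferedFSInputStream): the logical position is the buffer's starting file offset plus the bytes consumed within it, a seek inside the buffered range just moves the pointer, and a seek outside it invalidates the buffer.

```java
// Hypothetical sketch (not the committed BufferedFSInputStream): a byte
// array stands in for the raw stream. getPos() reports the logical position,
// seek() reuses buffered data when possible, read() refills lazily.
class SeekableBuffered {
    private final byte[] file;      // stands in for the underlying raw stream
    private final byte[] buf;
    private int bufStart = 0;       // file offset of buf[0]
    private int bufLen = 0;         // valid bytes currently in buf
    private int bufPos = 0;         // next byte to return, relative to buf

    SeekableBuffered(byte[] file, int bufferSize) {
        this.file = file;
        this.buf = new byte[bufferSize];
    }

    long getPos() { return bufStart + bufPos; }

    void seek(long pos) {
        if (pos >= bufStart && pos < bufStart + bufLen) {
            bufPos = (int) (pos - bufStart);  // seek within the buffer: cheap
        } else {
            bufStart = (int) pos;             // invalidate; refill lazily
            bufLen = 0;
            bufPos = 0;
        }
    }

    int read() {
        if (bufPos >= bufLen) {               // buffer exhausted: refill
            bufStart += bufPos;
            bufPos = 0;
            bufLen = Math.min(buf.length, file.length - bufStart);
            if (bufLen <= 0) { bufLen = 0; return -1; }
            System.arraycopy(file, bufStart, buf, 0, bufLen);
        }
        return buf[bufPos++] & 0xff;
    }
}
```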

          Doug Cutting added a comment -

          1. Can you please examine the FindBugs warnings?

          2. RawLocalFileSystem#open() and #create() ignore the bufferSize parameter. This means that local file access will end up using only a buffer of bytesPerChecksum, which will hurt performance.

          3. ChecksumFileSystem#readChunk() always seeks the sums stream, even when it's already in the right spot. Should we rely on implementations to optimize this? I've seen seek implementations which are expensive even when the file position is unchanged. RawLocalFileSystem will end up using FileChannel's implementation, for which we have no source, so we'd need to benchmark it to make sure that this is optimized.

          Hadoop QA added a comment -

          +0, new Findbugs warnings. http://issues.apache.org/jira/secure/attachment/12360972/GenericChecksum2.patch applied and successfully tested against trunk revision r552548, but there appear to be new Findbugs warnings introduced by this patch.

          New Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/354/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/354/testReport/
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/354/console

          Hairong Kuang added a comment - edited

          Thanks Doug. I am marking the patch as available. But please let me know if anybody has any additional comments.

          Hairong Kuang added a comment -

          The patch incorporates Doug's two comments:

          1. The filePos field is renamed 'chunkPos'.
          2. The number of retries is set to the file's replication factor.

          Doug Cutting added a comment -

          Yes, this sounds like a fine plan.

          Hairong Kuang added a comment -

          > for this release, InputChecker reads one chunk at a time and has only a 1-chunk buffer, leaving larger buffers to subclasses. Is that right?

          Yes. Is this OK with you, Doug?

          Doug Cutting added a comment -

          > I feel that we should address the complexity by breaking it into two classes

          Yes, I agree.

          > address the optimization in the next release

          So that means, for this release, InputChecker reads one chunk at a time and has only a 1-chunk buffer, leaving larger buffers to subclasses. Is that right?

          Hairong Kuang added a comment -

          Even with two cascaded buffers, the code seems quite complex. I feel that we should address the complexity by breaking it into two classes:
          1. BufferedFSInputStream: provides a buffered input stream and a readChunk method.
          2. FSInputChecker: contains a buffered input stream, aligns operations on chunk boundaries, and verifies checksums.

          Since this jira is not a regression, I would like to submit a patch without the buffering optimization and then address the optimization in the next release. I'd like to give Raghu some time to adapt his block-level crc patch to this framework. Does that work?

          Hairong Kuang added a comment -

          > It might be easiest to implement this as two buffers: one, multi-chunk and un-checksummed ...
          This proposal sounds good to me.

          But I am not fully convinced whether we should support buffering in FSInputChecker or not. It seems to me that Sameer wants readChunk to support both pread & read in FSInputChecker. It's not clear to me if we want to provide a generic pread in FSInputChecker. If yes, I am not sure that having one common API, readChunk, to implement both pread and read is a good idea. If readChunk is for supporting only read, then we could have readChunk optimized for sequential reading. We would not need to worry about multiple preads causing excessive seeks.

          Raghu Angadi added a comment -

          Before the above proposal, the 'large buffer size' was passed on to the implementation. Now, it looks like the implementation does not (or should not) buffer the data it reads (usually using BufferedInputStream) and should try to read multiple chunks (without an extra copy).

          DFS can avoid only one of the two: one socket.read() per chunk, or an extra copy. It cannot avoid both. If InputChecker does not manage the larger buffer as well, but rather just passes it on to the implementation, then it can avoid both. But the cost of a socket.read() per chunk is probably negligible.

          Sameer Paranjpye added a comment -

          > It might be easiest to implement this as two buffers: one, multi-chunk and un-checksummed ...

          +1

          It could well be so. Hairong, what do you think?

          Raghu Angadi added a comment -

          > InputChecker to address your concerns about excessive seeks when multiple threads use pread on a shared file handle with a small chunk size. I agree that we need a larger buffer somewhere.

          If the user is doing large pread()s, then the user passes in a larger buffer. Is that enough?

          Or are you talking about sharing the same buffered data between multiple simultaneous (or closely timed) preads?

          Doug Cutting added a comment -

          Sameer, you suggested yesterday that we might use a larger, multi-chunk buffer in the InputChecker to address your concerns about excessive seeks when multiple threads use pread on a shared file handle with a small chunk size. I agree that we need a larger buffer somewhere. Earlier I suggested that implementations should use a larger buffer internally. But we could, as you suggest, instead make InputChecker's buffer larger, with a policy that data is checksummed as late as possible. Each chunk within the buffer would be checksummed immediately prior to returning any data from it. Implementing that could be a bit hairy and hence bug-prone. It might be easiest to implement this as two buffers: one multi-chunk and un-checksummed, and one checksummed single-chunk, the latter filled from the former. As an optimization, they could even share buffer memory, so that refilling the checksummed buffer would involve just checksumming some data and resetting its start position. But, in the code, this would be abstracted as two cascaded buffers, to address complexity. Does that make sense?
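The cascaded-buffer idea can be sketched like this (illustrative only; the class and field names are invented, and a plain array stands in for the implementation's multi-chunk buffer): data sits unverified in the big buffer, and each chunk's checksum is checked only immediately before the first byte of that chunk is returned to the caller.

```java
import java.util.zip.CRC32;

// Hypothetical sketch of the cascaded-buffer policy (not the patch): the big
// buffer holds un-verified data; a chunk is CRC-checked lazily, just before
// its first byte is handed out, so corruption is caught as late as possible.
class CascadedChecker {
    private final int bytesPerChecksum;
    private final byte[] big;         // large, un-checksummed buffer
    private final long[] sums;        // expected checksum for each chunk in 'big'
    private int verifiedChunk = -1;   // index of the chunk verified so far
    private int pos = 0;              // next byte to return

    CascadedChecker(byte[] big, long[] sums, int bytesPerChecksum) {
        this.big = big;
        this.sums = sums;
        this.bytesPerChecksum = bytesPerChecksum;
    }

    /** Return the next byte, verifying its chunk only just before first use. */
    int read() {
        if (pos >= big.length) return -1;
        int c = pos / bytesPerChecksum;
        if (c != verifiedChunk) {
            int start = c * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, big.length - start);
            CRC32 crc = new CRC32();
            crc.update(big, start, len);
            if (crc.getValue() != sums[c])
                throw new IllegalStateException("checksum error in chunk " + c);
            verifiedChunk = c;        // verified as late as possible
        }
        return big[pos++] & 0xff;
    }
}
```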

          Doug Cutting added a comment -

          This is looking good to me. A few quick notes:

          1. The filePos field might better be named 'chunkPos', since it should only ever hold the position of a chunk boundary.

          2. Should the number of retries be based on FileSystem#getReplication()? We have a generic way to talk about replication, so perhaps we should use it here?

          Hairong Kuang added a comment -

          A new patch for review.

          It looks like we do not need the method getChunkSize. Instead, FSInputChecker provides a method "set" for setting the checksum type, maxChunkSize, and checksum size. I hope that this is sufficient for the block-level crc dfs.

          Hairong Kuang added a comment -

          > when an application does a large read, the current API requires that the data be fetched piecemeal into the InputChecker buffer, checksummed and then copied to the application buffer

          When an application does a large read, the current implementation copies data piecemeal into the user buffer directly and then checksums it, therefore skipping a copy to the InputChecker buffer.

          Sameer Paranjpye added a comment -

          > but we also need to minimize the amount of buffered, checksummed data. It's thus preferable to keep the outermost, checksummed buffer small ...

          Granted, but changing the readChunk() contract to allow multiple contiguous chunks to be read at once doesn't impact buffering. The outermost buffer would still be small. However, when an application does a large read, the current API requires that the data be fetched piecemeal into the InputChecker buffer, checksummed and then copied to the application buffer. Allowing multiple contiguous chunks to be read simultaneously would allow the application buffer to be passed directly to readChunk() and checksummed in place, and thus save a copy.

          This is a small performance gain; however, a more important reason is that constraining readChunk() to 1 chunk at a time makes it difficult to support preads.

          Implementing efficient support for 'sequential preads', i.e. multiple pread invocations hitting contiguous byte ranges, is quite tricky. An implementation has to be pretty clever in order to efficiently serve several concurrent threads all calling pread. It probably needs to keep a pool of file descriptors around and have some complicated logic to determine, among other things:

          • if, in a pread invocation, the position on any of its available descriptors is close to the position in the call, so that buffered data can be used
          • if the position in a call isn't a good match for any of the descriptors, then whether to seek on an existing descriptor or open a new one. Using an existing descriptor might cause buffered data useful for a subsequent call to be thrown away

          On the other hand, an implementation that's not trying to support 'sequential preads' can be a lot simpler. It also needs to keep a pool of descriptors, but in each call it can pick an arbitrary descriptor, seek, fetch and return the data. It would also do no buffering (except on 1 canonical descriptor which corresponds to the file position that the implementation reports to clients).

          It's not clear that the 'sequential pread' access pattern occurs frequently in application contexts. It certainly doesn't in the use cases we've seen so far; they almost all have very little locality of reference across pread invocations. However, given the readChunk() API above, an application-level pread would most likely be split into multiple readChunk() calls and require the 'sequential pread' access pattern to be supported. It feels like this access pattern is being imposed on implementations by the readChunk() API and not by application contexts, and that is undesirable. If the readChunk() API allowed multiple chunks to be read at once, then an application-level pread would likely result in a single call to readChunk() and would allow for a simple implementation.

          Hide
          Doug Cutting added a comment -

          The InputChecker class constructor shouldn't take a bufferSize parameter. Instead it should use bytesPerChecksum to size its buffer. Subclass constructors should take a bufferSize parameter and use this to buffer readChunk() data.

          Doug Cutting added a comment -

          > Is the intent to have readChunk() fetch a single chunk at a time or should it be capable of reading many chunks?

          I'd assumed it would only read a single chunk and checksum at a time. We could change that contract, and it might improve performance a bit, but we also need to minimize the amount of buffered, checksummed data. It's thus preferable to keep the outermost, checksummed buffer small (~bytesPerChecksum) and put larger buffers in implementations. If an implementation has a 64k buffer, and bytesPerChecksum is 1k, then most calls to readChunk() won't do any i/o, but will just copy data from one buffer to another, verifying checksums as late as possible.
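The layered-buffer idea above can be sketched as follows. This is an illustrative toy with hypothetical names, not the Hadoop API: the caller asks for one small chunk at a time while the implementation refills a much larger internal buffer only occasionally, so most readChunk() calls do no i/o.

```java
// Hypothetical sketch of the layered-buffer idea: small per-chunk reads on
// top of a large, rarely refilled implementation buffer. A byte array
// stands in for the underlying file.
class BufferedChunkReader {
    private final byte[] big;        // e.g. a 64k implementation buffer
    private int bigPos;              // read position within the big buffer
    private int ioCount = 0;         // how many simulated i/o refills occurred
    private final byte[] source;     // stands in for the underlying file
    private int srcPos = 0;

    BufferedChunkReader(byte[] source, int bigBufferSize) {
        this.source = source;
        this.big = new byte[bigBufferSize];
        this.bigPos = big.length;    // force a refill on the first read
    }

    /** Copy one small chunk into the caller's buffer, refilling rarely. */
    int readChunk(byte[] chunk) {
        if (bigPos >= big.length) {                      // big buffer drained
            int n = Math.min(big.length, source.length - srcPos);
            if (n <= 0) return -1;
            System.arraycopy(source, srcPos, big, 0, n); // one "i/o" per refill
            srcPos += n;
            bigPos = 0;
            ioCount++;
        }
        int n = Math.min(chunk.length, big.length - bigPos);
        System.arraycopy(big, bigPos, chunk, 0, n);      // usually just a copy
        bigPos += n;
        return n;
    }

    int ioCount() { return ioCount; }
}
```

With a 16-byte internal buffer and 4-byte chunks, 16 chunk reads trigger only 4 refills, matching the "most calls just copy between buffers" behavior described above.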

          Sameer Paranjpye added a comment -

          > /** Read a full chunk and its checksum at the specified position, returning its length. */
          > int readChunk(long position, byte[] buffer, int start, int length, byte[] checksum);

          Is the intent to have readChunk() fetch a single chunk at a time or should it be capable of reading many chunks?
          I think it should permit reading multiple contiguous chunks and their corresponding checksums. Most reads or preads will cover more than a single bytesPerChecksum-sized chunk, and allowing the retrieval of multiple contiguous chunks with a single call will significantly simplify InputChecker implementations and have performance benefits.

          Doug Cutting added a comment -

          > the contract of readChunk should make it clear if the method changes the file descriptor state or not

          Since a position is always passed, the contract has no reliance on file descriptor state. A stateless implementation could be built in terms of preads of the underlying data. If an implementation does have file descriptor state that readChunk() changes, then the implementation must manage that state. It should be easy to implement a generic pread on top of such an API, since pread is the default.
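That point can be made concrete with a sketch (hypothetical names, not the Hadoop API): given a stateless, positional readChunk() contract, a generic pread falls out as a loop over chunk-aligned reads, touching no stream state.

```java
// Sketch of a generic pread built only on a stateless, positional
// readChunk() contract. Interface and class names are illustrative.
interface PositionedChunkSource {
    int bytesPerChunk();
    /** Read the chunk starting at chunkStart; returns bytes read, -1 at EOF. */
    int readChunk(long chunkStart, byte[] buf, int off, int len);
}

class GenericPread {
    /** Positional read: no file position is consulted or modified. */
    static int pread(PositionedChunkSource src, long pos, byte[] b, int off, int len) {
        int bpc = src.bytesPerChunk();
        byte[] chunk = new byte[bpc];
        int copied = 0;
        while (copied < len) {
            long chunkStart = (pos / bpc) * bpc;         // align to chunk boundary
            int got = src.readChunk(chunkStart, chunk, 0, bpc);
            if (got <= 0) break;
            int skip = (int) (pos - chunkStart);         // offset inside the chunk
            if (skip >= got) break;
            int n = Math.min(len - copied, got - skip);
            System.arraycopy(chunk, skip, b, off + copied, n);
            copied += n;
            pos += n;
        }
        return copied == 0 ? -1 : copied;
    }
}
```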

          Hairong Kuang added a comment -

          > The API does not care, since it always passes in the absolute position: there is no file position. An implementation should optimize for sequential reads, and perhaps also parallel reads, but that's a secondary priority. We may not even have to explicitly optimize for sequential reads in ChecksumFileSystem, since most implementations of seek already implement that optimization.

          I think the contract of readChunk should make it clear whether the method changes the file descriptor state, because it makes a difference whether the implementation of pread can use it.

          Doug Cutting added a comment -

          > is it assumed that two readChunk()s on a stream should happen in parallel?

          That'd be up to the implementation. An optimized implementation might cache a file handle and use it for sequential calls, but if a parallel call comes in, it could create a new file handle. It could potentially even keep a pool of file handles. Or it could just keep a single file handle and synchronize, seeking whenever the position is not the current position. That's probably the implementation we should start with, and optimize it later as needed. But I think such optimizations would be FileSystem-specific, not generic.

          > Should readChunk change the position of the file descriptor?

          The API does not care, since it always passes in the absolute position: there is no file position. An implementation should optimize for sequential reads, and perhaps also parallel reads, but that's a secondary priority. We may not even have to explicitly optimize for sequential reads in ChecksumFileSystem, since most implementations of seek already implement that optimization.

          > getChecksumSize(): We probably do not need this abstract method [...]

          Okay.
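The "single file handle and synchronize, seeking whenever the position is not the current position" starting point can be sketched as follows (hypothetical names; a byte array stands in for the file and a counter stands in for actual seeks):

```java
// Sketch of the "single synchronized descriptor" strategy: one shared
// handle, a lazy seek only when the requested position differs from the
// current one, so sequential callers pay no seek cost.
class SingleHandleReader {
    private long filePos = 0;        // position of the single shared "handle"
    private int seeks = 0;           // counts simulated seek() calls
    private final byte[] file;       // stands in for the underlying file

    SingleHandleReader(byte[] file) { this.file = file; }

    /** Positional read over one shared handle; seeks only when needed. */
    synchronized int readAt(long position, byte[] b, int off, int len) {
        if (position != filePos) {   // lazy seek: skipped for sequential reads
            filePos = position;
            seeks++;
        }
        if (filePos >= file.length) return -1;
        int n = Math.min(len, file.length - (int) filePos);
        System.arraycopy(file, (int) filePos, b, off, n);
        filePos += n;
        return n;
    }

    int seekCount() { return seeks; }
}
```

Sequential reads never seek; only a random jump does, which is the behavior the comment suggests starting with before any FileSystem-specific optimization.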

          Hairong Kuang added a comment -

          > int readChunk(long position, byte[] buffer, int start, int length, byte[] checksum);
          Should readChunk change the position of the file descriptor? I think it should. Even if it does not, implementing pread by calling readChunk multiple times is not efficient.

          > int getChecksumSize(); /** Return the length of each checksum. */
          We probably do not need this abstract method, because the length of each checksum is determined by the type of Checksum, and the checksum type is visible to the generic checker.

          Raghu Angadi added a comment -

          Doug, since this is expected to be used by pread(), is it assumed that two readChunk()s on a stream may happen in parallel?

          Doug Cutting added a comment -

          It sounds like we're reaching convergence. The InputChecker abstract methods might be something like:

          /** Return position of beginning of chunk containing pos. */
          long getChunkPosition(long pos);
          /** Return the maximum length of chunks in this file. */
          int getChunkSize();
          /** Return the length of each checksum. */
          int getChecksumSize();
          /** Read a full chunk and its checksum at the specified position, returning its length. */
          int readChunk(long position, byte[] buffer, int start, int length, byte[] checksum);
          

          Is that sufficient that an input stream can be fully implemented w/o overriding other methods?

          Note that I've added the checksum into readChunk() to simplify synchronization and pread. Does that look reasonable?
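To make the proposal concrete, here is a toy in-memory sketch of those four abstract methods plus a generic verified read built on them. This is illustrative only, not the Hadoop implementation; fixed-size chunks with a 4-byte CRC32 each are assumptions of the sketch, and the toy "file system" computes checksums on the fly where a real one would read stored ones.

```java
import java.util.zip.CRC32;

// Sketch of the four proposed abstract methods plus generic verification.
abstract class SketchInputChecker {
    abstract long getChunkPosition(long pos);
    abstract int getChunkSize();
    abstract int getChecksumSize();
    abstract int readChunk(long position, byte[] buffer, int start, int length,
                           byte[] checksum);

    /** Generic code: read the chunk containing pos, verify, return its length. */
    int readVerified(long pos, byte[] buffer) {
        byte[] sum = new byte[getChecksumSize()];
        int n = readChunk(getChunkPosition(pos), buffer, 0, getChunkSize(), sum);
        if (n <= 0) return n;
        CRC32 crc = new CRC32();
        crc.update(buffer, 0, n);                       // verify as late as possible
        long expected = ((sum[0] & 0xffL) << 24) | ((sum[1] & 0xffL) << 16)
                      | ((sum[2] & 0xffL) << 8) | (sum[3] & 0xffL);
        if (crc.getValue() != expected)
            throw new RuntimeException("checksum error at " + pos);
        return n;
    }
}

// Toy backing store: data in a byte array; checksums computed on demand.
class MemoryChecker extends SketchInputChecker {
    private final byte[] data;
    private final int bpc;
    MemoryChecker(byte[] data, int bytesPerChunk) { this.data = data; this.bpc = bytesPerChunk; }
    long getChunkPosition(long pos) { return (pos / bpc) * bpc; }
    int getChunkSize() { return bpc; }
    int getChecksumSize() { return 4; }
    int readChunk(long position, byte[] buffer, int start, int length, byte[] checksum) {
        if (position >= data.length) return -1;
        int n = Math.min(length, data.length - (int) position);
        System.arraycopy(data, (int) position, buffer, start, n);
        CRC32 crc = new CRC32();
        crc.update(data, (int) position, n);
        long v = crc.getValue();
        checksum[0] = (byte) (v >>> 24); checksum[1] = (byte) (v >>> 16);
        checksum[2] = (byte) (v >>> 8);  checksum[3] = (byte) v;
        return n;
    }
}
```

As the question above asks, nothing beyond the four abstract methods is overridden here; read, seek and pread could all be layered on readVerified().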

          Raghu Angadi added a comment -

          > java.io.InputStream has a default implementation of skip using read. So GenericChecker does not need to implement the same logic again.
          This is unsuitable for InputChecker, because InputStream.skip() uses a global buffer, which will result in checksum failures if the global buffer size is larger than bytesPerChecksum. Plus, ideally skip() should never throw a checksum exception.
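The point can be illustrated with a sketch of a skip that never reads through a buffer larger than bytesPerChecksum, so every underlying read stays chunk-sized. An in-memory stream keeps the sketch self-contained; a real version would go through the checker's own read path rather than a raw stream, and the names here are illustrative.

```java
import java.io.ByteArrayInputStream;

// Sketch: skip by reading, with a per-call buffer capped at
// bytesPerChecksum so the verified read path stays chunk-aligned.
class ChunkAlignedSkip {
    static long skip(ByteArrayInputStream in, long n, int bytesPerChecksum) {
        byte[] buf = new byte[bytesPerChecksum];   // at most one chunk per read
        long skipped = 0;
        while (skipped < n) {
            int want = (int) Math.min(buf.length, n - skipped);
            int got = in.read(buf, 0, want);
            if (got <= 0) break;                   // EOF: report partial skip
            skipped += got;
        }
        return skipped;
    }
}
```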

          Raghu Angadi added a comment - - edited

          I didn't see Hairong's comment before posting the above comment. I am not opposed to a generic seek; please change the interface as required. Hopefully DFS (or any other FS) will not need more code to support it.

          Raghu Angadi added a comment -

          Even without a generic skip(), the attached patch should use seek() inside skip() (or vice versa), since both currently do the same thing.

          Doug, I thought about the position argument for readChunk(); could you comment on the following, or show a simple use case?

          1. InputChecker would need to be file-position aware. Right now it is not; it is more of a utility for handling checksums.
          2. What is the contract of pos? That is, what is an implementation supposed to assume if, say, pos in readChunk_n+1 is not equal to 'pos in readChunk_n + nextChunkSize_n'?
          3. Seeking to a random position implies that InputChecker cannot know where a checksum chunk starts, so we need another abstract method for that. This method is not simple in general.
          Hairong Kuang added a comment -

          > I think skip() can still be implemented by genericChecker().
          java.io.InputStream has a default implementation of skip using read, so GenericChecker does not need to implement the same logic again.

          > What do folks think of my idea that readChunk() might take an absolute file position?
          I like the idea. But in addition to this, we also need a function that, given a position, returns the beginning of the checksum chunk that contains it. The absolute position passed to readChunk needs to be the beginning of a checksum chunk. nextChunkSize also needs to take an absolute position.

          > ChecksumFSInputChecker#read(long,byte[],int,int) looks like it moves the file pointer. Or am I misreading?
          Pread constructs a new file descriptor. It's not an ideal implementation, but it is the best that I can think of.
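For fixed-size chunks, the boundary function asked for above is simple modular arithmetic; a sketch follows, under the assumption of fixed bytesPerChecksum chunks (variable-size records, mentioned elsewhere in this thread, would be the hard case).

```java
// Sketch: map any file position to the start of its checksum chunk,
// assuming fixed-size chunks of bytesPerChecksum bytes.
class ChunkBoundary {
    /** Largest chunk-start position that is <= pos. */
    static long getChunkBoundary(long pos, int bytesPerChecksum) {
        return pos - (pos % bytesPerChecksum);
    }
}
```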

          Doug Cutting added a comment -

          > I think skip() can still be implemented by genericChecker().

          Great! What do folks think of my idea that readChunk() might take an absolute file position? That should make it easy for seek() to also be generic, since it would either (a) directly handle a seek within the buffer; or (b) invalidate the buffer so that the next call to read() immediately causes a readChunk() at the new file position.
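The two cases of that generic seek can be sketched with hypothetical buffer-bookkeeping fields (none of these names are from the actual patch): a seek inside the current verified buffer just moves the in-buffer cursor, anything else invalidates the buffer so the next read triggers a positional readChunk().

```java
// Sketch of a generic seek over a verified buffer: case (a) stays
// in-buffer, case (b) invalidates so the next read() does a readChunk()
// at the new position.
class SeekableBuffer {
    long bufferStart = -1;   // file offset of buffer[0]; -1 means invalid
    int count = 0;           // valid bytes currently in the buffer
    int pos = 0;             // cursor within the buffer
    long filePos = 0;        // logical stream position reported to callers

    void seek(long target) {
        if (bufferStart >= 0 && target >= bufferStart
                && target < bufferStart + count) {
            pos = (int) (target - bufferStart);     // (a) in-buffer seek
        } else {
            bufferStart = -1;                       // (b) invalidate buffer
            count = 0;
            pos = 0;
        }
        filePos = target;
    }

    boolean bufferValid() { return bufferStart >= 0; }
}
```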

          Raghu Angadi added a comment -

          I think skip() can still be implemented by genericChecker(). It can either use read() or a small variation of read(). An implementation can choose to use the generic skip() selectively. E.g., in the case of DFS, it would use the generic skip if the skip is small (< 128KB) and falls within the same block; otherwise it does 'seek(curPos() + skipLen)'.

          Doug Cutting added a comment -

          I'm trying to understand the FSInputChecker and FSOutputChecker APIs, and finding it a bit tricky.

          1. FSInputChecker and FSOutputChecker should have constructors that initialize fields if possible, rather than relying on subclass constructors to do this. This would make it clearer what subclasses are responsible for and what FSInputChecker does.

          2. All the fields of FSInputChecker are 'protected' rather than private, yet I don't think all of them are actually used by subclasses. It'd make the contract of this class clearer if fewer fields were protected. Also, protected fields in a public class should have javadoc describing how they're used. Could 'count-pos' be provided via a method?

          3. ChecksumFSInputChecker#read(long,byte[],int,int) looks like it moves the file pointer. Or am I misreading?

          4. Lots of the stuff handled in ChecksumFSInputChecker (like optimizing for stuff that's already in the buffer) seems like it should be handled by the generic code.

          Overall, the abstraction boundary seems a bit fuzzy. Ideally implementations should only have to implement readChunk and readChecksum, but in fact they must implement much more (seek & skip, which are non-trivial). What if the abstraction were changed so that pread semantics were in effect for readChunk: an absolute file position is always passed? Then the implementation classes could check their current position and seek as needed. The generic code could, to make things efficient, arrange to always read chunks sequentially, so that most of the time the implementation wouldn't have to actually seek. Perhaps then all of seek() and skip() could be handled in generic code?

          Hairong Kuang added a comment -

          Sorry for polluting this mailing list. My Windows machine is having a problem mapping a network drive: it does not refresh my file changes on the remote machine, so the old file got loaded when I attached the file from my Windows machine. Hope it works this time.

          Hairong Kuang added a comment -

          This patch includes the following:
          1. Define a generic FSInputChecker and FSOutputSummer;
          2. Implement ChecksumFSInputChecker & ChecksumFSOutputSummer for ChecksumFileSystem
          3. Two junit tests for testing FSInputChecker & FSOutputSummer;
          4. Remove Buffer & PositionCache from FSDataInputStream; Pass user-defined buffer size to each file's FSInputStream;
          5. Remove Buffer from FSDataOutputStream (but keep the PositionCache); Pass user-defined buffer size to each file's output stream.

          Raghu Angadi added a comment -

          Chatted with Sameer regarding preads in DFS; this is what he has in mind:

          DFSInputStream opens an InputChecker for each block it is reading from, so normally there would be one 'blockStream'. In the case of pread, instead of creating a new DFSInputStream as I was thinking, it would create one InputChecker for each block involved (in sequence; it could even fetch data in parallel in the future). So pread() will not create a new DFSInputStream. This will use all the block cache etc. This looks good to me. The current Block Level CRC code has a similar structure (it creates one BlockStream object for each block, etc.).

          Also, the above pread implementation is independent of the actual implementation of InputChecker, i.e. it is independent of whether InputChecker uses 'readChunk()', 'readChecksum()' or 'two streams'.

          Raghu Angadi added a comment -

          > How will 'pread' be supported?
          What I thought when I proposed the interface regarding pread: basically, it would be supported pretty much the same way as in the current DFS, i.e. each pread is pretty much a new independent stream without the overhead of open(). That is what the current implementation does (of course it is simpler now, since there are no checksums at the DFS level currently). So pread() uses the regular read() and not vice versa (pread thus reuses pretty much all the code written for read()). But that proposal could be unsuitable, and we don't need to use it.

          Currently ChecksumFS's pread() does not check checksums. I was not entirely clear about how to implement it, but it was going to be some wrapper over the regular read (without blocking normal reads), mainly to reuse all the InputChecker code.

          Sameer Paranjpye added a comment -

          Maybe I'm missing something, but this approach bothers me.

          The early discussion on this issue suggests that the goal was to produce a simple implementation template with a couple of degrees of freedom (bytes per checksum, checksum type) that could be used in ChecksumFileSystem and HDFS. This can be achieved by using the FSInputChecker pretty much as is, with a couple of tweaks.

          Instead, we've ended up with an additional abstract class/interface, ChecksumInputChecker, with readChunk, readChecksum etc. methods. This class seems unnecessary. A filtering stream which contains and interprets an underlying stream (or streams) is the design pattern used almost everywhere (compression implementations in particular), and appears to be sufficient for this scenario.

          The comments on the 'readChunk' etc. methods seem to imply single threaded access, for instance:

          /* returns Checksum of the last chunk read.
          If we want to support something like MD5, we could change this
          to fill a buffer instead of returning a long.
          This class guarantees that readChecksum() is invoked only once
          and only after readChunk() was successful */

          How will 'pread' be supported? We could have lots of threads making concurrent calls to 'pread'. We could introduce explicit offsets into the readChunk etc. method signatures, or do some per-thread bookkeeping in the implementation. Again, 'pread' is present in the existing FSInputStream interface; adding additional methods with the same functionality doesn't feel like it adds value.

          Raghu Angadi added a comment -

          To clarify, I am for having a generic seek, if that makes sense.

          Raghu Angadi added a comment -

          > abstract protected long getChunkBoundary(long pos);
          I think this cannot easily be supported for an arbitrary value of pos. In the case of DFS, the implementation needs to start reading a block at 'pos' to know what the boundary is. It will be much harder for something that has variable-size "records". I am not sure seek() is a responsibility of a generic "checker". Of course, if it is straightforward to provide a generic seek(), we should.

          Hairong Kuang added a comment -

          I cannot stop thinking about this, because I feel, as Doug does, that seek could be generalized.

          It looks like the currently proposed FSInputChecker could be broken into two classes:

          1. ChecksumFSInputStream is an abstraction over the data and checksum streams, or over an interleaved stream as in DFS. ChecksumFSInputStream extends FSInputStream and has checksum-related abstract methods: readChunk, readChecksum, getChunkBoundary, nextChunkSize, seekToNewSource, reportChecksumError, etc.

          2. The generic checker FSInputChecker is a concrete class. It contains a ChecksumFSInputStream, extends FSInputStream, and provides generic implementations of read, seek, skip, etc.

          Every checksum file system should implement ChecksumFSInputStream, while every non-checksum file system implements FSInputStream. So FSDataInputStream contains an FSInputChecker for checksum file systems, or an FSInputStream for non-checksummed file systems.

          I am still not clear how this approach supports changing the Checksum between blocks. Any thoughts?
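          To make the proposed split concrete, here is a minimal, hypothetical sketch of the two roles (the class names beyond readChunk/readChecksum are illustrative, not the actual patch; java.util.zip.CRC32 stands in for the pluggable Checksum, and the toy in-memory source exists only for demonstration):

```java
import java.io.IOException;
import java.util.zip.CRC32;

public class CheckerSketch {
    // Abstract chunk source: the role the comment assigns to ChecksumFSInputStream.
    abstract static class ChunkSource {
        abstract int readChunk(byte[] buf, int offset) throws IOException;
        abstract long readChecksum() throws IOException;
    }

    // Concrete generic checker: the role assigned to FSInputChecker.
    static class Checker {
        private final ChunkSource src;
        Checker(ChunkSource src) { this.src = src; }

        // Generic loop: fetch a chunk, fetch its checksum, verify, copy out.
        int read(byte[] userBuf, int offset) throws IOException {
            byte[] chunk = new byte[512];
            int n = src.readChunk(chunk, 0);
            if (n < 0) return -1;                 // end of stream
            CRC32 crc = new CRC32();
            crc.update(chunk, 0, n);
            if (crc.getValue() != src.readChecksum())
                throw new IOException("checksum error");
            System.arraycopy(chunk, 0, userBuf, offset, n);
            return n;
        }
    }

    // Toy in-memory source holding one chunk and its CRC, for illustration.
    static class MemorySource extends ChunkSource {
        private final byte[] data;
        private boolean consumed = false;
        MemorySource(byte[] data) { this.data = data; }
        int readChunk(byte[] buf, int offset) {
            if (consumed) return -1;
            consumed = true;
            System.arraycopy(data, 0, buf, offset, data.length);
            return data.length;
        }
        long readChecksum() {
            CRC32 crc = new CRC32();
            crc.update(data, 0, data.length);
            return crc.getValue();
        }
    }

    public static int demo() throws IOException {
        byte[] payload = "hello, checksums".getBytes();
        Checker checker = new Checker(new MemorySource(payload));
        byte[] out = new byte[512];
        return checker.read(out, 0);   // bytes successfully verified and copied
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```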

          Hairong Kuang added a comment -

          On second thought, even with getChunkBoundary, the generic Checker cannot provide a general implementation of seek or skip, because the underlying data and checksums are invisible to it.

          Hairong Kuang added a comment -

          Shall we add one more method to the generic Checker?
          abstract protected long getChunkBoundary(long pos);
          It returns the largest chunk boundary position that is less than or equal to pos.

          There are a number of advantages to having this:
          1. The generic Checker can provide a general implementation of seek and skip.
          2. No need to support readAvailable(); better information hiding.
          3. The current pos can be tracked in the generic Checker, so it can provide a general implementation of getPos().
          4. We can now support a positional getChunk(); better support for pread.
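          For a fixed bytesPerChecksum, the proposed getChunkBoundary(pos) reduces to simple arithmetic, and a generic seek can be built on it. A hypothetical sketch (the names are illustrative; variable-size records would need real I/O to find the boundary):

```java
public class ChunkBoundary {
    // Largest chunk boundary <= pos, assuming a fixed bytesPerChecksum
    // (the simple case; variable-size records cannot be computed this way).
    static long getChunkBoundary(long pos, int bytesPerChecksum) {
        return pos - (pos % bytesPerChecksum);
    }

    // Generic seek built on it: position at the boundary, then the read
    // path re-reads and discards (pos - boundary) bytes so they are verified.
    static long bytesToDiscardAfterSeek(long pos, int bytesPerChecksum) {
        return pos - getChunkBoundary(pos, bytesPerChecksum);
    }

    public static void main(String[] args) {
        System.out.println(getChunkBoundary(1034, 512));          // 1024
        System.out.println(bytesToDiscardAfterSeek(1034, 512));   // 10
    }
}
```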

          Raghu Angadi added a comment -

          > But I am not clear if DFS needs it.
          Yes. I think DFS needs it.

          Hairong Kuang added a comment -

          > readAvailable() gives the valid data left in InputChecker's buffer. So that getPos() will be {{ datas.getPos() - readAvailable() }}.

          If this is the only usage, I am not sure readAvailable() is really needed. Caching the current pos is an alternative way of implementing getPos(), and that could be done in the generic InputChecker.

          > Yes. ChecksumFS needs this to retain the current functionality of "stopSumming()" (with InputChecker it would do 'setChecksum(null)').
          stopSumming is internal to InputChecker, so ChecksumFS does not need a setChecksum method. But I am not clear whether DFS needs it.

          > Right. It is not strictly needed, because InputChecker should guarantee that buf has enough space for the next chunk. It is the common convention to pass 'len' everywhere else we read.
          Having the parameter "len" requires the implementation of readChunk to check whether it is valid. It has no other use.

          Raghu Angadi added a comment -

          I think many of our streams including the ones discussed here should return false for markSupported() (HADOOP-1489).

          Raghu Angadi added a comment -

          > No, ChecksumFS supports seeking to an arbitrary position.
          You are right. I meant to say that it does not need an explicit setSkipBytes() call. DFS will do the same thing as ChecksumFS.

          readAvailable() gives the valid data left in InputChecker's buffer. So that getPos() will be {{ datas.getPos() - readAvailable() }}.

          > Do we allow the Checksum to be changed at any time? Otherwise, I do not think this is needed. Instead, the generic class should have a constructor that takes a Checksum as a parameter.

          Yes. ChecksumFS needs this to retain the current functionality of "stopSumming()" (with InputChecker it would do 'setChecksum(null)'). DFS could change it across blocks.

          > Is parameter "len" really needed?
          Right. It is not strictly needed, because InputChecker should guarantee that buf has enough space for the next chunk. It is the common convention to pass 'len' everywhere else we read.

          Hairong Kuang added a comment -

          > abstract protected int readChunk(byte[] buf, int offset, int len) throws IOException;
          Is parameter "len" really needed?

          Hairong Kuang added a comment -

          > Yeah, it is not used in ChecksumFS because it always seeks to checksum boundary.
          No, ChecksumFS supports seeking to an arbitrary position. It is done by first seeking to the closest checksum boundary and then reaching the final position by reading (final position - boundary position) bytes.
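          The two-step seek described above can be sketched with a toy reader (illustrative only; checksum verification is omitted so that only the boundary-then-discard mechanic shows):

```java
import java.io.IOException;

public class SeekDemo {
    // Toy checked stream: data divided into bytesPerChecksum-sized chunks.
    // seek(pos) moves to the enclosing chunk boundary, then reads and
    // discards bytes up to pos so they would pass through verification.
    static class CheckedReader {
        private final byte[] data;
        private final int bytesPerChecksum;
        private int pos = 0;
        CheckedReader(byte[] data, int bpc) { this.data = data; this.bytesPerChecksum = bpc; }

        int read() {
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }

        void seek(long target) throws IOException {
            long boundary = target - (target % bytesPerChecksum);
            pos = (int) boundary;            // cheap: jump to a verifiable point
            long toSkip = target - boundary; // the rest: read forward and discard
            while (toSkip-- > 0) {
                if (read() < 0) throw new IOException("seek past EOF");
            }
        }

        long getPos() { return pos; }
    }

    public static long demo() throws IOException {
        CheckedReader r = new CheckedReader(new byte[2048], 512);
        r.seek(1034);
        return r.getPos();   // at the 1024 boundary plus 10 discarded bytes
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```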

          Other comments:
          > final int readAvailable();
          It is not clear why we need this.

          > final setChecksum(Checksum sum);
          Do we allow the Checksum to be changed at any time? Otherwise, I do not think this is needed. Instead, the generic class should have a constructor that takes a Checksum as a parameter.

          Raghu Angadi added a comment -

          You are right. setSkipBytes is not really required. Instead of calling setSkipBytes(), DFS can just call read() or, better, skip() (just like ChecksumFS does now). I will remove setSkipBytes().

          Regarding a pread-like readChunk(), I still need to think about it to understand it better. InputChecker is currently not aware of any checksum boundaries or bytesPerChecksum.

          Doug Cutting added a comment -

          > It will be used by DFS since it might get few extra bytes when it seeks to a position in a new block.

          If, instead, readChunk and readChecksum were both pread-like, taking an absolute position as a parameter, then seek() could be generic and that method might not be needed, no? The generic seek would merely reset the stream's position and perform no actual i/o. The generic readChunk() would only be called at positions that are evenly divisible by bytesPerChecksum. Could that work? We need to support pread anyway, and should optimize sequential preads too, so this shouldn't make implementations any more complex.
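          Doug's pread-style idea can be sketched as follows: readChunk takes an absolute, chunk-aligned position, and the generic positioned read aligns down, verifies, and copies out the tail. All names here are illustrative (not the actual patch), with CRC32 standing in for the checksum:

```java
import java.io.IOException;
import java.util.zip.CRC32;

public class PreadSketch {
    static final int BPC = 512;   // bytesPerChecksum, assumed fixed per file

    static byte[] data;           // backing "file"
    static long[] sums;           // one CRC32 per chunk

    // pread-like primitive: only ever called at chunk-aligned positions.
    static int readChunk(long chunkPos, byte[] buf) throws IOException {
        if (chunkPos % BPC != 0) throw new IOException("unaligned readChunk");
        int n = (int) Math.min(BPC, data.length - chunkPos);
        if (n <= 0) return -1;
        System.arraycopy(data, (int) chunkPos, buf, 0, n);
        return n;
    }

    static long readChecksum(long chunkPos) { return sums[(int) (chunkPos / BPC)]; }

    // Generic positioned read: align down, verify the chunk, copy the tail.
    static int pread(long pos, byte[] userBuf, int off, int len) throws IOException {
        long chunkPos = pos - (pos % BPC);
        byte[] chunk = new byte[BPC];
        int n = readChunk(chunkPos, chunk);
        if (n < 0) return -1;
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, n);
        if (crc.getValue() != readChecksum(chunkPos))
            throw new IOException("checksum error");
        int skip = (int) (pos - chunkPos);
        int copied = Math.min(len, n - skip);
        System.arraycopy(chunk, skip, userBuf, off, copied);
        return copied;
    }

    public static int demo() throws IOException {
        data = new byte[1024];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        sums = new long[2];
        for (int c = 0; c < 2; c++) {
            CRC32 crc = new CRC32();
            crc.update(data, c * BPC, BPC);
            sums[c] = crc.getValue();
        }
        byte[] out = new byte[16];
        int n = pread(600, out, 0, 16);   // lands inside the second chunk
        return n == 16 && (out[0] & 0xff) == (600 & 0xff) ? 1 : 0;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

          Note the sequential read() path can reuse the same primitive, which is why the generic seek needs no I/O of its own.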

          Raghu Angadi added a comment -

          Thanks Doug.

          > I don't understand how setBytesToSkip() is used. It's not implemented in your example.
          Yeah, it is not used in ChecksumFS because it always seeks to a checksum boundary. It will be used by DFS, since it might get a few extra bytes when it seeks to a position in a new block.

          Doug Cutting added a comment -

          This looks mostly reasonable to me.

          I don't see that we need to handle varying the chunk size per-chunk, since we currently always fix it per file, and will require that the block size is a multiple of the bytesPerChecksum. Specifically, I don't see the need for the nextChunkSize() method, but rather only a getChunkSize() method.

          I don't understand how setBytesToSkip() is used. It's not implemented in your example.

          I would think that some of the logic of seek(long) might be generic, but perhaps not...

          Hairong, what do you think? Does this look workable to you?

          Raghu Angadi added a comment -

          FYI, my vote (as a developer on this Jira) is for Approach 1. But I will do whatever you guys decide.

          Doug please comment when you get a chance. Thanks.

          Raghu Angadi added a comment -

          InputChecker-01.java: This is similar to what I proposed above, but without ReadInfo, and with InputChecker extending FSInputStream as Hairong proposed. It also includes a use case for FSInputChecker in ChecksumFS.

          This InputChecker formalizes the condition that readChunk() and readChecksum() will be called alternately.

          The second approach is to take the existing FSInputChecker code as-is: since it calls datas.read() and sums.readInt() alternately anyway, implement DFSInputStream with that knowledge. So DFSInputStream would use one InputChecker for each block and provide the same stream for datas and sums, etc. I prefer the first approach, since it states this contract explicitly. I will attach a similar file for the second method. Also, the existing FSInputChecker requires a buffer of size >= bytesPerChecksum between the user and the FSInputChecker for it to work.

          Please comment.

          Raghu Angadi added a comment - edited

          Things that InputChecker should ideally do correctly (though they may only ever be 'sort of' implemented):

          • If there is a checksum mismatch and some bytes have already been read successfully, return whatever has been read correctly so far. Handle/raise the checksum error only on the next read.
          • skip() is the same as read(), except that it never results in a checksum error (or checksum-related retries).
          Raghu Angadi added a comment -

          I will summarize a couple of the approaches discussed. I was looking at ChecksumFS to see how the approaches fit, and noticed that the current pread (read(position, buf, off, len)) silently returns unchecked data. Should I file a Jira on this? I am not sure pread() is used anywhere.

          With block CRCs, pread() on DistributedFS works OK, but in the code I have now it does not deal with checksum errors properly (seekToNewSource() messes up state). The new proposal should let us implement pread() without much effort for both ChecksumFS and DistributedFS.

          Both preads can be fixed by locking the input stream and doing '{{ seek(position); read(); seek(oldPos) }}'. But I think pread() is not supposed to block a sequential read(), and it should ideally avoid reopening the underlying file (in the DFS case, not doing another open()).

          Raghu Angadi added a comment - edited

          Let me show what I had in mind when I proposed "genericChecksum.java" above on Jun 8th. Its main and only aim is to share the read and retry loops. readChunk() and readChecksum() are not public interfaces, in the sense that map/reduce never invokes them. The FSInputStream public interface does not change.

          Pseudo-code of the class and how it is used:

          inputChecker.java
          
          // First, the use case:
          // FSInputChecker in ChecksumFileSystem would look like this:
          
          class FSInputChecker extends FSInputStream {
          
            InputChecker checker = new InputChecker(this);
          
            public int read(byte[] buf, int offset, int len) {
              return checker.read(buf, offset, len);
            }
          
            // implement readChunk();
            // implement readChecksum();
          
            // seek etc. are slightly modified.
          }
          
          
          class InputChecker {
          
            // FSInputChecker supports readChunk() and readChecksum() as
            // described in my comment.
            InputChecker(FSInputChecker in);
          
            ReadInfo readInfo;
            FSInputStream in;
          
            // Contract is the same as InputStream.read().
            // The following is oversimplified; the real implementation will be
            // a bit more involved, but that meat is shared across the file
            // systems, which I think is what this Jira wants.
          
            int read(byte[] userBuf, int offset, int len) {
              int bytesRead = 0;
              while (bytesRead < len) {
                if (there is data left in readInfo) {
                  copy to userBuf;
                } else {
                  RETRY_LOOP {
                    in.readChunk();
                    in.readChecksum();
                    compare the checksum;
                    if there are mismatches, in.seekToNewSource() etc.
                  }
                }
              }
            }
          
            resetReadInfo() { readInfo = 0; }
          }
          

          Hope this makes it clear where the InputChecker I have in mind fits in. Note that here, FSInputChecker in ChecksumFileSystem and DFSInputStream in DFS implement readChunk() and readChecksum() internally.

          edit: reduced the "width" of the code

          Sameer Paranjpye added a comment -

          From what I've seen of the implementations of FSInputChecker and FSOutputSummer, it looks like they can be very simply tweaked to handle the two cases of

          a) parallel data and checksum streams, and
          b) a single stream with data and checksums interleaved.

          They can already handle the end-of-stream case, where the number of bytes to be checksummed is <= bytesPerChecksum.

          Why not just tweak the existing implementations and make them capable of handling a single stream with data and checksums interleaved?

          The implementation could then be used by both ChecksumFileSystem and DistributedFileSystem.

          DFSInputStream would then instantiate a DFSBlockStream (a new class), wrap it in an FSInputChecker, and commence reading. It would re-initialize the input checker with a new DFSBlockStream when the end of a block was reached or when seek was called; the 'pread' case would need to be handled carefully.

          I don't see the need for methods to read data and checksums independently of each other.
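          The wrap-per-block idea above can be sketched as a loop that re-initializes the checker with a fresh block stream at each block boundary. The names here are illustrative stand-ins (checksum verification is omitted so only the re-wrapping pattern shows):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BlockStreamDemo {
    // Toy stand-in for the proposed DFSBlockStream: a stream over one block.
    static InputStream openBlock(byte[][] blocks, int index) {
        return new ByteArrayInputStream(blocks[index]);
    }

    // Toy stand-in for FSInputChecker wrapping a single block stream.
    static class Checker {
        private final InputStream in;
        Checker(InputStream in) { this.in = in; }
        int read() throws IOException { return in.read(); }
    }

    // DFSInputStream-style loop: re-initialize the checker with a fresh
    // block stream each time the end of a block is reached.
    public static int readAll(byte[][] blocks) throws IOException {
        int total = 0;
        for (int b = 0; b < blocks.length; b++) {
            Checker checker = new Checker(openBlock(blocks, b));
            while (checker.read() >= 0) total++;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[][] blocks = { new byte[5], new byte[7] };
        System.out.println(readAll(blocks));
    }
}
```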

          Hairong Kuang added a comment -

          Having Checker extend FSInputStream does not mean that an input stream cannot be a member of Checker. But anyway, I will have Checker implement Seekable and PositionedReadable.

          Raghu Angadi added a comment -

          Just like sum, the input stream could be a member of Checker instead of Checker extending it.

          Hairong Kuang added a comment -

          When readVerifiedChunk handles a ChecksumError, it needs getPos, seek, seekToNewSource, and reportChecksumError, most of which are part of the FSInputStream interface. That's why I let the class extend FSInputStream. Otherwise, we would need to list them as part of the Checker interface.

          Raghu Angadi added a comment -

          > This is just a utility. Anyone who wants to use it can implement it.
          I meant, this could be a utility.

          Raghu Angadi added a comment - edited

          The code Hairong included in the previous comment:

          genericChecksums2.java
          abstract class Checker extends FSInputStream {
            Checksum sum;
          
            /** Read a checksum-verified chunk into buf at offset.
              * If there is a checksum error, it retries a different replica.
              * If buf does not have enough space to hold the chunk, throws an
              * IllegalArgumentException.
              * Returns the number of bytes read; -1 at end of file.
              */
            int readVerifiedChunk(byte[] buf, int offset);
          
            /** Read a chunk into buf at offset.
              * It ends at a checksum boundary, at end of file, or at any position
              * that can be checksum verified, for example at a block boundary in
              * block-level-CRC DFS.
              * The data read has not been checksum verified.
              * If buf does not have enough space to hold the chunk, throws an
              * IllegalArgumentException.
              * Returns the number of bytes read; -1 at end of file.
              */
            abstract int readChunk(byte[] buf, int offset, int len) throws IOException;
          
            /** Read a checksum. */
            abstract long readChecksum() throws IOException;
          }
          

          This sounds fine, except that we don't need to extend FSInputStream. This is just a utility; anyone who wants to use it can implement it.

          Hairong Kuang added a comment - - edited

          Chatting with Raghu again. Hopefully we can reach common ground.
          1. We need a method to read a chunk without checksum verification.
          2. I do not like the idea of passing both a local buffer and a user buffer to each read. Instead, I'd like to pass just one buffer. Whether it is a local buffer or a user buffer is up to each implementation to decide.
          3. For the readChecksum() API, can we let it return a long for now?

          So the proposed reading interface looks like the following:

          abstract class Checker extends FSInputStream {
            Checksum sum;
          
            /** Read a checksum-verified chunk into buf at offset.
              * If there is a checksum error, it retries a different replica.
              * If buf does not have enough space to hold the chunk, throws an
              * IllegalArgumentException.
              * Returns the number of bytes read; -1 at end of file.
              */
            int readVerifiedChunk(byte[] buf, int offset);
          
            /** Read a chunk into buf at offset.
              * It ends at a checksum boundary, at end of file, or at any position
              * that can be checksum verified, for example at a block boundary in
              * block-level-CRC DFS.
              * The data read has not been checksum verified.
              * If buf does not have enough space to hold the chunk, throws an
              * IllegalArgumentException.
              * Returns the number of bytes read; -1 at end of file.
              */
            abstract int readChunk(byte[] buf, int offset, int len) throws IOException;
          
            /** Read a checksum. */
            abstract long readChecksum() throws IOException;
          }
          
          Raghu Angadi added a comment -

          There might be some more extensions needed for ReadInfo. E.g. it needs a way of giving some data to the InputChecker that needs to be used before the user buffer for checksum verification.

          Raghu Angadi added a comment -

          Also it does not suit well for async reading/polling. We can provide that by having an extra field in ReadInfo. In that sense, it can be extended.

          Raghu Angadi added a comment -

          I don't want to claim that the above is as generic as one might prefer. We can certainly have a better one. I just think the above suits better than two separate but tightly coupled (by a single bpc) streams.

          Raghu Angadi added a comment -

          I prefer readData (or readChunk) and readChecksum(). The contract is that the amount returned by readChunk() is what needs to be verified. Also, readChunk() can provide the buffer it has (usually the user buffer), and if the chunk does not fit in it, the implementation of readChunk() will use its local buffer. This also avoids an extra copy.

          I think the inputChecker is the more complex one. This does not easily suit the outputSummer, and the outputSummer is pretty simple. If we still want to share the outputSummer, then we could add a writeAvailable() call before it checksums the data.

          The reader interface looks like this (not compiled):

          genericChecksums.java
           
          class ReadInfo {
             byte[] userBuffer;
             int  userOffset;
             int userLen;
             
             int readLen; // -1 indicates same as InputStream.read() returning -1;
             byte[] localBuf; // null if readLen <= userLen 
             int localOffset; 
             int localLen; // readLen == userLen + localLen
          }
          
          /*
           *  Verifies the checksum of the returned data. It handles the checksums errors and retries. Number of retries can be controlled by 
           *  the implementation.
            * Input checker guarantees that readChunk() and readChecksum() are called in sequence.
           * If info.localLen > 0, next readChunk() will be called only after localLen of data has been consumed
           */ 
          int readChunk( ReadInfo info ); 
          
          /** Number of bytes returned depends on "Type of Checksum" provided during creation of inputchecker. (Similar to DataChecksum class in HADOOP-1134)
           * When checksum size is zero (or checksum object is "null"), implementation could decide to ignore or retry.
            * How do we support "stopSumming()" in current ChecksumFS? Maybe a return of -1 could indicate that.
           */
          int readChecksum(byte[] buf, int offset);
           
           // Write interface. Do we really need this?
          
          int writeChunkAvailable(); // called when checksum is reset. Will call writeChecksum() after writing this many bytes or when it is closed.
           int write(buf, offset, len); // guarantees that it obeys writeChunkAvailable() above.
          int writeChecksum(); // called just before ChunkAvailable().
          }
          
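
          The spill contract described above — fill the user buffer first, park any remainder of the chunk in a local buffer — can be sketched as follows. This is a hypothetical, uncompiled illustration; the class and method names are invented, not from any attached patch.

```java
// Illustrative version of Raghu's ReadInfo struct.
class ReadInfo {
    byte[] userBuffer;
    int userOffset;
    int userLen;
    int readLen;       // -1 mirrors InputStream.read() at end of stream
    byte[] localBuf;   // null when the chunk fit entirely in the user buffer
    int localOffset;
    int localLen;      // readLen == userLen + localLen when spilling
}

class ChunkReader {
    // Split one checksum chunk between the user buffer and a local buffer,
    // so the common case (chunk fits) needs no extra copy.
    static void fill(ReadInfo info, byte[] chunk) {
        int toUser = Math.min(chunk.length, info.userLen);
        System.arraycopy(chunk, 0, info.userBuffer, info.userOffset, toUser);
        int rest = chunk.length - toUser;
        if (rest > 0) {
            info.localBuf = new byte[rest];
            System.arraycopy(chunk, toUser, info.localBuf, 0, rest);
            info.localOffset = 0;
            info.localLen = rest;
        } else {
            info.localBuf = null;
            info.localLen = 0;
        }
        info.readLen = chunk.length;
    }
}
```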
          Hairong Kuang added a comment -

          I chatted with Raghu and understand his concern. He does not like the idea of separating checksums and data using two streams. I am thinking of removing these two streams. Instead I will have two abstract methods readData/readCheckSum in Checker and two abstract methods writeData/writeCheckSum in Summer. Then ChecksumFileSystem and the block-level-crc DFS can choose to implement those methods either by using two streams or one stream.

          Raghu Angadi added a comment -

          > if you think that model is broken, propose another.

          I am more than happy to propose even implement it and use them in ChecksumFS and DFS.

          I have been waiting/hoping for someone to justify why they think the current approach is a good idea. I asked about it in many comments above. I think the new proposal would be even better once I have more experience with HADOOP-1134.

          Doug Cutting added a comment -

          > I don't agree with 'it is good enough until it is extremely difficult to use'

          Who has argued for that? We're just trying to avoid duplicating some subtle logic. I don't think you've demonstrated that using a generic checksummer will make things significantly more complex in HDFS. I don't see that it adds significant complexity to make the data and checksums appear as separate input streams. Or, if you think that model is broken, propose another.

          > I surely don't think it is an improvement over FSInputChecker which, I think, was not explicitly designed to be general purpose checker.

          So what are you arguing? That it's a bad general purpose checker? If so, then propose an improvement. Or that it's impossible to create a general purpose checker that's easy for you to use? That seems unlikely, but we'll only find out by trying, no?

          Hairong Kuang added a comment -

          Raghu, the more I think about it, the more sense it makes to have generic classes handle the checksum stuff. Then you only need to focus on the client-to-datanode protocol changes in HADOOP-1134. To me the keys to making it work are:
          1. As Doug asked, can you present the data and checksums as input streams that support read(byte[], int, int) and seek(long)?
          2. How does the generic Checker read files in block-level-crc DFS whose block size is not a multiple of bytesPerSum? In this case, it also needs to verify checksum at block boundaries. Shall we expose block boundaries to FileSystem?

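
          Hairong's second question — chunk boundaries when blockSize is not a multiple of bytesPerSum — can be made concrete with a small helper. This is a hypothetical illustration (not from any attached patch), assuming chunk boundaries restart at each block, so the last chunk in a block may be short.

```java
// Illustrative helper: how many bytes the next checksum chunk covers when
// chunks must also stop at block boundaries.
class ChunkBoundary {
    static long nextChunkLen(long pos, long bytesPerSum, long blockSize) {
        long posInBlock = pos % blockSize;          // offset within the current block
        long posInChunk = posInBlock % bytesPerSum; // offset within the current chunk
        long chunkEnd = posInBlock - posInChunk + bytesPerSum;
        // A chunk never crosses the block boundary, so it may be cut short.
        return Math.min(chunkEnd, blockSize) - posInBlock;
    }
}
```

For example, with blockSize 10 and bytesPerSum 4, the chunks in each block are 4, 4, and 2 bytes, which is why a generic Checker would need the block boundary exposed to it.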
          Raghu Angadi added a comment -

          > could you use the above?
          can I? Yeah sure, my argument has never been that it cannot be done. We can even rechecksum to hide a mismatch between bpc and blockSize.

          Though I don't agree with 'it is good enough until it is extremely difficult to use', whether it is implied here or not. I surely don't think it is an improvement over FSInputChecker, which, I think, was not explicitly designed to be a general-purpose checker.

          I sure hope this is not a blocker for HADOOP-1134.

          Doug Cutting added a comment -

          > Do you think attached patch is generic?

          Nothing is generic until it's used in more than one place. The above isn't complete (it needs, e.g., 'read()' and 'seek()' implementations) but it looks like a good start. But the real question is to you: could you use the above? Can you present the data and checksums as input streams that support read(byte[], int, int) and seek(long)?

          It also assumes a particular checksum implementation, CRC32. If we wish to allow for others, that aspect could be generalized by adding a codec-like interface for checksummers. But I think we should probably skip that this iteration.

          Raghu Angadi added a comment -

          Doug, I personally like the condition that block size should be a multiple of bpc. I even filed HADOOP-1259 about it. I guess discussion there is still relevant.

          But my point is that this generic InputChecker forces a condition that does not make sense for either of its users (ChecksumFS and DFS). This is just one example of the things it imposes. The current uploaded patch even takes 'blockSize' for the output summer, though it is not yet used.

          Do you think attached patch is generic?

          Doug Cutting added a comment -

          > changing block size during upgrade. that will surely not be a picnic.

          I don't think that should be required. Are there actually any large data sets in HDFS whose blockSize is not a multiple of its bytesPerChecksum? I'd be surprised if many, if any, have configured things this way. If we do encounter this somewhere then we can either (a) discard the checksums; or (b) recompute the checksums (first validating against the old). I think this is an edge case that we need not invest too much effort in.

          Hairong Kuang added a comment -

          This is an initial patch for review. It assumes that block size is a multiple of bytesPerSum and contains checksum generation/verification and checksum error handling. Do we need any other functionality? I still need to work on ChecksumFileSystem so that it does not assume that it is a wrapper of a raw fs.

          Raghu Angadi added a comment -

          > For backward compatability, could we enforce this during current dfs -> block-level-crc dfs upgrade?
          hmm.. changing block size during upgrade. That will surely not be a picnic.

          Raghu Angadi added a comment -

          This does not fix existing data. I have written quite a bit of code in Block-Level CRCs Upgrade to handle this case well.

          It looks to me like we might be subconsciously trying to fit the InputChecker to match what FSInputChecker already does in ChecksumFS. But I was under the impression that we want to have a generic InputStream.

          Hairong Kuang added a comment -

          > Finally, I think it's okay to throw an exception in the client when the configured blocksize is not a multiple of the configured bytesPerSum. So, if we think it will considerably simplify implementation, I don't see a problem with adding this restriction.

          Yes, I agree that this restriction would greatly simplify this issue. For backward compatibility, could we enforce this during the current dfs -> block-level-crc dfs upgrade?

          Raghu Angadi added a comment -

          readBuffer.java is tested in my dev environment. Checksumming closest to the user, I think, can be solved independently. I agree that it is very important and will be solved for HADOOP-1134.

          Some more considerations for InputChecker design:

          If the purpose of this Jira is to provide a generic InputChecker, I wonder why 2 streams (where it imposes, for example, that every checksum block be of the same size) is very generic. Does it support different types of checksums (DFS already uses two types)? Are we confident this serves future FSes well? Or is it ok to modify the InputChecker (and probably all the existing FSes) then?

          When the InputChecker reads 4k (say by some equivalent of readFully()), should DFS necessarily read 256 MB of block data in the case where bytesPerChecksum is 64K to serve that? Of course not. This is just one example of the various things the DFS support for this InputChecker needs to handle.

          I have probably thought more than I should about this. Back to HADOOP-1134.

          Doug Cutting added a comment -

          Raghu, I think it is worth trying to keep as much as is reasonable generic, and no more. If it indeed makes things considerably more complex to share the code, as you fear, then we probably should not. But we should first try, no?

          I'm not convinced that your ReadBuffer.java really handles all of the cases that might arise--it's pseudo code that hasn't been tested yet. Things sometimes have a way of getting more complicated as they're used and debugged. It doesn't even show the checksumming, just handling of one exception and retry. It also doesn't support an important property that we desire, where checksums are verified as late as possible, so that we catch more memory errors, not just disk errors. So I don't really see how it illustrates the point I think you intend, that a non-generic version will be simpler.

          I'm also not sure what your analogy is with HADOOP-1425. In that case some folks don't like subclassing, and prefer an interface and static methods. That's fine, and we could do that here if you think such a design would be cleaner. But no one is arguing there that we shouldn't share as much logic as possible, rather that discussion was about how the logic is shared.

          Finally, I think it's okay to throw an exception in the client when the configured blocksize is not a multiple of the configured bytesPerSum. So, if we think it will considerably simplify implementation, I don't see a problem with adding this restriction.

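
          The client-side restriction Doug suggests — reject a configuration whose blocksize is not a multiple of bytesPerSum — would be a one-line check. A hedged sketch follows; the class name and message are illustrative, not from the committed patch.

```java
import java.io.IOException;

// Hypothetical client-side validation of checksum configuration.
class ChecksumConfig {
    static void checkBlockSize(long blockSize, int bytesPerSum) throws IOException {
        if (bytesPerSum <= 0 || blockSize % bytesPerSum != 0) {
            throw new IOException("blockSize (" + blockSize +
                ") must be a multiple of bytesPerSum (" + bytesPerSum + ")");
        }
    }
}
```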
          Raghu Angadi added a comment -

          Maybe two streams for the InputChecker is not very generic. Maybe it could be a single stream, or not a stream at all but discrete chunks. I have a feeling any generic method will end up requiring an extra memory copy of the data for at least some of the implementations of FS.

          Raghu Angadi added a comment - - edited

          From my understanding this seems to have similarities with the discussion about ToolBase in HADOOP-1425, where ToolBase imposes a specific structure on implementations in order to provide a utility (in this case comparing checksums and retries). Of course ToolBase is a very simple case.

          Edit: Changed Jira number.

          Raghu Angadi added a comment -

          > In the block-level-crc dfs, do we allow different values of bytesPerSum for blocks in a file?
          Yes. Though there is no way to do this now. Also, we don't enforce that each block have the same bps within a file. So theoretically, yes.

          > Do we allow different block size for blocks in a file?
          No. Just like current DFS. i.e. this is not a block-level-crc specific issue.

          If data are checksummed at the FileSystem level, there is another complexity in the block-level-crc dfs. When a block size is not a multiple of bytesPerSum, we also need to output/verify a checksum at the end of each block. So it is harder to decide checksumChunk boundaries.

          Good point. Yes. This will further influence the ChecksumFileSystem implementation. This inter-(dependency and influence) between ChecksumFS and DFS could potentially further increase with this jira.

          Hairong Kuang added a comment -

          In the block-level-crc dfs, do we allow different values of bytesPerSum for blocks in a file? Do we allow different block size for blocks in a file?

          If data are checksummed at the FileSystem level, there is another complexity in the block-level-crc dfs. When a block size is not a multiple of bytesPerSum, we also need to output/verify a checksum at the end of each block. So it is harder to decide checksumChunk boundaries.

          Raghu Angadi added a comment -

          ReadBuffer.java's purpose was not to show how checksumming can be handled in general by a small function. IMO it was to show that handling DFS-internal checksum errors internally is not difficult. The source patch attached to 1134 handles the rest of the checksum verification and creation that is internal to DFS.

          Hairong Kuang added a comment - - edited

          The file ReadBuffer.java that Raghu submitted to HADOOP-1134 shows only the ChecksumException handling code. What about the checksum verification and checksum generation parts of the code? I think they should also belong to the generic classes.

          Let's suppose that the generic classes are Checker for reading and Summer for writing. Should each of these two classes contain two streams, one for data and one for checksums? Should these two streams be an abstraction of a block or of a file?

          Hairong Kuang added a comment -

          I am moving Raghu and Doug's discussion here since they are more relevant to this issue:

          Raghu Angadi - [06/Jun/07 02:23 PM ]
          Attaching an implementation of readBuffer() that handles the retries similar to ChecksumFileSystem. I am planning to use this in my development. IMHO complexity of this should be compared with what is required for HADOOP-1470 (both under fs and dfs).

          [ Doug Cutting - [06/Jun/07 03:00 PM ]
          > complexity of this should be compared with what is required for HADOOP-1470
          Or perhaps this can be used as a template for the generic version. The only DFS-specific bits are in the catch clause, and they could be factored into an abstract method, no?


            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Hairong Kuang
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development