Hadoop HDFS / HDFS-127

DFSClient block read failures cause open DFSInputStream to become unusable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: hdfs-client
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      We are using some Lucene indexes directly from HDFS, and for quite a long time we were using Hadoop version 0.15.3.

      When we tried to upgrade to Hadoop 0.19, index searches started to fail with exceptions like:
      2008-11-13 16:50:20,314 WARN [Listener-4] [] DFSClient : DFS Read: java.io.IOException: Could not obtain block: blk_5604690829708125511_15489 file=/usr/collarity/data/urls-new/part-00000/20081110-163426/_0.tis
      at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
      at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
      at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
      at java.io.DataInputStream.read(DataInputStream.java:132)
      at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:174)
      at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:152)
      at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
      at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
      at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:63)
      at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:131)
      at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:162)
      at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:223)
      at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:217)
      at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
      ...

      The investigation showed that the root of this issue is that we exceeded the number of xcievers in the datanodes, and that was fixed by changing the configuration setting to 2k.
      However, one thing that bothered me was that even after the datanodes recovered from the overload and most of the client servers had been shut down, we still observed errors in the logs of the running servers.
      Further investigation showed that the fix for HADOOP-1911 introduced another problem: a DFSInputStream instance might become unusable once the number of failures over the lifetime of the instance exceeds the configured threshold.

      The fix for this specific issue seems to be trivial: just reset the failure counter before reading the next block (patch will be attached shortly).
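      To illustrate the idea only (a simplified sketch with invented names, not the actual DFSClient code), the change amounts to clearing the per-stream counter whenever the stream moves on to a new block:

      import java.io.IOException;

      // Simplified sketch of a long-lived input stream with a failure budget.
      // All names here are made up for illustration.
      class SketchInputStream {
        private int failures;                      // failed block-fetch attempts
        private final int maxBlockAcquireFailures; // configured threshold

        SketchInputStream(int maxBlockAcquireFailures) {
          this.maxBlockAcquireFailures = maxBlockAcquireFailures;
        }

        void seekToBlock(long target) throws IOException {
          // Proposed fix: reset the counter before fetching the next block, so
          // failures accumulated earlier (e.g. during a past datanode overload)
          // cannot make the stream permanently unusable.
          failures = 0;
          fetchBlock(target);
        }

        private void fetchBlock(long target) throws IOException {
          while (true) {
            try {
              connectToReplica(target);
              return;
            } catch (IOException e) {
              if (++failures >= maxBlockAcquireFailures) {
                throw new IOException("Could not obtain block at offset " + target, e);
              }
              // otherwise refresh block locations and retry
            }
          }
        }

        private void connectToReplica(long target) throws IOException { /* placeholder */ }
      }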

      This also seems to be related to HADOOP-3185, but I'm not sure I really understand the necessity of keeping track of failed block accesses in the DFS client.

      Attachments

      1. 4681.patch
        0.8 kB
        Igor Bolotin
      2. h127_20091016.patch
        4 kB
        Tsz Wo Nicholas Sze
      3. h127_20091019.patch
        1 kB
        Tsz Wo Nicholas Sze
      4. h127_20091019b.patch
        0.8 kB
        Tsz Wo Nicholas Sze
      5. hdfs-127-branch20-redone.txt
        13 kB
        Todd Lipcon
      6. hdfs-127-branch20-redone-v2.txt
        13 kB
        Todd Lipcon
      7. hdfs-127-regression-test.txt
        3 kB
        Todd Lipcon

        Issue Links

          Activity

          Igor Bolotin added a comment -

          Patch for 0.19 attached

          Igor Bolotin added a comment -

          The patch seems to be applicable both for trunk and for 0.19 branch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12394193/4681.patch
          against trunk revision 719431.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3621/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3621/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3621/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3621/console

          This message is automatically generated.

          Chris Douglas added a comment -

          The current patch causes about 2x (where x = dfs.client.max.block.acquire.failures) the number of retries for an unrecoverable block. It really retries at N*x, where N is the initial value of retries in DFSClient.DFSInputStream::read. To be fair, it looks like the current code exercises the same retry logic, but without resetting failures, the first exhaustion of sources causes it to bail out. It's not clear what the semantics are supposed to be in this case, but it's worth noting that this patch would change them.

          This also seems to be related to HADOOP-3185, but I'm not sure I really understand the necessity of keeping track of failed block accesses in the DFS client.

          IIRC, the intent of HADOOP-3185 was to avoid filling the deadNodes list with good nodes hosting earlier, bad blocks. It also improves on the quick fix in HADOOP-1911.

          I haven't been able to find a path where applying the patch reintroduces an infinite loop.

          dhruba borthakur added a comment -

          Hi Igor, Thanks for the patch.

          Chris, Do we still need this patch? Can you comment on whether this problem still exists in trunk? And if so, can Igor supply a unit test too?

          Chris Douglas added a comment -

          The patch did resolve the issue with long-lived streams introduced by HADOOP-1911 without undoing the fix; this almost certainly remains in trunk. I'd expect a unit test for this would be difficult to write...

          IIRC, it wasn't committed because the change to the retry semantics may not have been appropriate. Someone more familiar with DFSClient should make that call.

          stack added a comment -

          +1 on the patch being applied to TRUNK as well as to the 0.20 and 0.19 branches. In its treatment of errors, DFSClient is currently as crude in effect as California's three-strikes rule: any three errors encountered on a file, no matter which of its N blocks they occur on, and the stream is ruined (see the HADOOP-5903 description for more detail, though HADOOP-3185 nailed the issue way back).

          Raghu Angadi added a comment -

          This bug exists and I don't think the current patch is the right one (yet). We probably don't need this variable at all (see below).

          Looking at how the 'failures' variable is used, it is pretty limited. My guess is that it was there right from the beginning and a lot of DFSClient has changed around it since then.

          I would say we need a description of what it means, i.e. when it should be incremented and why there should be a limit. That will also answer when it should be reset.

          As I see it now: it is incremented only when a connect to a datanode fails. That implies it should be reset when such a connect succeeds (in chooseDataNode()).

          But this is still not enough, since it does not allow DFSClient to try all the available replicas (what if the number of replicas is larger than 3?). Maybe we should try each datanode once (or twice)... That implies we probably don't need this variable at all; just some local counter in 'chooseDataNode()' would do.
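          As a rough illustration of the "local counter" idea (hypothetical names only, not the real chooseDataNode()), the counter would live for a single block-selection attempt and each replica would get a bounded number of tries:

          import java.io.IOException;
          import java.util.HashSet;
          import java.util.Set;

          // Hypothetical sketch only; invented names, not the actual DFSClient code.
          final class ReplicaChooserSketch {
            static String chooseReplica(String[] replicas, int triesPerReplica) throws IOException {
              Set<String> dead = new HashSet<String>();
              int attempts = 0;                                   // local counter, no stream-wide state
              final int limit = replicas.length * triesPerReplica;
              while (attempts < limit) {
                for (String replica : replicas) {
                  if (dead.contains(replica)) {
                    continue;
                  }
                  attempts++;
                  if (tryConnect(replica)) {
                    return replica;                               // success; nothing global to reset
                  }
                  dead.add(replica);
                }
                dead.clear();  // stands in for refetching block locations from the namenode
              }
              throw new IOException("Could not obtain block from any replica");
            }

            private static boolean tryConnect(String replica) {
              return false;  // placeholder for the real connect/read attempt
            }
          }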

          Chris Douglas added a comment -

          Maybe we should try each datanode once (or twice)... That implies we probably don't need this variable at all; just some local counter in 'chooseDataNode()' would do.

          Wouldn't this revert- and reintroduce- HADOOP-1911?

          I would say we need a description of what it means, i.e. when it should be incremented and why there should be a limit. That will also answer when it should be reset.

          +1

          Raghu Angadi added a comment -

          > Wouldn't this revert- and reintroduce- HADOOP-1911?

          I see. I just looked at HADOOP-1911 and I don't think it fixed the real problem. The loop is because of the combination of the reset of dead nodes in chooseDataNode() and the 'while (s != null)' loop in blockSeekTo(). Note that the actual failure occurs while trying to create the BlockReader, not in chooseDataNode(). The problem is that chooseDataNode() cannot decide if the datanode is OK or not; just an address for a DN is not enough. We also need connect(), a 'success reply', and at least a few bytes read from the datanode. So HADOOP-1911 fixed the infinite loop, but not for the right reasons.

          We could define a successful datanode connection as 'being able to read non-zero bytes that we need'. A failure count keeps growing until there is a 'successful connection', and should be reset after that. (Somewhat similar approach to HADOOP-3831.)

          I think this time we should have an explicitly stated policy of when a hard failure occurs (and maybe when we refetch the block data, etc.).

          stack added a comment -

          > We could define a successful datanode connection as 'being able to read non-zero bytes that we need'. A failure count keeps growing until there is a 'successful connection', and should be reset after that. (Somewhat similar approach to HADOOP-3831.)

          This seems reasonable to me.

          Chris Douglas added a comment -

          I just looked at HADOOP-1911 and I don't think it fixed the real problem.

          No question; it was literally a last-minute fix for 0.17.0, intended as a partial fix until HADOOP-3185 could be given priority. To be fair, the resolution explicitly renounces any claim to a fix for the real issue...

          We could define a successful datanode connection as 'being able to read non-zero bytes that we need'. A failure count keeps growing until there is a 'successful connection', and should be reset after that. (Somewhat similar approach to HADOOP-3831.)

          Isn't that what this patch effects (+/- the 2x dfs.client.max.block.acquire.failures semantics, which are a little odd)? I completely agree that the client code should be reworked to make the retry mechanism more legible, but- particularly if the idea is to push a fix into 0.19 and later- wouldn't it make sense to use this tweak or a variant of it? Clearly it's tricky code, so it seems reasonable to add something like HADOOP-3831 to trunk rather than back-porting it to earlier branches.

          Raghu Angadi added a comment -

          > Isn't that what this patch effects (+/- the 2x dfs.client.max.block.acquire.failures semantics, which are a little odd)?

          Sort of, I think. It is an improvement, but only along the same lines as the current structure (no explicit policy, hard to see what happens and when). E.g. it looks like 'a failure' is defined as not being able to connect/fetch from all the replicas. Also, success is defined as just being able to get an 'ok' from the datanode. Which implies it might again go into an infinite loop in some conditions (say all replicas are corrupt, so only one remains, and it fails a checksum at one particular position).

          The final fix could be as simple as moving the resetting of 'failures' to readBuffer(). But I would prefer things like 'failures' to be incremented for every failure with a replica, and the count limited to something like min(#replicas, a large value like 10).

          Finally, I am mainly proposing an explicit policy in a code comment. 0.21 is fine.. big users like HBase can have it backported.

          stack added a comment -

          > E.g. it looks like 'a failure' is defined as not being able to connect/fetch from all the replicas. Also, success is defined as just being able to get an 'ok' from the datanode.

          I'd say this is a substantial improvement over what's currently in place. Will suggest that HBase users apply the current patch at least for now.

          I agree that it'd be better if 'failure' were smarter, counting the likes of bad checksums in a replica, etc., and that the 'failure' policy were explicitly stated.

          Jonathan Gray added a comment -

          I just tried this patch after getting a lot of bad blocks reported under heavy load from HBase.

          After applying this, I can now get through all my load tests without a problem. The datanode is heavily loaded and HBase takes a while to perform compactions (~1 min worst case), but it manages to get through it, whereas without the patch it crapped out and I wasn't able to recover easily.

          I'm running an otherwise clean Hadoop 0.20.0.

          ryan rawson added a comment -

          +1 this patch is a must have for anyone running hbase.

          The lack of it in hadoop trunk is forcing us to ship a non-standard hadoop jar just for a 2 line fix.

          Please commit already!

          Raghu Angadi added a comment -

          Raghu> Finally, I am mainly proposing an explicit policy in a code comment. [...]
          Stack> I agree that [...] etc., and that the 'failure' policy be explicitly stated.

          Is something blocking us from adding a description of what 'failures' is meant for?
          Otherwise, IMHO, it is just a magic fix that could lead to similar problems in the future, because it is hard for developers to keep and review contracts that they don't know about; that is why this bug first appeared.

          Not a -1 from me.

          Todd Lipcon added a comment -

          Linking HDFS-656. That JIRA is for discussion about what the failure/retry semantics are now and what they really should be.

          Tsz Wo Nicholas Sze added a comment -

          Igor's patch looks good to me, but it has gone stale.

          Tsz Wo Nicholas Sze added a comment -

          h127_20091016.patch: based on Igor's patch,

          • removed DFSInputStream.failures and added a local variable in DFSInputStream.chooseDataNode(..), since "failures" is a local property;
          • changed DFSClient.bestNode(..) to static and removed "throws IOException"; and
          • simplified DFSClient.DNAddrPair.
          ryan rawson added a comment -

          Here is a fixed version for hdfs-branch-0.21.

          stack added a comment -

          Ryan, your attachment doesn't look like a patch. It's a git-made patch. Does Hudson know how to apply these?

          Nicholas, your patch looks more interesting, moving the failure counter local. The other changes seem fine. Any reason for making bestNode static? You've changed what bestNode returns: it now returns null rather than throwing an IOE with "no live nodes contain current block". We over in HBase are not going to miss that message.

          I'm testing your patch now...

          Tsz Wo Nicholas Sze added a comment -

          > ... Any reason for making bestNode static? You've changed what bestNode returns: it now returns null rather than throwing an IOE with "no live nodes contain current block". We over in HBase are not going to miss that message.

          It is because bestNode(..) is a simple utility method. It does not use any DFSClient fields/methods.

          In the original code, bestNode(..) is invoked only in chooseDataNode(..), which basically uses a try-catch to deal with the IOException thrown by bestNode(..).

          // original code in chooseDataNode(..)
                  try {
                    DatanodeInfo chosenNode = bestNode(nodes, deadNodes);
                    ...
                  } catch (IOException ie) {
                    ...
                    LOG.info("Could not obtain block " + block.getBlock()
                        + " from any node: " + ie
                        + ". Will get new block locations from namenode and retry...");
                    ...
                  }
          

          As shown above, the IOException ie is used only for a log message. I think "No live nodes contain current block" is kind of redundant in the log message. Do you think that "Could not obtain block " + block.getBlock() + " from any node" is clear enough? If not, we can change the log message.
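          A minimal sketch of the null-returning variant described here (generic types and invented names, not the actual patch): bestNode(..) becomes a pure static utility, and the caller treats null as "no live node holds the block" and logs a single message before refreshing locations.

          import java.util.Map;

          final class BestNodeSketch {
            // Returns the first node that is not marked dead, or null if none remains.
            static <N> N bestNode(N[] nodes, Map<N, N> deadNodes) {
              if (nodes != null) {
                for (N node : nodes) {
                  if (!deadNodes.containsKey(node)) {
                    return node;
                  }
                }
              }
              return null;
            }
          }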

          stack added a comment -

          Thanks for the explanation. I think the message is fine. Let me do some local testing. I'll be back.

          dhruba borthakur added a comment -

          This patch definitely improves the situation on trunk by preventing a long-running DFS reader from bailing out prematurely. Is there a requirement to patch this on Hadoop 0.20 as well?

          Todd Lipcon added a comment -

          +1 to committing a fix on branch-20. We've included the original patch in our distro for some customers and seen no issues running at reasonable scale - it's now part of our standard distro and no complaints there either.

          stack added a comment -

          Just to say that HBase has shipped with a patched Hadoop that includes the old 4681.patch since our 0.20.0 release. I only know of how it improved things; I've not heard of detrimental effects.

          Tsz Wo Nicholas Sze added a comment -

          Will send an email to the mailing list for committing this to 0.20.

          Tsz Wo Nicholas Sze added a comment -

          Since this may be back-ported to 0.20, I reverted the code refactoring in the previous patch to minimize the chance of silly mistakes.

          h127_20091019.patch:

          • declared failures as a local variable; and
          • added final to maxBlockAcquireFailures.
          stack added a comment -

          +1 on this patch.

          Some notes:

          + This patch does not have the issue that HADOOP-1911 "fixed", though it moves the failure counter back to being a local variable. There is no danger that it "...revert[s]- and reintroduce[s]- HADOOP-1911?" because the patch removes the catching of exceptions inside the chooseDataNode while loop.
          + I am of the opinion that this patch should not be gated on the necessary work in HDFS-656 "Clarify error handling and retry semantics for DFS read path" (nor HDFS-378 "DFSClient should track failures by block rather than globally"). This patch should actually help some, as it narrows the scope of the failures variable. But maybe we can do even more as part of this patch (see below).

          If a new patch is cut, the following might be considered:

          + There is no documentation of "dfs.client.max.block.acquire.failures" nor of what the local counter 'failures' means. Above it's suggested that we at least document what a failure is ("E.g. it looks like 'a failure' is defined as not being able to connect/fetch from all the replicas. Also, success is defined as just being able to get an 'ok' from the datanode.").
          + By default, "dfs.client.max.block.acquire.failures" is 3. If there are more than 3 replicas, we could fail even though good replicas remain out on the cluster. Should DFSClient default this.maxBlockAcquireFailures to the max of dfs.replication and dfs.client.max.block.acquire.failures?
          + I'd put these two LOG.info messages together rather than having two LOG.info lines:

                    if (nodes == null || nodes.length == 0) {
                      LOG.info("No node available for block: " + blockInfo);
                    }
                    LOG.info("Could not obtain block " + block.getBlock()
                        + " from any node. Will get new block locations from namenode and retry...");
          
          Todd Lipcon added a comment -

          By default, "dfs.client.max.block.acquire.failures" is 3. If more than 3 replicas, we could fail though good replicas out on cluster. Should DFSClient set this.maxBlockAcquireFailures default to max of dfs.replication/dfs.client.max.block.acquire.failures?

          Another example of bad naming - this variable actually refers to the number of times the read path will go back to the namenode for a new set of locations. It's to deal with the case when some locations have been cached for some amount of time, during which the blocks have been moved to new locations.

          stack added a comment -

          To be clear, the above +1 and comments were for h127_20091016.patch.

          Could h127_20091019.patch have the issue HADOOP-1911 "fixed"?

          Tsz Wo Nicholas Sze added a comment -

          h127_20091019.patch failed on TestBlockMissingException. It may also have the problem from HADOOP-1911. Needs more work.

          Tsz Wo Nicholas Sze added a comment -

          h127_20091019b.patch: synced Igor's patch with trunk.

          Since this is likely to be committed to 0.20, it is better to use Igor's patch, which was already tested extensively.

          Stack, your comments totally make sense to me, but I don't want to introduce a big change on 0.20. Let's do the improvements in separate issues like HDFS-656, HDFS-378, etc.

          Todd Lipcon added a comment -

          Stack, your comments totally make sense to me, but I don't want to introduce a big change on 0.20. Let's do the improvements in separate issues like HDFS-656, HDFS-378, etc.

          +1 - agree we should keep the branch-20 change minimal.

          stack added a comment -

          +1 on Igor's "4681.patch" to 0.20 branch (Good stuff).

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12422638/h127_20091019b.patch
          against trunk revision 826905.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/41/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/41/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/41/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/41/console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          The failure of TestFileAppend2 is not related to this.

          No new tests added, since there is an existing test, TestBlockMissingException, for this and the changes are very simple.

          I will wait a few days for the vote to see whether this could be committed to 0.20.

          Tsz Wo Nicholas Sze added a comment -

          I have committed this to 0.20 and above. Thanks, Igor!

          (The assignee of this issue should be Igor. I merely synced the patch with trunk.)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #79 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/79/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #54 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/54/)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #120 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/120/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #78 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/78/)

          Tsz Wo Nicholas Sze added a comment -

          Unfortunately, the 0.20 patch causes TestFsck to fail; see Suresh's comment.

          Tsz Wo Nicholas Sze added a comment -

          I checked 0.21 and 0.22; the problem does not exist there. I am going to revert the patch from 0.20.

          Tsz Wo Nicholas Sze added a comment -

          I have reverted the patch committed to 0.20.

          Zlatin Balevsky added a comment -

          @Tsz Wo: does the issue happen if the patch is applied against 0.20.1? HBase ships with this patch; is it safer to not have it?

          Tsz Wo Nicholas Sze added a comment -

          @Zlatin: yes, I have just checked the 0.20.1 release with the patch (4681.patch). TestFsck failed.

          As mentioned by Suresh, the patch causes an infinite loop on DFSClient when reading a block with all the replicas corrupted.

          I don't know much about HBase. So I cannot answer your HBase question. Please check with the HBase mailing lists.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #169 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/169/)
          Move from 0.20 to 0.21 in CHANGES.txt.

          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #96 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/96/)
          Move from 0.20 to 0.21 in CHANGES.txt.

          Todd Lipcon added a comment -

          For those interested in this problem, I made some notes towards a proper fix in HDFS-378.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #202 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/202/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #196 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/196/)

          Todd Lipcon added a comment -

          In case anyone finds this useful, here's an additional test for TestCrcCorruption that times out on branch-20 when HDFS-127 has not been reverted. (I'm using this test to work on a new version of HDFS-127 for branch-20 that doesn't cause infinite loops, for inclusion in our distro and to help out the HBase guys)

          Todd Lipcon added a comment -

          Here's a redone patch against branch-20. I've taken the approach of resetting the failure counter at the start of the user-facing read and pread calls in DFSInputStream. The logic here is that the failure counter should limit the number of internal retries before throwing an exception back to the client. As long as the client is making some progress, we don't care about the total number of failures over the course of the stream, and it should be reset for each operation.

          I've also included two new unit tests. The first, in TestCrcCorruption, guards against the error we saw with the original patch. It reliably reproduces the infinite loop with the broken patch that was originally on branch-20. The second new unit test, in TestDFSClientRetries, verifies the new behavior, namely that a given DFSInputStream can continue to be used even when the total number of failures exceeds maxBlockAcquires, so long as the number of retries on any given read() operation does not.

          To accomplish the second test, I pulled in the mockito dependency via ivy. The ability to inject bad block locations into the client made the test a lot more straightforward, and I don't see any downsides to pulling it into branch-20.
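          A rough sketch of that approach (invented names only; the actual branch-20 patch touches DFSClient.DFSInputStream and is not reproduced here):

          import java.io.IOException;

          // Simplified illustration of resetting the failure counter per operation.
          class PerOperationRetrySketch {
            private int failures;
            private final int maxBlockAcquireFailures;

            PerOperationRetrySketch(int maxBlockAcquireFailures) {
              this.maxBlockAcquireFailures = maxBlockAcquireFailures;
            }

            // User-facing read: the failure counter is reset on entry, so it only
            // bounds retries within this call. A long-lived stream that keeps making
            // progress never exhausts the budget across its lifetime.
            int read(byte[] buf, int off, int len) throws IOException {
              failures = 0;
              while (true) {
                try {
                  return readOnce(buf, off, len);
                } catch (IOException e) {
                  if (++failures >= maxBlockAcquireFailures) {
                    throw e;  // give up only after repeated failures in this operation
                  }
                  // otherwise refresh block locations from the namenode and retry
                }
              }
            }

            private int readOnce(byte[] buf, int off, int len) throws IOException {
              return 0;  // placeholder for the actual block read
            }
          }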

          Todd Lipcon added a comment -

          Reopening this issue with a new candidate patch for branch-20, which I believe addresses the issues we saw in the first version (see rationale above), and tests it much more thoroughly.

          Todd Lipcon added a comment -

          I forgot to update the eclipse classpath with the previous patch. This one's the same, except with that fixed. Everything else on test-patch was +1.

          I ran all the unit tests, and everything that usually passes on branch-20 passed.

          Karthik K added a comment -

          When do we expect the next release on hadoop-0.20?

          Owen O'Malley added a comment -

          Nicholas, since you did the revert, can you check to see if this new patch is OK for 0.20.2?

          Tsz Wo Nicholas Sze added a comment -

          Sure, I will check it.

          Tsz Wo Nicholas Sze added a comment -

          I ran unit tests on the patch. The following tests failed:

              [junit] Running org.apache.hadoop.io.TestUTF8
              [junit] Tests Run: 3, Failures: 1, Errors: 0, Time elapsed: 0.161 sec
          
              [junit] Running org.apache.hadoop.hdfsproxy.TestHdfsProxy
              [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 4.219 sec
          

          BTW, since the new patch is quite different from the patch already committed, I suggest fixing the problem in a new issue so that the new patch could be committed to 0.20, 0.21, and trunk. Otherwise, I am not sure how to commit the new patch.

          Todd Lipcon added a comment -

          Hi Nicholas,

          TestUTF8 is known to be flaky. See HADOOP-6522

          I also see TestHdfsProxy fail in strange ways on a pretty regular basis. I doubt it's related since it doesn't stress any failure scenarios.

          I actually did open a new patch for this issue on trunk - HDFS-927 which is linked here. Sorry for the confusion.

          Tsz Wo Nicholas Sze added a comment -

          > I actually did open a new patch for this issue on trunk - HDFS-927 which is linked here. Sorry for the confusion.

          Great! Let's close this and work on HDFS-927.

          Nicolas Spiegelberg added a comment -

          This should be pulled into the branch-0.20-append branch.


            People

             • Assignee: Igor Bolotin
             • Reporter: Igor Bolotin
             • Votes: 3
             • Watchers: 17
