Hadoop HDFS / HDFS-872

DFSClient 0.20.1 is incompatible with HDFS 0.20.2

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.1, 0.20.2
    • Fix Version/s: 0.20.2
    • Component/s: datanode, hdfs-client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      After upgrading to the latest HDFS 0.20.2 (r896310 from /branches/branch-0.20), old DFS clients (0.20.1) no longer seem to work. HBase uses the 0.20.1 hadoop core jars, and the HBase master will no longer start up. Here is the exception from the HBase master log:

      2010-01-06 09:59:46,762 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_3380512596555557728_1002 file=/hbase/hbase.version
              at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1788)
              at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1616)
              at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
              at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673)
              at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:320)
              at java.io.DataInputStream.readUTF(DataInputStream.java:572)
              at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:189)
              at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:208)
              at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:208)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
              at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
              at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
              at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:1241)
              at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1282)
      
      2010-01-06 09:59:46,763 FATAL org.apache.hadoop.hbase.master.HMaster: Not starting HMaster because:
      java.io.IOException: Could not obtain block: blk_3380512596555557728_1002 file=/hbase/hbase.version
              at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1788)
              at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1616)
              at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
              at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673)
              at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:320)
              at java.io.DataInputStream.readUTF(DataInputStream.java:572)
              at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:189)
              at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:208)
              at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:208)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
              at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
              at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
              at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:1241)
              at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1282)
      

      If I switch the hadoop jars in the hbase/lib directory to the 0.20.2 version it works well, which is what led me to open this bug here and not in the HBASE project.

      Attachments

      1. hdfs-793-branch20.txt
        15 kB
        Todd Lipcon
      2. hdfs-793-branch20.txt
        15 kB
        Todd Lipcon
      3. hdfs-872.txt
        8 kB
        Todd Lipcon


          Activity

          Todd Lipcon added a comment -

          This is already being discussed on the -general list. Should we continue the discussion here or there?

          Todd Lipcon added a comment -

          I have a patch that fixes HDFS-101 and is also 0.20.1 compatible. I'm still in the midst of running it through all the unit tests, but manual tests for pipeline recovery look good. Will upload here when it's been vetted more thoroughly.

          Todd Lipcon added a comment -

          Attached is a patch against current branch-0.20 which resolves the protocol incompatibility of the HDFS-101/HDFS-793 pair. Since this is tricky code, I'll try to summarize the patch in detail:

          In HDFS-793, PipelineAck's wire format includes a new element which is the number of status replies to follow. This is the central incompatibility. So, in this patch, I removed that field and reset the version number back to the original 14 from old branch-0.20. To know how many status replies to read, it now takes the downstream pipeline depth as a constructor parameter. This is used only for reading, and otherwise is -1 (it's an error to call readFields if it has not been set).

          Since the number of replies in a pipeline ack is no longer dynamic, I removed the getNumOfReplies call as well.

          When reading an ack, I check for the HEARTBEAT message, and in that case don't read any replies. Otherwise I expect a reply from each downstream datanode. For review: should readFields handle the case of a sequence number equal to -2? Best I can tell, the current code never sends such a sequence number, and if it does it is an error. It may make sense to check for it and throw an IOException in the case of a negative seqno that is not HEARTBEAT_SEQNO. Opinions appreciated.
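
          To make the reading-side change above concrete, here is a minimal sketch in Java (an illustration only, not the patch source; the class shape, the heartbeat constant, and the field names are assumptions):

          // Sketch only: PipelineAck reads a fixed number of replies supplied at
          // construction time, rather than reading a count from the wire.
          class PipelineAck {
            static final long HEART_BEAT_SEQNO = -1L;      // assumed heartbeat marker

            private long seqno;
            private short[] replies;
            private final int numRepliesExpected;          // downstream pipeline depth; -1 means "write-only"

            PipelineAck(int numRepliesExpected) {
              this.numRepliesExpected = numRepliesExpected;
            }

            void readFields(java.io.DataInput in) throws java.io.IOException {
              if (numRepliesExpected < 0) {
                throw new java.io.IOException("readFields called on an ack not set up for reading");
              }
              seqno = in.readLong();
              if (seqno == HEART_BEAT_SEQNO) {
                replies = new short[0];                    // heartbeat acks carry no status replies
                return;
              }
              replies = new short[numRepliesExpected];
              for (int i = 0; i < replies.length; i++) {
                replies[i] = in.readShort();               // one status per downstream datanode
              }
            }
          }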

          In DFSClient I added a DEBUG level printout for the contents of the pipeline. This was useful to me as I was testing, to ensure that I tested killing each of the nodes in the pipeline in the intended order.

          In BlockReceiver, I added back the "continue" during HEARTBEAT processing. I believe this was an omission in the earlier patch - best I can tell, without the continue, it currently sends a spurious "seqno=-2" ack after each heartbeat. With the continue, it circles around the loop correctly to wait for the next ack.

          For review: I put a TODO for the case where BlockReceiver receives a seqno = -2. I currently believe that any negative sequence number that is not HEART_BEAT is an error and should throw an IOException (e.g. we got our reads misaligned).
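
          The restored "continue" and the open question about other negative sequence numbers would look roughly like this (an illustrative sketch, not the BlockReceiver code; all names here are assumptions):

          class AckRelaySketch {
            static final long HEART_BEAT_SEQNO = -1L;      // assumed heartbeat marker

            void relayAcks(java.io.DataInputStream fromDownstream,
                           java.io.DataOutputStream toUpstream) throws java.io.IOException {
              while (true) {
                long seqno = fromDownstream.readLong();
                if (seqno == HEART_BEAT_SEQNO) {
                  toUpstream.writeLong(seqno);             // forward the heartbeat upstream unchanged
                  toUpstream.flush();
                  continue;                                // loop back for the next real ack instead of
                }                                          // falling through and emitting a spurious reply
                if (seqno < 0) {
                  // The open question above: any other negative seqno never appears in
                  // normal operation, so treating it as a protocol error is one option.
                  throw new java.io.IOException("Unexpected negative seqno " + seqno);
                }
                // Normal case (elided): read the downstream statuses, prepend this node's
                // status, and send the combined ack upstream.
              }
            }
          }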

          When constructing the ack message for a failed mirror, since every ack must have the same number of replies, I send SUCCESS followed by N errors, where N is the number of downstream targets. The client's behavior is to eject the first ERROR node, so the presence of ERROR status further downstream is unimportant - in truth they are semantically UNKNOWN, but no such status code exists. For review: HDFS-793 reversed the order of the loop on DFSClient.java:2431 to locate the last DN with ERROR status. I had to reverse this back to the original loop order for this patch since the replies look like SUCCESS, ERROR, ERROR in the case that DN 2 dies.
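
          For the failed-mirror case, the ack construction described above amounts to something like this (illustrative; the status values are passed in as parameters rather than being the real DataTransferProtocol constants):

          class FailedMirrorAckSketch {
            // Build the fixed-length reply array: this node's SUCCESS followed by one ERROR
            // per downstream target, since every ack must carry the same number of replies.
            static short[] repliesForFailedMirror(int numDownstreamTargets,
                                                  short success, short error) {
              short[] replies = new short[1 + numDownstreamTargets];
              replies[0] = success;                        // this datanode wrote the packet successfully
              for (int i = 1; i < replies.length; i++) {
                replies[i] = error;                        // downstream state is really "unknown", but ERROR
              }                                            // is the only code available; the client ejects
              return replies;                              // the first ERROR node regardless
            }
          }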

          In terms of testing, I performed the following:

          • Started up a 3-node distributed cluster using patched servers.
          • With an unpatched client, began uploading a file. Killed each node in the pipeline (first, second, last) and ensured that the correct datanode was ejected.
          • With a patched client and patched server, ran the same test.
          • With patched client and unpatched server, ensured that file uploads work properly. I did not test killing the unpatched server nodes here - I can do so if necessary, but was using a shared cluster for this test.

          In all cases, the file upload lasted more than 30 seconds, so heartbeats were tested.

          Jean-Daniel Cryans added a comment -

          I didn't review the patch (not enough knowledge of the code), but I did try it on a 5-node cluster with HBase during a 10GB upload. I killed one DN and the DFSClient detected the right one in the pipeline.

          In this regard, +1

          Hairong Kuang added a comment -

          It's quite tricky to review whether this patch solves the compatibility problem when dealing with seq# -2 and -1. How about I revert the changes that HDFS-793 and HDFS-101 made to 0.20 first? Then let's come up with a compatible patch for HDFS-793.

          Todd Lipcon added a comment -

          Sure - I can also generate such a squashed 793-101-872 patch here for review, though it won't apply. I thought it would be easier to review since this doesn't change the semantics, just the protocol. Let me know if you revert, and I'll regenerate.

          Hairong Kuang added a comment -

          I pulled HDFS-793 and HDFS-101 off branch 0.20 and moved their change log entries into the 0.21 section.

          Todd Lipcon added a comment -

          Hey Hairong,

          What would be easiest to review? A single patch that incorporates both HDFS-793 and HDFS-101? Or first a compatible patch of HDFS-793, and then after that's reviewed a backport of 101 on top of that?

          Hairong Kuang added a comment -

          I prefer the latter.

          Todd Lipcon added a comment -

          Here's a version of HDFS-793 that I think should be compatible.

          Though, one question: as I understand the old code, the order of the replies in the ack packet was actually in reverse-depth order. That is to say, if the pipeline is Client -> DN1 -> DN2 -> DN3, the replies come back in the order DN3, DN2, DN1. The client code, however, was acting as if they came back in DN1, DN2, DN3. This patch does the latter order, since that's what clients expect, and I think is what makes more sense. This is a bit "incompatible", but since the recovery never worked right anyhow, I don't think it matters.

          I also took the liberty of adding some comments that explain my understanding of the seqno=-2 stuff. Let me know if I've got it wrong.

          Todd Lipcon added a comment -

          Tested compatibility by using the new client against an old cluster, and the old client against a new cluster. Did not test failure of old-cluster nodes, since I was using a shared one.

          Hairong Kuang added a comment -

          > as I understand the old code, the order of the replies in the ack packet was actually in reverse-depth order.
          Not sure which old code you referred to. But as I understand it, replies always come back in the order DN1, DN2, and DN3 if the pipeline is Client->DN1->DN2->DN3.

          The patch mostly looks good. Two suggestions:
          1. In BlockReceiver.java: After receiving an ack, when seqno==-2, please remove the debug log statement.
          2. For the class PipelineAck, the field numRepliesExpected is of no use in the serialization case. I would suggest that numRepliesExpected be passed as a parameter to readFields. This means that PipelineAck does not need to implement the Writable interface.
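
          In other words, the suggested signature would look roughly like this (a sketch reusing the field names from the earlier sketch, not the committed code):

          // readFields takes the expected reply count directly, so the reader needs no
          // special constructor and PipelineAck no longer has to implement Writable.
          void readFields(java.io.DataInput in, int numRepliesExpected) throws java.io.IOException {
            seqno = in.readLong();
            replies = (seqno == HEART_BEAT_SEQNO) ? new short[0] : new short[numRepliesExpected];
            for (int i = 0; i < replies.length; i++) {
              replies[i] = in.readShort();
            }
          }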

          Todd Lipcon added a comment -

          Updated patch with the two suggestions. Good idea about not making PipelineAck Writable - this is much nicer.

          But as I understand it, replies always come back in the order DN1, DN2, and DN3 if the pipeline is Client->DN1->DN2->DN3

          Oops, you're right - I misread some code.

          Hairong Kuang added a comment -

          +1. The patch looks good.

          Todd Lipcon added a comment -

          Do you have a good test case that can reproduce HDFS-101? My general test cases (kill -9ing a pipeline member mid-write) seem to pass with just the 793 patch applied.

          Hairong Kuang added a comment -

          Todd, could you please take a look at the findbugs error?
          [exec] -1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          Ant test-core passed.

          Hairong Kuang added a comment -

          > Do you have a good test case that can reproduce HDFS-101?
          Some of the tests in HDFS-564 are supposed to test HDFS-101. But I am not sure whether HDFS-793 by itself could get them to pass.

          Todd Lipcon added a comment -

          Todd, could you please take a look at the findbugs error?

          The findbugs error is:

          Bug type REC_CATCH_EXCEPTION (click for details)
          In class org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor
          In method org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run()
          At DFSClient.java:[line 2447]
          

          This line wasn't actually touched by this patch - it's been a catch (Exception e) since HADOOP-1707 in Jan '08.

          Todd Lipcon added a comment -

          Does this one look good now? Do you want to commit this to branch-20 as is, and then I'll work on HDFS-101 separately on top of that? I think 793 is useful even without 101 (since I haven't yet managed to reproduce 101 in manual testing).

          Hairong Kuang added a comment -

          Regarding the findbugs error, ant test-patch complained about a new bug; it does not complain about existing ones. Could you please take one more look?

          Hairong Kuang added a comment -

          I took a look at the new findbugs warning. It is indeed not introduced by this patch. But very strangely, this bug is not reported in 0.20, yet it is reported in 0.20+patch. Todd, what did you do that made findbugs unhappy?

          Todd Lipcon added a comment -

          You know, it's really a mystery to me. The changes inside the block in question don't change anything about its exception-passing behavior, best I can tell. In fact, I think the findbugs warning is bogus anyway, since we really do want to catch any exception there, including an RTE, so that we can close the processor and forward out the error.
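
          For context, the pattern findbugs flags here looks roughly like this (placeholder names, not the actual ResponseProcessor internals):

          class BroadCatchExample {
            void run() {
              try {
                readOneAck();                  // may throw IOException or an unchecked exception
              } catch (Exception e) {          // broad on purpose: an RTE must also shut things down
                close();                       // close the response processor ...
                recordError(e);                // ... and surface the failure to the writer
              }
            }
            void readOneAck() throws java.io.IOException { /* elided */ }
            void close() { }
            void recordError(Exception e) { }
          }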

          Hairong Kuang added a comment -

          Ok, I just committed the patch submitted on 1/16 that fixed the incompatible protocol problem caused by HDFS-793. Thanks Todd!

          Hairong Kuang added a comment -

          I am resolving this for now. If we ever want to port HDFS-101 to 0.20, let's reopen HDFS-101 or create a new JIRA.

          Todd Lipcon added a comment -

          Sounds good. Thanks for the help reviewing.


            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Bassam Tabbara
            • Votes:
              0
              Watchers:
              7
