Hadoop Common / HADOOP-4533

HDFS client of hadoop 0.18.1 and HDFS server 0.18.2 (0.18 branch) not compatible

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.1
    • Fix Version/s: 0.18.2
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I'm not sure whether this is a bug or expected behavior, but here are the details.

      I have a cluster running a build from the Hadoop 0.18 branch.
      When I tried to use the Hadoop 0.18.1 dfs client to load files onto it, I got the following exceptions:

      hadoop --config ~/test dfs -copyFromLocal gridmix-env /tmp/.
      08/10/28 16:23:00 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
      08/10/28 16:23:00 INFO dfs.DFSClient: Abandoning block blk_-439926292663595928_1002
      08/10/28 16:23:06 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
      08/10/28 16:23:06 INFO dfs.DFSClient: Abandoning block blk_5160335053668168134_1002
      08/10/28 16:23:12 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
      08/10/28 16:23:12 INFO dfs.DFSClient: Abandoning block blk_4168253465442802441_1002
      08/10/28 16:23:18 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
      08/10/28 16:23:18 INFO dfs.DFSClient: Abandoning block blk_-2631672044886706846_1002
      08/10/28 16:23:24 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2349)
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)

      08/10/28 16:23:24 WARN dfs.DFSClient: Error Recovery for block blk_-2631672044886706846_1002 bad datanode[0]
      copyFromLocal: Could not get block locations. Aborting...
      Exception closing file /tmp/gridmix-env
      java.io.IOException: Could not get block locations. Aborting...
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

      This problem has a severe impact on Pig 2.0, which is pre-packaged with Hadoop 0.18.1 and therefore uses
      the Hadoop 0.18.1 dfs client when interacting with a Hadoop cluster.
      That means Pig 2.0 will not work with the to-be-released Hadoop 0.18.2.

          Activity

          Hairong Kuang added a comment -

          I've committed this.

          Hairong Kuang added a comment -

          The unit tests passed, and here is the ant test-patch result:
          [exec] +1 overall.

          [exec] +1 @author. The patch does not contain any @author tags.

          [exec] +1 tests included. The patch appears to include 3 new or modified tests.

          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.

          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.

          Tsz Wo Nicholas Sze added a comment -

          > We should open a jira on the "Could not read from stream" problem for 0.19 or 0.20.

          Created HADOOP-4538

          Hairong Kuang added a comment -

          Konstantin, the junit tests have passed. I will test this patch on a real dfs cluster. For 0.19 and 0.20, we keep the patch to HADOOP-4116. We should open a jira on the "Could not read from stream" problem for 0.19 or 0.20.

          Konstantin Shvachko added a comment -

          +1
          This looks reasonable for 0.18. It fixes the semaphore contention problem and keeps the data transfer protocol compatible across 0.18.
          We need to run tests with this patch.
          For 0.19 and 0.20 it is better to open another jira.

          Hairong Kuang added a comment -

          Ok, I reverted HADOOP-4116 in branch 18. This patch removes the incompatible change introduced by the patch to HADOOP-4116 but keeps the critical code that prevents the Balancer from overusing network bandwidth and avoids the deadlock problem described in HADOOP-4116.

          Konstantin Shvachko added a comment (edited) -

          It looks like we need two patches here:

          • For 0.18, the incompatible change in the data transfer protocol should be removed.
          • For 0.19, we need to provide a clear message saying that the data transfer protocols are incompatible, rather than "Could not read from stream".
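
A minimal sketch of the second point, assuming the sender writes a protocol version at the start of each request and the receiver checks it before reading anything else. The class name, method names, and version numbers below are hypothetical and are not taken from the Hadoop source; the sketch only illustrates failing with an explicit message instead of a generic read error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical sketch; not the actual Hadoop data transfer code. */
public class TransferVersionCheck {
    // Hypothetical version number understood by this receiver.
    static final short SUPPORTED_VERSION = 13;

    /** Writes a request header that starts with the sender's protocol version. */
    static byte[] writeHeader(short senderVersion) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeShort(senderVersion);
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads the version first and fails with an explicit message on mismatch. */
    static void checkHeader(byte[] header) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
        short received = in.readShort();
        if (received != SUPPORTED_VERSION) {
            throw new IOException("Incompatible data transfer protocol version: expected "
                + SUPPORTED_VERSION + ", received " + received);
        }
    }

    public static void main(String[] args) throws IOException {
        // Matching versions: the check passes silently.
        checkHeader(writeHeader(SUPPORTED_VERSION));
        try {
            // Mismatched versions: the receiver reports the cause explicitly.
            checkHeader(writeHeader((short) 14));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this, a mismatched client would see the version numbers in the error rather than having to infer the cause from a failed stream read.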
          Tsz Wo Nicholas Sze added a comment -

          > ... based on Nicholas' testing.

          To reproduce this:

          • Start a 0.18.1 cluster.
          • Write a file with a 0.18.2 client, e.g. hadoop fs -put src dst
          • It will fail with error messages similar to those shown in the description.

          It won't fail if the patch in HADOOP-4116 is reverted.

          Owen O'Malley added a comment -

          This seems to have been caused by HADOOP-4116, based on Nicholas' testing.


            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Runping Qi