
Key: HADOOP-4533
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Hairong Kuang
Reporter: Runping Qi
Votes: 0
Watchers: 1
Hadoop Common

HDFS client of hadoop 0.18.1 and HDFS server 0.18.2 (0.18 branch) not compatible

Created: 28/Oct/08 04:36 PM   Updated: 08/Jul/09 04:43 PM
Component/s: None
Affects Version/s: 0.18.1
Fix Version/s: 0.18.2

Time Tracking:
Not Specified

File Attachments:
  balancerRM_br18.patch (8 kB), attached by Hairong Kuang on 2008-10-28 10:02 PM; licensed for inclusion in ASF works
Issue Links:
Reference
 

Hadoop Flags: Reviewed
Resolution Date: 29/Oct/08 09:52 PM


Description
I am not sure whether this should be considered a bug or expected behavior, but here are the details.

I have a cluster running a build from the Hadoop 0.18 branch.
When I tried to use a Hadoop 0.18.1 DFS client to load files into it, I got the following exceptions:

hadoop --config ~/test dfs -copyFromLocal gridmix-env /tmp/.
08/10/28 16:23:00 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/10/28 16:23:00 INFO dfs.DFSClient: Abandoning block blk_-439926292663595928_1002
08/10/28 16:23:06 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/10/28 16:23:06 INFO dfs.DFSClient: Abandoning block blk_5160335053668168134_1002
08/10/28 16:23:12 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/10/28 16:23:12 INFO dfs.DFSClient: Abandoning block blk_4168253465442802441_1002
08/10/28 16:23:18 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/10/28 16:23:18 INFO dfs.DFSClient: Abandoning block blk_-2631672044886706846_1002
08/10/28 16:23:24 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2349)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)

08/10/28 16:23:24 WARN dfs.DFSClient: Error Recovery for block blk_-2631672044886706846_1002 bad datanode[0]
copyFromLocal: Could not get block locations. Aborting...
Exception closing file /tmp/gridmix-env
java.io.IOException: Could not get block locations. Aborting...
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

This problem has a severe impact on Pig 2.0: it is pre-packaged with Hadoop 0.18.1 and uses the Hadoop 0.18.1 DFS client to interact with the Hadoop cluster.
That means Pig 2.0 will not work with the upcoming Hadoop 0.18.2 release.



Owen O'Malley added a comment - 28/Oct/08 05:27 PM
This seems to have been caused by HADOOP-4116, based on Nicholas' testing.

Tsz Wo (Nicholas), SZE added a comment - 28/Oct/08 05:42 PM
> ... based on Nicholas' testing.

To reproduce this:

  • start a 0.18.1 cluster
  • write a file using a 0.18.2 client, e.g. hadoop fs -put src dst
  • the write fails with error messages similar to those shown in the description.

It won't fail if the patch in HADOOP-4116 is reverted.


Konstantin Shvachko added a comment - 28/Oct/08 06:20 PM - edited
It looks like we need two patches here:
  • for 0.18, the incompatible change to the data transfer protocol should be removed.
  • for 0.19, we need to report a clear message saying that the data transfer protocols are incompatible, rather than "Could not read from stream".
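One way to get the clearer 0.19 error Konstantin describes is for the datanode to answer a version mismatch with an explicit status before closing the connection, so the client can report both versions instead of failing on a bare read. The Java sketch below illustrates that idea with in-memory streams. It is not the actual DataTransferProtocol code: the class name VersionHandshake, the version number 14, and the status bytes are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class VersionHandshake {
    // Hypothetical values for illustration; not the real DataTransferProtocol constants.
    static final short DATANODE_VERSION = 14;
    static final byte STATUS_OK = 0;
    static final byte STATUS_VERSION_MISMATCH = 1;

    /** Datanode side: read the client's version and always reply with a status before closing. */
    static boolean datanodeCheck(DataInputStream in, DataOutputStream out) throws IOException {
        short clientVersion = in.readShort();
        boolean ok = clientVersion == DATANODE_VERSION;
        out.writeByte(ok ? STATUS_OK : STATUS_VERSION_MISMATCH);
        out.writeShort(DATANODE_VERSION); // lets the client report both versions
        out.flush();
        return ok;
    }

    /** Client side: turn a mismatch reply into a descriptive exception. */
    static void clientCheck(DataInputStream in, short clientVersion) throws IOException {
        byte status = in.readByte();
        short datanodeVersion = in.readShort();
        if (status == STATUS_VERSION_MISMATCH) {
            throw new IOException("Incompatible data transfer protocol: client version "
                    + clientVersion + ", datanode version " + datanodeVersion);
        }
    }

    /** Runs one handshake through in-memory buffers; returns "OK" or the client's error message. */
    static String simulate(short clientVersion) {
        try {
            ByteArrayOutputStream request = new ByteArrayOutputStream();
            DataOutputStream clientOut = new DataOutputStream(request);
            clientOut.writeShort(clientVersion);
            clientOut.flush();

            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            datanodeCheck(new DataInputStream(new ByteArrayInputStream(request.toByteArray())),
                    new DataOutputStream(reply));

            clientCheck(new DataInputStream(new ByteArrayInputStream(reply.toByteArray())),
                    clientVersion);
            return "OK";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate((short) 14)); // OK
        System.out.println(simulate((short) 13)); // Incompatible data transfer protocol: client version 13, datanode version 14
    }
}
```

The design choice here is that the server never closes the socket silently on a mismatch. A silent close surfaces on the client as a generic read failure, which is the symptom reported in this issue.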

Hairong Kuang added a comment - 28/Oct/08 09:54 PM
OK, I reverted HADOOP-4116 in the 0.18 branch. This patch removes the incompatible change introduced by the HADOOP-4116 patch but keeps the critical code that prevents the Balancer from overusing network bandwidth and avoids the deadlock problem described in HADOOP-4116.

Konstantin Shvachko added a comment - 29/Oct/08 12:07 AM
+1
This looks reasonable for 0.18. It fixes the semaphore contention problem and keeps the data transfer protocol compatible across 0.18 releases.
We need to run tests with this patch.
For 0.19 and 0.20, it is better to open another JIRA.
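The thread does not show the HADOOP-4116 code itself. As a generic illustration of the technique the comments mention, the sketch below bounds the number of concurrent block moves a node accepts with a java.util.concurrent.Semaphore. It uses the non-blocking tryAcquire so that a busy node turns a request away immediately instead of parking a thread on the semaphore, since blocking waits on shared permits are a common source of contention and deadlock. The class MoveThrottle, its methods, and the slot limit are hypothetical.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical per-node limiter for concurrent Balancer block moves (illustration only). */
public class MoveThrottle {
    private final Semaphore slots;

    MoveThrottle(int maxConcurrentMoves) {
        slots = new Semaphore(maxConcurrentMoves);
    }

    /** Non-blocking: returns false at once when all slots are taken, instead of waiting. */
    boolean tryStartMove() {
        return slots.tryAcquire();
    }

    /** Must be called exactly once for every successful tryStartMove(). */
    void finishMove() {
        slots.release();
    }

    /** Runs the move only if a slot is free; the slot is released even if the move throws. */
    boolean runIfSlotFree(Runnable move) {
        if (!slots.tryAcquire()) {
            return false; // node is busy; the caller can pick another source or target
        }
        try {
            move.run();
        } finally {
            slots.release();
        }
        return true;
    }

    int availableSlots() {
        return slots.availablePermits();
    }

    public static void main(String[] args) {
        MoveThrottle throttle = new MoveThrottle(2);
        System.out.println(throttle.tryStartMove()); // true
        System.out.println(throttle.tryStartMove()); // true
        System.out.println(throttle.tryStartMove()); // false: both slots are in use
        throttle.finishMove();
        System.out.println(throttle.tryStartMove()); // true
    }
}
```

Rejecting instead of waiting means the requester must handle a "busy" answer. Carrying that answer over the wire to an older peer is one way such a fix can end up changing a wire protocol, though whether that is exactly what happened in HADOOP-4116 is not stated here.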

Hairong Kuang added a comment - 29/Oct/08 12:24 AM
Konstantin, the JUnit tests have passed. I will test this patch on a real DFS cluster. For 0.19 and 0.20, we keep the patch to HADOOP-4116. We should open a JIRA on the "Could not read from stream" problem for 0.19 or 0.20.

Tsz Wo (Nicholas), SZE added a comment - 29/Oct/08 07:03 PM
> We should open a jira on "Could not read from stream" problem for 0.19 or 0.20.

Created HADOOP-4538


Hairong Kuang added a comment - 29/Oct/08 07:14 PM
Unit tests passed, and here is the ant test-patch result:
[exec] +1 overall.

[exec] +1 @author. The patch does not contain any @author tags.

[exec] +1 tests included. The patch appears to include 3 new or modified tests.

[exec] +1 javadoc. The javadoc tool did not generate any warning messages.

[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.

[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.


Hairong Kuang added a comment - 29/Oct/08 09:52 PM
I've committed this.