## Details

• Type: Bug
• Status: Closed
• Priority: Major
• Resolution: Fixed
• Affects Version/s: 2.4.0
• Fix Version/s:
• Component/s:
• Labels: None
• Environment:

Windows + JDK7. The issue was hit while upgrading from 1.x to 2.4.

• Target Version/s:
• Hadoop Flags: Reviewed

## Description

I tried to upgrade Hadoop from 1.x to 2.4, but the DataNode failed to start due to a hard link exception.
Repro steps:
* Stop all services.
* Try to start the datanode; a Hardlink exception appears in the datanode service log.


```
2014-04-10 22:47:11,655 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8010: starting
2014-04-10 22:47:11,656 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-04-10 22:47:11,999 INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and name-node layout version: -56
old LV = -44; old CTime = 0.
new LV = -55; new CTime = 1397168400373
2014-04-10 22:47:12,254 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool <registering> (Datanode Uuid unassigned) service to myhost/10.0.0.1:8020
2014-04-10 22:47:12,258 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to myhost/10.0.0.1:8020
2014-04-10 22:47:12,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN
java.lang.Exception: trace
2014-04-10 22:47:12,359 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2014-04-10 22:47:12,360 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN
java.lang.Exception: trace
2014-04-10 22:47:14,360 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2014-04-10 22:47:14,361 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at myhost/10.0.0.1
************************************************************/
```


## Attachments

1. HDFS-6233.01.patch (2 kB, Arpit Agarwal)
2. HDFS-6233.02.patch (7 kB, Chris Nauroth)
3. HDFS-6233.03.patch (8 kB, Arpit Agarwal)

## Activity

Arpit Agarwal added a comment -

The "1>NUL" is passed as a parameter to the winutils command instead of being interpreted by the shell. The simplest fix is to just remove it.

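A side note on why that happens: redirection tokens like `1>NUL` are shell syntax, and Java's `ProcessBuilder` performs no shell processing, so the token reaches the child process as a literal argument. A minimal sketch of the effect (illustrative only, not Hadoop code; it uses the POSIX `echo` binary rather than winutils):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class RedirectionDemo {
    public static void main(String[] args) throws Exception {
        // "1>NUL" is handed to echo as an ordinary argv entry, not
        // interpreted as a redirection, so it shows up in the output.
        Process p = new ProcessBuilder("echo", "hello", "1>NUL").start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            System.out.println(r.readLine());  // prints: hello 1>NUL
        }
        p.waitFor();
    }
}
```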
Chris Nauroth added a comment -

+1 for the patch, pending Jenkins run. Thanks a lot for tracking down this tricky bug!

Arpit Agarwal added a comment -

Initial patch. I will probably add a unit test before it's ready for review.

Arpit Agarwal added a comment -

Our comments crossed, thanks for the quick review Chris!

I'd also like to add a unit test, will look into it tomorrow.

Chris Nauroth added a comment -

Good point. Thanks again.

Hadoop QA added a comment -

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12639738/HDFS-6233.01.patch
against trunk revision.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 javadoc. There were no new javadoc warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:
org.apache.hadoop.fs.TestHardLink

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6649//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6649//console

This message is automatically generated.

Chris Nauroth added a comment -

Another problem discovered during testing is that the external process launched to make the hardlink can hang. On Windows, it's important for the launching process to fully consume stdout and stderr. Otherwise, the process never really exits. This is a fundamental problem with the JDK Process class. Fortunately, our own Shell class already implements the right workarounds, so it's easy to fix by converting HardLink to use Shell instead of Process. I'm attaching a patch that combines that fix with Arpit's original fix.

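The drain-then-wait pattern Chris describes (the workaround that Hadoop's `Shell` class implements) can be sketched roughly as follows; the class and method names below are illustrative, not Hadoop's actual API:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class DrainingExec {
    public static int run(String... cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).start();
        // Consume stderr on a separate thread so the child cannot block
        // on a full pipe while we read stdout.
        Thread errDrainer = new Thread(() -> drain(p.getErrorStream()));
        errDrainer.start();
        drain(p.getInputStream());   // consume stdout on this thread
        errDrainer.join();
        return p.waitFor();          // safe: both pipes are fully drained
    }

    private static void drain(InputStream in) {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
            while (r.readLine() != null) { /* discard */ }
        } catch (IOException ignored) { }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("echo", "ok"));  // prints: 0
    }
}
```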
Arpit Agarwal added a comment - edited

+1 from me, however perhaps it will be appropriate for another committer to +1 it too.

I've tested the updated patch on OS X and Windows and it fixes the hang.

Hadoop QA added a comment -

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12639865/HDFS-6233.02.patch
against trunk revision.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 1 new or modified test files.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 javadoc. There were no new javadoc warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6654//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6654//console

This message is automatically generated.

Jing Zhao added a comment -

+1 the patch looks good to me. Thanks for the fix Chris Nauroth and Arpit Agarwal!

Tsz Wo Nicholas Sze added a comment -

Questions:

• hardLinkMultPrefix uses "cmd" but getLinkMultArgLength uses "cmd.exe". Do they have to match? If it is a bug, we should use the static variables to construct the string.
• Why is there a trailing space at the end of the line below (marked by `^`)?

```
+              + Shell.WINUTILS + " hardlink create \\%f %f ").length();
                                                             ^
```

• Do we have a unit test covering this method?

If you are going to update the patch, please add hardLinkCommand to all the exception messages. It will be much easier to debug. Otherwise, we may change this in a separate JIRA.

(I also suggest changing winutils for better error messages later on. Before this patch, it outputs "Usage: hardlink create [LINKNAME] [FILENAME] |Incorrect command line arguments.", which is created by concatenating the first line of the usage string shown below with "Incorrect command line arguments." Such concatenation seems arbitrary. We should change the error message to be more specific, e.g. "Incorrect number of command line arguments, expect 3 or 4 but the actual number is 5. The input command is ...")

```c
// hardlink.c
{
  fwprintf(stdout, L"\
Creates a new hardlink on the existing file or displays the number of links\n\
for the given file\n");
}
```

Tsz Wo Nicholas Sze added a comment -

> ... expect 3 or 4 but actually number is 5 ...

For hardlink create, it must be 4. So it should be "... expect 4 but actually number is 5 ..."

Tsz Wo Nicholas Sze added a comment -

If there are bugs in getLinkMultArgLength but the bugs do not affect upgrade, we may fix them separately.

Tsz Wo Nicholas Sze added a comment -

Chris Nauroth / Arpit Agarwal, any update?
Arpit Agarwal added a comment -

Updated patch to address most of the feedback from Tsz Wo Nicholas Sze. I also updated the exception message for createHardLink to include the failed command. Let's address hardlink.c issues in a separate JIRA.

We have existing unit tests but they did not catch this failure. There appears to be some environment/JDK version issue that we were not able to figure out.

Longer term we should just deprecate all this code and use Files.createLink if we detect Java 7+. It will be much more efficient than launching a new process for each file.

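For context, the Java 7 API Arpit refers to is `java.nio.file.Files.createLink`, which creates the hard link in-process with a single system call instead of launching an external process per file. A rough sketch (illustrative only):

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class CreateLinkDemo {
    public static void main(String[] args) throws Exception {
        // Create a temporary target file, then hard-link to it.
        Path target = Files.createTempFile("hardlink-demo", ".dat");
        Files.write(target, "data".getBytes());

        Path link = target.resolveSibling(target.getFileName() + ".link");
        Files.deleteIfExists(link);
        Files.createLink(link, target);   // link and target now share one inode

        System.out.println(Files.isSameFile(link, target));  // prints: true

        Files.delete(link);
        Files.delete(target);
    }
}
```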
Tsz Wo Nicholas Sze added a comment -

+1 patch looks good.

Hadoop QA added a comment -

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12641844/HDFS-6233.03.patch
against trunk revision.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 1 new or modified test files.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 javadoc. There were no new javadoc warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:
org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6726//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6726//console

This message is automatically generated.

Arpit Agarwal added a comment -

The TestMetricsSystemImpl failure looks unrelated to this patch.

Thanks for the reviews Tsz Wo Nicholas Sze and Jing Zhao; and also thanks to Chris Nauroth for co-authoring the patch. I committed this to trunk and branch-2.


HADOOP-10540. Datanode upgrade in Windows fails with hardlink error. (Contributed by Chris Nauroth and Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1589923)

## People

• Assignee:
Arpit Agarwal
• Reporter:
Huan Huang