
Key: HADOOP-4340
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Arun C Murthy
Reporter: David Litster
Votes: 0
Watchers: 1
Hadoop Common

"hadoop jar" always returns exit code 0 (success) to the shell when jar throws a fatal exception

Created: 03/Oct/08 10:40 PM   Updated: 08/Jul/09 05:06 PM
Component/s: None
Affects Version/s: 0.18.1, 0.19.0, 0.20.0
Fix Version/s: 0.18.2

Time Tracking:
Not Specified

File Attachments:
HADOOP-4340_2_20081029.patch (2 kB), 2008-10-29 10:24 PM, Arun C Murthy, licensed for inclusion in ASF works
patch-4340-1.txt (0.9 kB), 2008-10-29 08:35 AM, Amareshwari Sriramadasu, licensed for inclusion in ASF works
patch-4340.txt (0.5 kB), 2008-10-06 10:26 AM, Amareshwari Sriramadasu, licensed for inclusion in ASF works
Environment: Ubuntu 8.04 Server, 7 Hadoop nodes, GNU bash, version 3.2.39(1)-release (i486-pc-linux-gnu)

Hadoop Flags: Reviewed
Resolution Date: 29/Oct/08 11:10 PM


Description
Running "hadoop jar" always returns 0 (success) when the jar dies with a stack trace. As an example, run these commands:

/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-0.18.1-examples.jar pi 10 10 2>&1; echo $?
exits with 0

/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-0.18.1-examples.jar pi 2>&1; echo $?
exits with 255

/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-0.18.1-examples.jar 2>&1; echo $?
exits with 0

This seems to be expected behavior. However, running:

/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-0.18.1-examples.jar pi 10 badparam 2>&1; echo $?
java.lang.NumberFormatException: For input string: "badparam"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:403)
at java.lang.Long.parseLong(Long.java:461)
at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:241)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:252)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
exits with 0.

In my opinion, if a jar throws an exception that kills the program being run, and the developer doesn't catch the exception and exit sanely with an exit code, hadoop should at least exit with a non-zero exit code.

As another example, while running a main class that exits with an exit code of 201, Hadoop will preserve the correct exit code:

public static void main(String[] args) throws Exception { System.exit(201); }

But when deliberately creating a null pointer exception, Hadoop exits with 0.

public static void main(String[] args) throws Exception { Object o = null; o.toString(); System.exit(201); }

This behaviour makes it very difficult, if not impossible, to use Hadoop programmatically with tools such as HOD or non-Java data processing frameworks: if a jar crashes with an unhandled exception, Hadoop doesn't inform the calling program in a well-behaved way (polling stderr for output is not a very good way to detect application failure).

I'm not a Java programmer, so I don't know what the best code to signal failure would be.

Please let me know what other information I can include about my setup

Thanks.



Steve Loughran added a comment - 06/Oct/08 09:58 AM
Looking at the stack trace, the cause is that JobShell.main() doesn't set an exit code:

public static void main(String[] argv) throws Exception {
  JobShell jshell = new JobShell();
  ToolRunner.run(jshell, argv);
}

It should be System.exit(ToolRunner.run(...)).

The question is, what is going to break?
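
The change suggested in this comment can be sketched without a Hadoop dependency. In the sketch below, Tool, runTool and SHELL are hypothetical stand-ins (they are not the Hadoop classes), and it is assumed that the runner returns the tool's integer status, as the System.exit(ToolRunner.run(...)) suggestion implies. Only the last statement of main corresponds to the proposed fix.

```java
public class ExitCodePropagation {
    /** Hypothetical stand-in for a tool whose run method returns a status code. */
    interface Tool {
        int run(String[] args);
    }

    /** Hypothetical stand-in for a runner: runs the tool and hands back its status. */
    static int runTool(Tool tool, String[] args) {
        return tool.run(args);
    }

    /** Illustrative tool: reports status 255 when called without arguments, 0 otherwise. */
    static final Tool SHELL = args -> args.length == 0 ? 255 : 0;

    public static void main(String[] args) {
        // Buggy shape: calling runTool(SHELL, args) and ignoring the result
        // lets main return normally, so the JVM exits with status 0.
        // Fixed shape: pass the status on to the calling shell.
        System.exit(runTool(SHELL, args));
    }
}
```

With this shape, a caller that runs the program and then checks $? sees whatever status the tool returned rather than an unconditional 0.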


Amareshwari Sriramadasu added a comment - 06/Oct/08 10:20 AM
Thanks Steve for finding the cause. That looks like a bug, it should not break anything.

Amareshwari Sriramadasu added a comment - 06/Oct/08 10:26 AM
Patch returning exit code from JobShell

Vinod K V added a comment - 06/Oct/08 11:18 AM
The patch will still return a zero exit code if the jar throws an uncaught exception. It merely tries to pass any non-zero return code that the Tool itself returns; uncaught exceptions are still not shielded.

Vinod K V added a comment - 07/Oct/08 04:51 AM
My bad, an exception in main WILL return a non-zero exit code. The reason the above patch was not sufficient when I tried it is that ExampleDriver catches uncaught exceptions from the examples and returns silently. I think that needs to be fixed.

+1 for the fix. Examples can be fixed here or separately.
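
A minimal sketch of the behaviour described here, assuming a driver that catches everything and only prints the stack trace. The class and method names are hypothetical and are not taken from ExampleDriver itself.

```java
public class SwallowedFailure {
    /** Hypothetical example program: fails on non-numeric input such as "badparam". */
    static void example(String[] args) {
        Long.parseLong(args[0]); // throws NumberFormatException for "badparam"
    }

    /**
     * Hypothetical driver that swallows failures: it prints the stack trace
     * and then returns normally, so its caller cannot tell that anything failed.
     */
    static void swallowingDriver(String[] args) {
        try {
            example(args);
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }

    public static void main(String[] args) {
        swallowingDriver(args);
        // Reached even when example() threw, so the JVM exits with status 0.
    }
}
```

Because the exception never leaves the driver, fixing only the outermost main does not help: there is nothing left to propagate by the time control returns to it.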


Amareshwari Sriramadasu added a comment - 29/Oct/08 08:35 AM
Changed ExampleDriver also to return with non-zero exit code.

Amareshwari Sriramadasu added a comment - 29/Oct/08 10:15 AM
test-patch result:
     [exec]
     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.

All core and contrib tests passed on my machine


Arun C Murthy added a comment - 29/Oct/08 06:37 PM
In ExampleDriver.java, it isn't quite elegant to call System.exit from inside a catch clause; we should use an exit code:
int exitCode = -1;
...

try {
 ...
 pgd.driver(argv);
 exitCode = 0;
} catch(...) {
 ...
}

System.exit(exitCode);

Ideally, ProgramDriver.driver should have returned an exit code... sigh!

We should also fix ProgramDriver.driver to throw an IllegalArgumentException when the sanity checks fail.
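
The outline above can be filled in as a runnable sketch. Here driver is a hypothetical stand-in for ProgramDriver.driver (it returns nothing and signals failure by throwing), and runDriver is a helper introduced only so that the exit code can be inspected without terminating the JVM; neither is the committed patch.

```java
public class ExitCodeDriver {
    /** Hypothetical stand-in for the real driver: returns nothing, throws on failure. */
    static void driver(String[] argv) throws Throwable {
        if (argv.length == 0) {
            // Mirrors the suggestion to throw when the sanity checks fail.
            throw new IllegalArgumentException("no program name given");
        }
        Long.parseLong(argv[0]); // fails for non-numeric input such as "badparam"
    }

    /** Runs the driver and maps success or failure to an exit code. */
    static int runDriver(String[] argv) {
        int exitCode = -1; // assume failure until the driver completes
        try {
            driver(argv);
            exitCode = 0;
        } catch (Throwable e) {
            e.printStackTrace();
        }
        return exitCode;
    }

    public static void main(String[] argv) {
        // Single exit point, outside the catch clause.
        System.exit(runDriver(argv));
    }
}
```

On a POSIX shell only the low 8 bits of the status are reported, so an exit code of -1 shows up as 255 in $?.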


Arun C Murthy added a comment - 29/Oct/08 10:24 PM
Updated patch.

Owen O'Malley added a comment - 29/Oct/08 10:28 PM
+1

Arun C Murthy added a comment - 29/Oct/08 11:10 PM
I just committed this. Thanks to Amareshwari and Steve too!

Hadoop QA added a comment - 30/Oct/08 01:02 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12393014/HADOOP-4340_2_20081029.patch
against trunk revision 709022.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3508/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3508/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3508/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3508/console

This message is automatically generated.


Hudson added a comment - 30/Oct/08 04:10 PM
Integrated in Hadoop-trunk #647 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/647/)
Correctly set the exit code from JobShell.main so that the 'hadoop jar' command returns the right code to the user.