Details
Description
HBase PreCommit builds frequently gave us NumberFormatException.
2011-12-20 01:44:01,180 WARN [main] mapred.JobClient(784): No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). java.lang.NumberFormatException: For input string: "18446743988060683582" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) at java.lang.Long.parseLong(Long.java:422) at java.lang.Long.parseLong(Long.java:468) at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413) at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148) at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401) at org.apache.hadoop.mapred.Task.initialize(Task.java:536) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) at org.apache.hadoop.mapred.Child.main(Child.java:249)
From hadoop 0.20.205 source code, looks like ppid was 18446743988060683582, causing NFE:
// Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss) pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
You can find information on the OS at the beginning of https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
asf011.sp2.ygridcore.net Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 20 file size (blocks, -f) unlimited pending signals (-i) 16382 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 60000 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 2048 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited 60000 Running in Jenkins mode
From Nicolas Sze:
It looks like that the ppid is a 64-bit positive integer but Java long is signed and so only works with 63-bit positive integers. In your case, 2^64 > 18446743988060683582 > 2^63. Therefore, there is a NFE.
I propose changing allProcessInfo to Map<String, ProcessInfo> so that we don't encounter this problem by avoiding parsing large integer.
Attachments
Attachments
Issue Links
- duplicates
-
MAPREDUCE-3836 TestContainersMonitor failing intermittently
- Resolved
- is related to
-
MAPREDUCE-1201 Make ProcfsBasedProcessTree collect CPU usage information
- Closed
- relates to
-
MAPREDUCE-5715 ProcfsBasedProcessTree#constructProcessInfo() can still throw NumberFormatException
- Open