Details
Type: Bug
Status: Patch Available
Priority: Major
Resolution: Unresolved
Description
I was chasing a bug where the namenode did not declare a datanode dead even though the last contact time was 2.5 hours earlier.
Before I could debug, the datanode was re-imaged (all the logs were deleted) and the namenode was restarted and upgraded to new software.
While debugging, I came across the heartbeat check code, where two System.nanoTime values are compared in a way that goes against Java's recommended usage.
Here is the hadoop code:
DatanodeManager.java
/** Is the datanode dead? */
boolean isDatanodeDead(DatanodeDescriptor node) {
  return (node.getLastUpdateMonotonic() <
          (monotonicNow() - heartbeatExpireInterval));
}
The monotonicNow() value is calculated as:
Time.java
public static long monotonicNow() {
  final long NANOSECONDS_PER_MILLISECOND = 1000000;
  return System.nanoTime() / NANOSECONDS_PER_MILLISECOND;
}
The javadoc of System.nanoTime clearly states that two nanoTime values should be compared by subtracting one from the other:
To compare two nanoTime values
  long t0 = System.nanoTime();
  ...
  long t1 = System.nanoTime();
one should use t1 - t0 < 0, not t1 < t0, because of the possibility of numerical overflow.
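The difference this makes can be shown with a minimal, self-contained sketch. The class and method names below are hypothetical illustrations, not Hadoop code; the constants simulate a clock value that wraps past Long.MAX_VALUE:

  public class NanoTimeComparison {
      // Overflow-prone form, analogous to isDatanodeDead above:
      // compares absolute values, so it breaks when the clock wraps.
      static boolean expiredNaive(long lastUpdate, long now, long interval) {
          return lastUpdate < now - interval;
      }

      // Recommended form: compare the elapsed difference, so
      // two's-complement wraparound cancels out in the subtraction.
      static boolean expiredSafe(long lastUpdate, long now, long interval) {
          return now - lastUpdate > interval;
      }

      public static void main(String[] args) {
          long interval = 1000;
          // lastUpdate sits just below Long.MAX_VALUE; "now" is 2000
          // units later, which silently wraps to a large negative value.
          long lastUpdate = Long.MAX_VALUE - 500;
          long now = lastUpdate + 2000;

          System.out.println(expiredNaive(lastUpdate, now, interval)); // false (misses the expired node)
          System.out.println(expiredSafe(lastUpdate, now, interval));  // true (correct)
      }
  }

In the naive form, now - interval is a large negative number after the wrap, so the huge positive lastUpdate never looks expired; in the safe form, now - lastUpdate still evaluates to exactly 2000, which correctly exceeds the interval.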