Here I mistakenly referenced 'to' as 'D.to' instead of 'D::to', so Pig treats it as a scalar projection of relation D rather than as a field reference. The D relation produces null output.
The first MapReduce job computes D, which gives null results.
The MapPlan of the second job:
Here at F, the file /tmp/temp-1607149525/tmp281350188, which is the output of the first MapReduce job, is read repeatedly.
If the input to F had been non-empty, the job would have failed with the expected error message 'Scalar has more than 1 row in the output', since I am using the scalar incorrectly.
But since it is null, ReadScalars returns before the exception is thrown and writes the same messages to the task logs repeatedly.
That is, it reads the '/tmp/temp-1607149525/tmp281350188' file again and again, which was causing the high volume of namenode operations.
One small mistake ended up causing heavy namenode load.
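The repeated read described above can be illustrated with a minimal Java sketch. It is not taken from the Pig source: every class and method name here is hypothetical, and it assumes the lookup uses "value is not null" as its only sign that the file has already been read. Under that assumption an empty scalar relation never looks loaded, so each call reopens the file; an explicit flag avoids that.

```java
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names come from Pig. */
public class ScalarCacheSketch {

    /** Stands in for the scalar file and counts how often it is opened. */
    public static class CountingSource implements Supplier<String> {
        private final String content; // null models an empty scalar relation
        private int reads = 0;

        public CountingSource(String content) { this.content = content; }

        @Override
        public String get() {
            reads++;
            return content;
        }

        public int reads() { return reads; }
    }

    /** Assumed buggy pattern: a null value is taken to mean "not read yet". */
    public static class NullCheckedScalar {
        private final Supplier<String> source;
        private String value;

        public NullCheckedScalar(Supplier<String> source) { this.source = source; }

        public String get() {
            if (value == null) {
                // With empty input this stays null, so the next call reads again.
                value = source.get();
            }
            return value;
        }
    }

    /** Alternative: remember that the read happened, even if it produced nothing. */
    public static class FlaggedScalar {
        private final Supplier<String> source;
        private String value;
        private boolean loaded = false;

        public FlaggedScalar(Supplier<String> source) { this.source = source; }

        public String get() {
            if (!loaded) {
                value = source.get();
                loaded = true;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        CountingSource emptyA = new CountingSource(null);
        NullCheckedScalar nullChecked = new NullCheckedScalar(emptyA);
        CountingSource emptyB = new CountingSource(null);
        FlaggedScalar flagged = new FlaggedScalar(emptyB);

        // Simulate one scalar lookup per input record.
        for (int i = 0; i < 1000; i++) {
            nullChecked.get();
            flagged.get();
        }
        System.out.println("null-checked reads: " + emptyA.reads());
        System.out.println("flagged reads: " + emptyB.reads());
    }
}
```

In this sketch, 1000 lookups against an empty source open it 1000 times with the null check and once with the flag. On HDFS each open is a namenode call, which would be consistent with the load described here.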
|Field||Original Value||New Value|
|Assignee|| ||Rohini Palaniswamy [ rohini ]|
|Status||Open [ 1 ]||Patch Available [ 10002 ]|
|Fix Version/s|| ||0.12.1 [ 12324970 ]|
|Fix Version/s|| ||0.13.0 [ 12324971 ]|
|Status||Patch Available [ 10002 ]||Resolved [ 5 ]|
|Hadoop Flags|| ||Reviewed [ 10343 ]|
|Resolution|| ||Fixed [ 1 ]|
|Status||Resolved [ 5 ]||Closed [ 6 ]|