Details
Improvement
Status: Resolved
Major
Resolution: Fixed
0.20.1, 0.21.0, 0.22.0
Description
We can have many FileInput objects in memory, depending on the number of mappers.
Interning the host-name Strings would save a large amount of memory on the JobTracker and JobClient, because the same few host names are repeated across all of those objects.
FileInputFormat.java:
      for (NodeInfo host: hostList) {
        // Strip out the port number from the host name
-       retVal[index++] = host.node.getName().split(":")[0];
+       retVal[index++] = host.node.getName().split(":")[0].intern();
        if (index == replicationFactor) {
          done = true;
          break;
        }
      }
More on String.intern(): http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
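To illustrate the effect of the one-line change above, here is a minimal standalone sketch. The class name HostNameInterning, the helper stripPort, and the host name are hypothetical and not part of the patch; the sketch only shows that equal host names obtained from separate split() calls are normally distinct objects, while their interned forms are a single shared object.

```java
class HostNameInterning {
    // Hypothetical helper: strip the port from "host:port" and return the
    // canonical (interned) String, mirroring the patched line.
    static String stripPort(String nodeName) {
        return nodeName.split(":")[0].intern();
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents,
        // similar to node names that are created separately for each block.
        String a = new String("node1.example.com:50010");
        String b = new String("node1.example.com:50010");

        // Without intern(): each split() call allocates its own substring.
        String plainA = a.split(":")[0];
        String plainB = b.split(":")[0];
        System.out.println("same object without intern: " + (plainA == plainB));

        // With intern(): both calls return the one canonical instance.
        System.out.println("same object with intern: " + (stripPort(a) == stripPort(b)));
    }
}
```

With interning, the number of host-name String objects retained is bounded by the number of distinct hosts rather than by the number of splits times the replication factor.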
Changing the type of the file field from Path to String would also save a lot of memory. A Path contains a java.net.URI, which internally holds ~10 String fields.
private Path file;
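The Path-to-String change could look roughly like the sketch below. It is an assumption-laden illustration, not the actual patch: CompactSplit and toUri are hypothetical names, and java.net.URI stands in for org.apache.hadoop.fs.Path so that the example runs without Hadoop on the classpath. The idea is to keep only one String per object and rebuild the heavier parsed form on demand.

```java
import java.net.URI;

// Hypothetical stand-in for a split-like class that stores its file compactly.
class CompactSplit {
    // One String instead of a parsed object with many internal String fields.
    private final String file;

    CompactSplit(URI path) {
        this.file = path.toString();
    }

    // Rebuild the parsed form only when a caller actually needs it.
    URI toUri() {
        return URI.create(file);
    }

    String getFile() {
        return file;
    }

    public static void main(String[] args) {
        CompactSplit split = new CompactSplit(URI.create("hdfs://nn:8020/data/part-00000"));
        System.out.println(split.getFile());
        System.out.println(split.toUri().getHost());
    }
}
```

The trade-off is that every call to toUri() re-parses the string, so this only pays off when the objects are held in memory far more often than their paths are inspected.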