The CombineFileInputFormat class in org.apache.hadoop.mapred.lib (the old API) has a couple of issues. These issues were addressed in the new API (MAPREDUCE-1423), but the old class was not fixed.
The main issue that JIRA describes is a performance problem. However, IMO the more serious problem is a thread-safety issue (rackToNodes), which was fixed in the same change.
What is the policy on addressing issues in the old API? Can we backport these fixes to the old class?