Thanks for the patch, Chris! Initial comments:
Is this intended to apply to all distributed cache items or only those that need to be uploaded during job submission? Some comments in the JIRA and the property descriptions imply it should also apply to items in the distributed cache that already reside in HDFS, but it doesn't look like the patch does that. The changes are to JobResourceUploader, which AFAIK is only involved for files that potentially need to be copied to the staging area before job submission. I'm not seeing how this affects items that already live elsewhere in HDFS before job submission (i.e., items already listed in mapreduce.job.cache.*).
Speaking of mapreduce.job.cache.*, it would be nice if the new properties used that same prefix, since they're related to the distributed cache. Also, I'd personally prefer something like mapreduce.job.cache.limit.max-files, mapreduce.job.cache.limit.max-file-mb, and mapreduce.job.cache.limit.max-total-mb if the limits are supposed to apply to the entire distributed cache.
The TotalNumberOfFilesAndSize API is verbose and error-prone. Is there ever a valid reason to call incrementTotalSize without also calling incrementTotalNumberOfFiles and findMaxFileSize? It probably does the wrong thing if the client doesn't call all three for each file. IMHO there should just be two APIs, addFile(long filesize) and checkLimit(). Or maybe just one, if it's OK to throw during addFile() directly.
Suggestion: TotalNumberOfFilesAndSize might be easier to comprehend (and type) if named something like LimitsChecker. Also its constructor can just be passed a Configuration. Then it can hide all the confs and other implementation details related to the dist cache limits, and a predicate function like hasLimits() can be used to do the early-out checks. Or maybe we just pass it the files directly and it can decide internally whether to visit the paths or early-out.
I think it would be very helpful if the file path was shown in the error message when something exceeds the single-file limit, otherwise the user has to manually track it down among all the files involved.
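To make the last three points concrete, here is a rough sketch of the shape I have in mind. Everything in it is hypothetical: the LimitsChecker name, the single throwing addFile(path, size) variant, and the "zero or negative means no limit" convention are my assumptions, not what the patch does. To keep the sketch free of Hadoop dependencies the constructor takes the three limits directly; in the real class it would presumably take a Configuration and read the properties itself.

```java
import java.io.IOException;

/**
 * Hypothetical replacement for TotalNumberOfFilesAndSize.
 * All names and semantics here are illustrative assumptions.
 */
final class LimitsChecker {
  private static final long BYTES_PER_MB = 1024L * 1024L;

  // Assumed convention: a value <= 0 disables that particular limit.
  private final long maxFiles;
  private final long maxFileBytes;
  private final long maxTotalBytes;

  private long fileCount;
  private long totalBytes;

  LimitsChecker(long maxFiles, long maxFileMb, long maxTotalMb) {
    this.maxFiles = maxFiles;
    this.maxFileBytes = maxFileMb * BYTES_PER_MB;
    this.maxTotalBytes = maxTotalMb * BYTES_PER_MB;
  }

  /** Lets callers skip visiting paths entirely when nothing is limited. */
  boolean hasLimits() {
    return maxFiles > 0 || maxFileBytes > 0 || maxTotalBytes > 0;
  }

  /**
   * Records one file and fails as soon as any limit is exceeded,
   * so callers cannot update one counter and forget the others.
   */
  void addFile(String path, long sizeBytes) throws IOException {
    if (maxFileBytes > 0 && sizeBytes > maxFileBytes) {
      // Name the offending file so the user does not have to hunt for it.
      throw new IOException("File " + path + " is " + sizeBytes
          + " bytes, exceeding the single-file limit of "
          + maxFileBytes + " bytes");
    }
    fileCount++;
    totalBytes += sizeBytes;
    if (maxFiles > 0 && fileCount > maxFiles) {
      throw new IOException("Adding " + path + " brings the file count to "
          + fileCount + ", exceeding the limit of " + maxFiles);
    }
    if (maxTotalBytes > 0 && totalBytes > maxTotalBytes) {
      throw new IOException("Adding " + path + " brings the total size to "
          + totalBytes + " bytes, exceeding the limit of "
          + maxTotalBytes + " bytes");
    }
  }
}
```

With something like this the uploader only needs to check hasLimits() once and then call addFile() for each path it visits. If throwing from addFile() is not acceptable, the same bookkeeping could sit behind a separate checkLimit() call instead.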
Nit: Javadocs that list a method's parameters without describing any of them aren't useful.