Can we make InputSplitLocationInfo extend InputSplit? It doesn't make sense for any class to implement only InputSplitLocationInfo without implementing InputSplit.
Nothing to do with this patch. It is unfortunate that mapreduce.InputSplit doesn't implement mapred.InputSplit. Would it be easy to fix it?
Not entirely sure of the reasoning there, but as this stuff can have binary compatibility implications in mysterious ways, I'd rather not touch it if we don't need to.
The following two constants should probably be in SplitLocationInfo?
They're only used in FileSplit and not in SplitLocationInfo - is there utility in moving them away from where they're used? I'd like to avoid adding these constants to the API because, when we include additional storage types, each SplitLocationInfo could end up as a union of storage types - needing to add an ON_DISK_AND_IN_FLASH_AND_IN_MEMORY constant would be ugly.
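To make the union-of-storage-types concern concrete, here is a minimal sketch of how a location could carry a set of storage types instead of one combined constant per combination. All names here (`LocationSketch`, `StorageType`, `isOn`) are hypothetical and are not the API in this patch.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: not the SplitLocationInfo API from the patch.
final class LocationSketch {
    // Assumed storage tiers, for illustration only.
    enum StorageType { DISK, FLASH, MEMORY }

    private final String host;
    private final EnumSet<StorageType> storageTypes;

    LocationSketch(String host, Set<StorageType> types) {
        this.host = host;
        // Start from an empty set so that an empty input is accepted too.
        this.storageTypes = EnumSet.noneOf(StorageType.class);
        this.storageTypes.addAll(types);
    }

    String getHost() {
        return host;
    }

    // True if the split's data is available on the given storage type at this host.
    boolean isOn(StorageType type) {
        return storageTypes.contains(type);
    }

    boolean isInMemory() {
        return isOn(StorageType.MEMORY);
    }
}
```

With this shape, a replica that is both on disk and cached in memory is `EnumSet.of(StorageType.DISK, StorageType.MEMORY)`, and adding a new tier means one new enum value rather than a new constant for every combination.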
Instead of assigning ON_DISK by default, would it make sense to set it after the loop that checks whether the host is in memory, behind a null check?
Any advantage to this? It would add extra code and an extra branch, and I don't think it would be particularly more readable.
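For comparison, the two shapes being discussed could look roughly like the sketch below. This is an illustration only: the class name, method names, and the string values of the constants are assumptions, not the code in the patch.

```java
// Hypothetical sketch of the two alternatives; not the actual FileSplit code.
final class DefaultAssignmentSketch {
    // Assumed constant values, for illustration only.
    static final String ON_DISK = "ON_DISK";
    static final String IN_MEMORY = "IN_MEMORY";

    // Variant A: assign ON_DISK up front and overwrite it if the host is cached.
    static String defaultFirst(String host, String[] inMemoryHosts) {
        String location = ON_DISK;
        for (String inMemoryHost : inMemoryHosts) {
            if (inMemoryHost.equals(host)) {
                location = IN_MEMORY;
                break;
            }
        }
        return location;
    }

    // Variant B: leave the value unset and fall back to ON_DISK after the loop.
    static String nullCheckAfterLoop(String host, String[] inMemoryHosts) {
        String location = null;
        for (String inMemoryHost : inMemoryHosts) {
            if (inMemoryHost.equals(host)) {
                location = IN_MEMORY;
                break;
            }
        }
        if (location == null) { // the extra branch mentioned above
            location = ON_DISK;
        }
        return location;
    }
}
```

Both variants return the same result for every input; the second only adds the post-loop null check.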
Do you think it would make sense to include the string corresponding to the location in SplitLocationInfo itself?