The shared cache needs to handle resource symlinking at the YARN layer. Currently, we leave this to the application layer (i.e. MapReduce), but it would probably be better for all applications if YARN handled it transparently.
Here is the scenario:
Imagine two separate jars (with unique checksums) that share the same name, job.jar.
They are stored in the shared cache as two separate resources:
A new application tries to use both of these resources, but internally refers to them by different names:
foo.jar maps to checksum1
bar.jar maps to checksum2
When the shared cache returns the paths to the resources, both resources have the same file name (i.e. job.jar). As a result, when the resources are localized, one of them clobbers the other: both symlinks in the container_id directory get the same name (i.e. job.jar), even though they point to two separate resource directories.
Originally we tackled this in the MapReduce client by using the fragment portion of the resource URL. This, however, seems like something that should be solved at the YARN layer.
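
To make the collision concrete, here is a minimal Python sketch of the naming logic. It is not YARN code: the `hdfs://namenode/sharedcache/...` paths and the `link_name` and `localize` helpers are hypothetical and only model the idea that the symlink name defaults to the file name unless the URL fragment overrides it.

```python
import posixpath
from urllib.parse import urlparse


def link_name(resource_url):
    """Return the symlink name: the URL fragment if set, else the file name."""
    parsed = urlparse(resource_url)
    return parsed.fragment or posixpath.basename(parsed.path)


def localize(resource_urls):
    """Model a container directory as a map of symlink name -> target path.

    A later resource with the same symlink name overwrites the earlier one,
    which is the clobbering described above.
    """
    links = {}
    for url in resource_urls:
        links[link_name(url)] = urlparse(url).path
    return links


if __name__ == "__main__":
    # Hypothetical shared cache paths: two different jars, both named job.jar.
    jar1 = "hdfs://namenode/sharedcache/checksum1/job.jar"
    jar2 = "hdfs://namenode/sharedcache/checksum2/job.jar"

    # Without fragments both links are called job.jar, so only one survives.
    print(localize([jar1, jar2]))

    # With fragments each resource keeps the name the application expects.
    print(localize([jar1 + "#foo.jar", jar2 + "#bar.jar"]))
```

In this sketch the fragment is what keeps foo.jar and bar.jar apart. The open question for this issue is which layer should be responsible for setting it.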