One of our production applications, which makes heavy use of cached Spark RDDs, slowed down as data volumes grew, although it should not have. A quick profiling session showed that the slowest part was SerializableMapWrapper#containsKey: the wrapper delegates get and remove to the underlying implementation, but containsKey is inherited from AbstractMap, which implements it in linear time by iterating over the entire entry set. The workaround was simple: replacing every containsKey(key) call with get(key) != null solved the issue.
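The behaviour described above can be reproduced with plain java.util classes. The sketch below is only an illustration, not the Spark or Scala source: SlowWrapper, its entriesVisited counter, and the helper methods are hypothetical names. It assumes a wrapper that overrides get but inherits containsKey from AbstractMap, and counts how many entries each call walks through.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

class SlowWrapperDemo {
    // Hypothetical stand-in for the wrapper: get is delegated,
    // containsKey is inherited from AbstractMap.
    static class SlowWrapper<K, V> extends AbstractMap<K, V> {
        private final Map<K, V> underlying;
        int entriesVisited = 0; // number of entries handed out by our iterator

        SlowWrapper(Map<K, V> underlying) { this.underlying = underlying; }

        @Override
        public V get(Object key) { return underlying.get(key); }

        @Override
        public Set<Map.Entry<K, V>> entrySet() {
            return new AbstractSet<Map.Entry<K, V>>() {
                @Override
                public int size() { return underlying.size(); }

                @Override
                public Iterator<Map.Entry<K, V>> iterator() {
                    final Iterator<Map.Entry<K, V>> it = underlying.entrySet().iterator();
                    return new Iterator<Map.Entry<K, V>>() {
                        @Override public boolean hasNext() { return it.hasNext(); }
                        @Override public Map.Entry<K, V> next() {
                            entriesVisited++; // record every entry the caller inspects
                            return it.next();
                        }
                    };
                }
            };
        }
    }

    static SlowWrapper<Integer, String> build(int size) {
        Map<Integer, String> backing = new HashMap<>();
        for (int i = 0; i < size; i++) backing.put(i, "v" + i);
        return new SlowWrapper<>(backing);
    }

    // Looks up a key that is absent, so the inherited containsKey scans everything.
    static int visitsForContainsKey(int size) {
        SlowWrapper<Integer, String> w = build(size);
        w.containsKey(-1);
        return w.entriesVisited;
    }

    // The delegated get goes straight to the backing HashMap.
    static int visitsForGet(int size) {
        SlowWrapper<Integer, String> w = build(size);
        w.get(-1);
        return w.entriesVisited;
    }

    public static void main(String[] args) {
        System.out.println("containsKey visited: " + visitsForContainsKey(1000));
        System.out.println("get visited: " + visitsForGet(1000));
    }
}
```

For a missing key, the inherited containsKey visits every entry, while the delegated get visits none. Note that get(key) != null matches containsKey(key) only when the map stores no null values.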
Nevertheless, it would be much simpler for everyone if the issue were fixed once and for all.
The fix is straightforward: delegate containsKey to the underlying implementation.
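A minimal sketch of what such a delegation could look like, written against plain java.util types rather than the actual wrapper source. DelegatingWrapper and the helper methods are hypothetical names, and the real wrapper would delegate to a Scala map rather than a java.util.Map.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class DelegatingWrapperDemo {
    // Hypothetical wrapper: containsKey is forwarded alongside get,
    // so lookups cost whatever the backing map's own lookup costs.
    static class DelegatingWrapper<K, V> extends AbstractMap<K, V> {
        private final Map<K, V> underlying;

        DelegatingWrapper(Map<K, V> underlying) { this.underlying = underlying; }

        @Override
        public V get(Object key) { return underlying.get(key); }

        @Override
        public boolean containsKey(Object key) { return underlying.containsKey(key); }

        @Override
        public Set<Map.Entry<K, V>> entrySet() { return underlying.entrySet(); }
    }

    static DelegatingWrapper<String, String> sample() {
        Map<String, String> backing = new HashMap<>();
        backing.put("a", "1");
        backing.put("nullValued", null); // key present, value null
        return new DelegatingWrapper<>(backing);
    }

    public static void main(String[] args) {
        DelegatingWrapper<String, String> w = sample();
        System.out.println(w.containsKey("a"));
        System.out.println(w.containsKey("missing"));
        // With a null value, containsKey and get(key) != null disagree.
        System.out.println(w.containsKey("nullValued"));
        System.out.println(w.get("nullValued") != null);
    }
}
```

Delegating containsKey also keeps the correct answer for keys mapped to null, which the get(key) != null workaround would report as absent.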