Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Versions: 1.2.1, 2.2.0
Description
One of our production applications, which aggressively uses cached Spark RDDs, degraded as data volumes grew, even though it shouldn't have. A quick profiling session showed that the slowest part was SerializableMapWrapper#containsKey: the wrapper delegates get and remove to the actual implementation, but containsKey is inherited from AbstractMap, which implements it in linear time by iterating over the entire entry set. A simple workaround solved the issue: replacing every containsKey(key) call with get(key) != null.
Nevertheless, it would be much simpler for everyone if the issue were fixed once and for all.
The fix is straightforward: delegate containsKey to the actual implementation.
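A minimal sketch of the problem and the fix. SlowWrapper and FixedWrapper are illustrative stand-ins, not Spark's actual SerializableMapWrapper code; the field name `underlying` is an assumption:

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Stand-in for the buggy wrapper: get and remove-style lookups hit the
// underlying map directly, but containsKey is inherited from AbstractMap,
// whose default implementation scans the whole entry set in O(n).
class SlowWrapper<A, B> extends AbstractMap<A, B> {
    final Map<A, B> underlying;

    SlowWrapper(Map<A, B> underlying) {
        this.underlying = underlying;
    }

    @Override
    public B get(Object key) {
        return underlying.get(key); // O(1) delegation
    }

    @Override
    public Set<Entry<A, B>> entrySet() {
        return underlying.entrySet();
    }
}

// The proposed fix: override containsKey to delegate to the actual
// implementation, making it O(1) like get and remove.
class FixedWrapper<A, B> extends SlowWrapper<A, B> {
    FixedWrapper(Map<A, B> underlying) {
        super(underlying);
    }

    @Override
    public boolean containsKey(Object key) {
        return underlying.containsKey(key);
    }
}
```

Note that the caller-side workaround, `get(key) != null`, is equivalent to `containsKey(key)` only when the map never stores null values.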
Attachments
Issue Links
- is related to:
  - SPARK-21657 Spark has exponential time complexity to explode(array of structs) (Resolved)
- links to