[SPARK-22330] Linear containsKey operation for serialized maps. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.2.1, 2.2.0
Fix Version/s: 2.3.0
Component/s: Spark Core
Labels:
- performance

Description

One of our production applications, which makes aggressive use of cached Spark RDDs, degraded as data volumes increased, even though it should not have. A quick profiling session showed that the slowest part was SerializableMapWrapper#containsKey: the wrapper delegates get and remove to the underlying implementation, but containsKey is inherited from AbstractMap, which implements it in linear time by iterating over the whole entry set. The workaround was simple: replacing every containsKey(key) call with get(key) != null solved the issue.
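To make the cost visible, here is a minimal Java sketch under stated assumptions. `LinearContainsKeyWrapper` is a hypothetical class, not Spark's SerializableMapWrapper (which is written in Scala and wraps a Scala map). Like the wrapper described above, it delegates `get` to the wrapped map but does not override `containsKey`, so the inherited `java.util.AbstractMap.containsKey` walks `entrySet()` one entry at a time. The `entriesVisited` counter is an illustrative addition used only to count that walk.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper modeled on the pattern described in this issue:
// get() is delegated to the wrapped map, but containsKey() is NOT
// overridden, so the inherited AbstractMap implementation is used.
class LinearContainsKeyWrapper<K, V> extends AbstractMap<K, V> {
    private final Map<K, V> underlying;
    // Counts entries handed out by entrySet().iterator(), to make the
    // cost of the inherited containsKey() visible (illustrative only).
    long entriesVisited = 0;

    LinearContainsKeyWrapper(Map<K, V> underlying) {
        this.underlying = underlying;
    }

    @Override
    public V get(Object key) {
        // Direct lookup in the wrapped map: no iteration.
        return underlying.get(key);
    }

    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        return new AbstractSet<Map.Entry<K, V>>() {
            @Override
            public Iterator<Map.Entry<K, V>> iterator() {
                Iterator<Map.Entry<K, V>> it = underlying.entrySet().iterator();
                return new Iterator<Map.Entry<K, V>>() {
                    @Override
                    public boolean hasNext() {
                        return it.hasNext();
                    }

                    @Override
                    public Map.Entry<K, V> next() {
                        entriesVisited++; // one step of the linear scan
                        return it.next();
                    }
                };
            }

            @Override
            public int size() {
                return underlying.size();
            }
        };
    }
}
```

With a 1,000-entry backing map, looking up a missing key through `containsKey` visits all 1,000 entries, while `get` visits none.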

Nevertheless, it would be much simpler for everyone if the issue were fixed once and for all.
The fix is straightforward: delegate containsKey to the underlying implementation.
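A minimal sketch of the proposed fix, again as a hypothetical Java analogue rather than the actual Spark patch: `DelegatingMapWrapper` is an illustrative name, and it wraps a `java.util.Map` instead of a Scala map. It overrides `containsKey` so the lookup goes straight to the wrapped map's own implementation (constant expected time for a hash-based map) instead of the inherited linear scan.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper illustrating the fix proposed in this issue:
// containsKey() is delegated to the wrapped map instead of relying on
// the inherited, linear-time AbstractMap implementation.
class DelegatingMapWrapper<K, V> extends AbstractMap<K, V> {
    private final Map<K, V> underlying;

    DelegatingMapWrapper(Map<K, V> underlying) {
        this.underlying = underlying;
    }

    @Override
    public int size() {
        return underlying.size();
    }

    @Override
    public boolean containsKey(Object key) {
        // The fix: ask the wrapped map directly, with no entry iteration.
        return underlying.containsKey(key);
    }

    @Override
    public V get(Object key) {
        return underlying.get(key);
    }

    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        // Still required by AbstractMap for iteration-based methods
        // such as equals(), hashCode() and toString().
        return underlying.entrySet();
    }
}
```

One difference from the `get(key) != null` workaround: for a key mapped to `null` in the backing map, the delegated `containsKey` still returns `true`, whereas `get(key) != null` returns `false`.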

Issue Links

is related to

SPARK-21657 Spark has exponential time complexity to explode(array of structs)

Resolved

links to

[Github] Pull Request #19553 (Whoosh)

People

Assignee: Alexander

Reporter: Alexander

Shepherd: Alexander

Votes: 0

Watchers: 4

Dates

Created: 22/Oct/17 21:44

Updated: 06/Nov/17 23:50

Resolved: 06/Nov/17 23:47
