Spark / SPARK-22330

Linear containsKey operation for serialized maps.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1, 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: Spark Core
    • Labels:

      Description

      One of our production applications, which relies heavily on cached Spark RDDs, degraded as data volumes grew, although it should not have. A quick profiling session showed that the slowest part was SerializableMapWrapper#containsKey: the wrapper delegates get and remove to the underlying implementation, but containsKey is inherited from AbstractMap, which implements it in linear time by iterating over the whole entrySet. The workaround was simple: replacing every containsKey(key) call with get(key) != null solved the issue (the two are equivalent only when the map holds no null values).
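
      The behavior described above can be reproduced with a minimal sketch. NaiveMapWrapper and its entrySetCalls counter are hypothetical names used only for illustration; this is not Spark's SerializableMapWrapper, just a wrapper with the same shape: get is delegated, containsKey is left to AbstractMap.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper for illustration only; not Spark's actual class.
class NaiveMapWrapper<K, V> extends AbstractMap<K, V> {
    private final Map<K, V> underlying;

    // Counts how often an inherited method falls back to entrySet().
    int entrySetCalls = 0;

    NaiveMapWrapper(Map<K, V> underlying) {
        this.underlying = underlying;
    }

    // Delegated: a hash lookup when the underlying map is a HashMap.
    @Override
    public V get(Object key) {
        return underlying.get(key);
    }

    // containsKey is NOT overridden, so AbstractMap.containsKey
    // obtains this set and walks it entry by entry.
    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        entrySetCalls++;
        return underlying.entrySet();
    }
}
```

      Calling get on such a wrapper leaves entrySetCalls at 0, while a single containsKey call raises it to 1, which shows that the lookup goes through iteration rather than through the underlying map.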

      Nevertheless, it would be much simpler for everyone if the issue were fixed once and for all.
      The fix is straightforward: delegate containsKey to the underlying implementation.
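
      A minimal sketch of that kind of delegation, assuming a wrapper built on AbstractMap. DelegatingMapWrapper and entrySetCalls are hypothetical names for illustration; the actual patch in Spark may look different.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper for illustration only; not the actual Spark patch.
class DelegatingMapWrapper<K, V> extends AbstractMap<K, V> {
    private final Map<K, V> underlying;

    // Counts entrySet() calls so the absence of iteration can be checked.
    int entrySetCalls = 0;

    DelegatingMapWrapper(Map<K, V> underlying) {
        this.underlying = underlying;
    }

    // The fix: forward containsKey instead of inheriting the
    // iteration-based version from AbstractMap.
    @Override
    public boolean containsKey(Object key) {
        return underlying.containsKey(key);
    }

    @Override
    public V get(Object key) {
        return underlying.get(key);
    }

    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        entrySetCalls++;
        return underlying.entrySet();
    }
}
```

      Unlike the get(key) != null workaround, a delegated containsKey also gives the right answer for keys mapped to null, and its cost is whatever the underlying map provides (constant time on average for a HashMap).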

              People

              • Assignee: whoosh Alexander
              • Reporter: whoosh Alexander
              • Shepherd: Alexander
              • Votes: 0
              • Watchers: 4

                  Time Tracking

                  • Original Estimate: 5m
                  • Remaining Estimate: 5m
                  • Time Spent: Not Specified