Spark / SPARK-22330

Linear containsKey operation for serialized maps.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1, 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: Spark Core

    Description

      One of our production applications, which makes heavy use of cached Spark RDDs, slowed down as data volumes grew, although it should not have. A quick profiling session showed that the slowest part was SerializableMapWrapper#containsKey: the wrapper delegates get and remove to the underlying implementation, but containsKey is inherited from AbstractMap, which implements it in linear time by iterating over the whole entry set. The workaround was simple: replacing every containsKey(key) call with get(key) != null solved the issue.
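To make the cost concrete, here is a minimal Java sketch of the behaviour described above. It is not Spark's code: SerializableMapWrapper itself is written in Scala, and the `Wrapper` class, its `entriesVisited` counter, and the helper methods are hypothetical names introduced only for illustration. The sketch assumes a wrapper that extends `java.util.AbstractMap`, overrides `get`, and leaves `containsKey` inherited.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class LinearContainsKeyDemo {

    /** Hypothetical wrapper: get is delegated, containsKey is inherited from AbstractMap. */
    static class Wrapper<K, V> extends AbstractMap<K, V> {
        private final Map<K, V> underlying;
        long entriesVisited = 0; // counts entries handed out by the entry-set iterator

        Wrapper(Map<K, V> underlying) {
            this.underlying = underlying;
        }

        @Override
        public V get(Object key) {
            return underlying.get(key); // constant time for a HashMap
        }

        @Override
        public Set<Map.Entry<K, V>> entrySet() {
            return new AbstractSet<Map.Entry<K, V>>() {
                @Override
                public int size() {
                    return underlying.size();
                }

                @Override
                public Iterator<Map.Entry<K, V>> iterator() {
                    final Iterator<Map.Entry<K, V>> it = underlying.entrySet().iterator();
                    return new Iterator<Map.Entry<K, V>>() {
                        @Override
                        public boolean hasNext() {
                            return it.hasNext();
                        }

                        @Override
                        public Map.Entry<K, V> next() {
                            entriesVisited++;
                            return it.next();
                        }
                    };
                }
            };
        }
    }

    private static Wrapper<Integer, Integer> wrapperOfSize(int n) {
        Map<Integer, Integer> backing = new HashMap<>();
        for (int i = 0; i < n; i++) {
            backing.put(i, i);
        }
        return new Wrapper<>(backing);
    }

    /** Entries visited by one inherited containsKey call for a key that is absent. */
    static long visitsForContainsKey(int n) {
        Wrapper<Integer, Integer> w = wrapperOfSize(n);
        w.containsKey(-1);
        return w.entriesVisited;
    }

    /** Entries visited by one delegated get call for a key that is absent. */
    static long visitsForGet(int n) {
        Wrapper<Integer, Integer> w = wrapperOfSize(n);
        w.get(-1);
        return w.entriesVisited;
    }

    public static void main(String[] args) {
        System.out.println("containsKey visits: " + visitsForContainsKey(1000));
        System.out.println("get visits: " + visitsForGet(1000));
    }
}
```

For an absent key, the inherited `containsKey` walks every entry of the backing map, while the delegated `get` never touches the entry-set iterator.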

      Nevertheless, it would be much simpler for everyone if the issue were fixed once and for all.
      The fix is straightforward: delegate containsKey to the underlying implementation.
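The proposed delegation could look roughly like the following Java sketch. It is an analogue under stated assumptions, not the actual Spark patch: the real wrapper is Scala code, and `DelegatingWrapper` is a hypothetical name used only here.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class DelegatingContainsKeyDemo {

    /** Hypothetical wrapper that forwards containsKey to the wrapped map. */
    static class DelegatingWrapper<K, V> extends AbstractMap<K, V> {
        private final Map<K, V> underlying;

        DelegatingWrapper(Map<K, V> underlying) {
            this.underlying = underlying;
        }

        @Override
        public V get(Object key) {
            return underlying.get(key);
        }

        // The fix: use the wrapped map's own lookup instead of
        // AbstractMap's entry-by-entry scan.
        @Override
        public boolean containsKey(Object key) {
            return underlying.containsKey(key);
        }

        @Override
        public Set<Map.Entry<K, V>> entrySet() {
            return underlying.entrySet();
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> backing = new HashMap<>();
        backing.put("a", 1);
        backing.put("b", null); // HashMap permits null values

        Map<String, Integer> wrapped = new DelegatingWrapper<>(backing);
        System.out.println(wrapped.containsKey("a"));  // true
        System.out.println(wrapped.containsKey("b"));  // true
        System.out.println(wrapped.get("b") != null);  // false
        System.out.println(wrapped.containsKey("c"));  // false
    }
}
```

The sketch also shows why delegation is preferable to the `get(key) != null` workaround in general: the two disagree for a key that is present but mapped to `null`, so the workaround is only safe when the map never stores null values.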

            People

              whoosh Alexander
              whoosh Alexander
              Alexander Alexander
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 5m
                  Remaining: 5m
                  Logged: Not Specified