  Spark / SPARK-6055

Memory leak in PySpark SQL due to incorrect equality check


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.1.1, 1.2.1, 1.3.0
    • Fix Version/s: 1.1.2, 1.2.2, 1.3.0
    • Component/s: PySpark, SQL
    • Labels: None

      Description

      The __eq__ of DataType is not correct, so the class cache is not used properly: a newly created class cannot be found again by its dataType, and lots of classes are created (saved in _cached_cls) and never released.

      Also, all instances of the same DataType have the same hash code, so the dict ends up with many keys sharing one hash code; with that many hash collisions, accessing the dict becomes very slow (depending on the CPython dict implementation).
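      A minimal, self-contained sketch of the failure mode (not the actual Spark code; BrokenStructType and the simplified _create_cls/_cached_cls below are hypothetical stand-ins): an __eq__ that never matches a freshly built but logically identical key, combined with a per-class constant __hash__, makes every cache lookup miss and every cached entry collide in the same hash bucket.

      class BrokenStructType(object):
          """Stand-in for a DataType whose __eq__ is broken."""

          def __init__(self, field_names):
              self.field_names = tuple(field_names)

          def __hash__(self):
              # Every instance hashes to the same value, so all cache
              # keys land in the same dict bucket.
              return hash(self.__class__.__name__)

          def __eq__(self, other):
              # Identity comparison: a new but logically identical
              # instance never equals the key already in the cache.
              return self is other

      _cached_cls = {}  # schema -> generated Row class (simplified)

      def _create_cls(schema):
          """Return the Row class for a schema, creating it on a miss."""
          cls = _cached_cls.get(schema)
          if cls is None:
              # With the broken __eq__ this branch runs on every call,
              # so a new class is created and cached forever.
              cls = type("Row_%d" % len(_cached_cls), (tuple,), {})
              _cached_cls[schema] = cls
          return cls

      if __name__ == "__main__":
          for _ in range(10000):
              # Each call rebuilds an equivalent schema; the cache never
              # hits, and every insert collides in the same bucket, so
              # lookups degrade toward a linear scan.
              _create_cls(BrokenStructType(["a", "b"]))
          print(len(_cached_cls))  # 10000 cached classes instead of 1

      With a correct __eq__ (comparing the class and the fields) and a matching __hash__, the loop above would create a single cached class and each lookup would stay O(1).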


            People

            • Assignee: Davies Liu (davies)
            • Reporter: Davies Liu (davies)
            • Votes: 0
            • Watchers: 3
