Spark / SPARK-22471

SQLListener consumes much memory causing OutOfMemoryError

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.1
    • Component/s: SQL, Web UI
    • Environment: Spark 2.2.0, Linux

    • Important

    Description

      SQLListener may grow very large when Spark runs complex multi-stage requests. The listener tracks metrics for all stages in the _stageIdToStageMetrics hash map. SQLListener has some means to clean up this hash map regularly, but they are not sufficient. Specifically, the method trimExecutionsIfNecessary ensures that _stageIdToStageMetrics does not hold metrics for very old data; this method runs each time an execution completes.
      However, if an execution has many stages, SQLListener keeps adding new entries to _stageIdToStageMetrics without calling trimExecutionsIfNecessary, so the hash map may grow to an enormous size.
      Strictly speaking, this is not a memory leak, because trimExecutionsIfNecessary eventually cleans the hash map. In practice, however, the driver program is likely to crash with OutOfMemoryError before that happens (and it does).
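
The growth pattern described above can be illustrated with a small toy model. This is only a sketch, not Spark's SQLListener code: the class ToyListener, its method names, and the retained_executions limit are hypothetical stand-ins, and the assumption is that trimming happens only when an execution ends and evicts only the oldest completed executions.

```python
class ToyListener:
    """Toy model of the behaviour described in this issue; NOT Spark's SQLListener."""

    def __init__(self, retained_executions):
        # Hypothetical retention limit: how many completed executions to keep.
        self.retained_executions = retained_executions
        self.stage_metrics = {}      # stage id -> metrics (stands in for _stageIdToStageMetrics)
        self.execution_stages = {}   # execution id -> list of its stage ids
        self.completed = []          # completed execution ids, oldest first

    def on_stage_submitted(self, execution_id, stage_id):
        # Every new stage adds an entry; nothing is trimmed on this path.
        self.execution_stages.setdefault(execution_id, []).append(stage_id)
        self.stage_metrics[stage_id] = {"accumulator_updates": {}}

    def on_execution_end(self, execution_id):
        # Trimming is triggered only here, when an execution completes.
        self.completed.append(execution_id)
        self._trim_executions_if_necessary()

    def _trim_executions_if_necessary(self):
        # Drop metrics of the oldest completed executions beyond the limit.
        while len(self.completed) > self.retained_executions:
            oldest = self.completed.pop(0)
            for stage_id in self.execution_stages.pop(oldest, []):
                self.stage_metrics.pop(stage_id, None)


listener = ToyListener(retained_executions=1)

# One long execution with many stages: the map grows with no trimming at all.
for stage_id in range(10_000):
    listener.on_stage_submitted(execution_id=0, stage_id=stage_id)
peak_size = len(listener.stage_metrics)

listener.on_execution_end(0)  # execution 0 is still within the retention limit

# A small follow-up execution; its completion finally evicts execution 0.
listener.on_stage_submitted(execution_id=1, stage_id=10_000)
listener.on_execution_end(1)
final_size = len(listener.stage_metrics)

print(peak_size, final_size)  # prints "10000 1"
```

In this model the map is eventually cleaned, so nothing leaks, but its peak size is bounded only by the number of stages in a single execution. That matches the distinction drawn above between a memory leak and unbounded growth between trims.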

      Attachments

        1. SQLListener_stageIdToStageMetrics_retained_size.png (28 kB, Arseniy Tashoyan)
        2. SQLListener_retained_size.png (35 kB, Arseniy Tashoyan)

          People

            Assignee: Arseniy Tashoyan (tashoyan)
            Reporter: Arseniy Tashoyan (tashoyan)
            Votes: 1
            Watchers: 6

              Time Tracking

                Estimated: 72h
                Remaining: 72h
                Logged: Not Specified