SPARK-20883

Improve StateStore APIs for efficiency

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      The current StateStore API has several problems that cause too many transient objects to be created, which leads to memory pressure.

      • StateStore.get() returns an Option, which forces creation of a Some/None object for every get.
      • StateStore.iterator() returns tuples, which forces creation of a new tuple for each record returned.
      • StateStore.updates() requires every implementation to keep track of updates, yet it is used minimally (only by Append mode in streaming aggregations). It can be removed entirely.
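
The first two points can be illustrated with a minimal sketch. This is not Spark's StateStore code: `AllocatingStore`, `LeanStore`, and `MutablePair` are hypothetical names, keys and values are plain strings, and a `java.util.LinkedHashMap` stands in for the real backing storage. It only shows one way a store could avoid the per-call wrapper and per-record tuple, by returning null for a missing key and by reusing a single mutable pair during iteration.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}

// Hypothetical illustration only; none of these types are part of Spark.

// Style criticized in the issue: a wrapper object per get, a tuple per record.
class AllocatingStore {
  private val data = new JLinkedHashMap[String, String]()
  def put(key: String, value: String): Unit = { data.put(key, value); () }
  // Allocates a Some for every hit.
  def get(key: String): Option[String] = Option(data.get(key))
  // Allocates a new Tuple2 for every record returned.
  def iterator: Iterator[(String, String)] = {
    val entries = data.entrySet().iterator()
    new Iterator[(String, String)] {
      def hasNext: Boolean = entries.hasNext
      def next(): (String, String) = {
        val e = entries.next()
        (e.getKey, e.getValue)
      }
    }
  }
}

// Reusable holder whose fields are overwritten for each record.
final class MutablePair {
  var key: String = null
  var value: String = null
}

// Lower-allocation alternative under the assumptions stated above.
class LeanStore {
  private val data = new JLinkedHashMap[String, String]()
  private val pair = new MutablePair
  def put(key: String, value: String): Unit = { data.put(key, value); () }
  // Returns null when the key is absent, so no wrapper is created.
  def get(key: String): String = data.get(key)
  // Every call to next() returns the same MutablePair instance.
  def iterator: Iterator[MutablePair] = {
    val entries = data.entrySet().iterator()
    new Iterator[MutablePair] {
      def hasNext: Boolean = entries.hasNext
      def next(): MutablePair = {
        val e = entries.next()
        pair.key = e.getKey
        pair.value = e.getValue
        pair
      }
    }
  }
}
```

The trade-off in this sketch is that callers must null-check the result of `get` and must copy `key` and `value` out of the pair if they need a record after the next call to `next()`, since the same instance is overwritten.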

          People

            Assignee: Tathagata Das (tdas)
            Reporter: Tathagata Das (tdas)