[SPARK-2629] Improved state management for Spark Streaming (mapWithState)

Details

Type: Epic
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.9.2, 1.0.2, 1.2.2, 1.3.1, 1.4.1, 1.5.1
Fix Version/s: 1.6.0
Component/s: DStreams
Labels: None
Epic Name: Improved State Management

Description

Currently, updateStateByKey provides stateful processing in Spark Streaming. It lets the user maintain per-key state and manage that state with an updateFunction. The updateFunction is called for each key, and it uses the key's new data and existing state to generate an updated state. However, community feedback has taught us the following lessons:

Need for more optimized state management that does not scan every key
Need to make it easier to implement common use cases - (a) timeout of idle data, (b) returning items other than state
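
For readers unfamiliar with the existing contract, here is a minimal sketch of one batch of updateStateByKey-style processing. It is a plain-Python model over dicts, not Spark code: the names `update_state_by_key_batch` and `running_count` are hypothetical and exist only for illustration. It shows why the first lesson arises: the update function runs for every key that has state, even when the batch carries no new data for it.

```python
def update_state_by_key_batch(batch, state, update_func):
    """Model one batch of updateStateByKey semantics (illustration only).

    batch:       iterable of (key, value) pairs that arrived in this batch
    state:       dict mapping key -> existing state
    update_func: called as update_func(new_values, old_state_or_None)
    """
    # Group the new values by key.
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)

    new_state = {}
    # Every key is visited: keys with existing state AND keys with new data.
    for key in state.keys() | grouped.keys():
        updated = update_func(grouped.get(key, []), state.get(key))
        # Returning None drops the key, mirroring the real API's behavior.
        if updated is not None:
            new_state[key] = updated
    return new_state


def running_count(new_values, old_count):
    """Example update function: count the records seen per key."""
    return (old_count or 0) + len(new_values)


state = {"a": 2, "b": 5}
state = update_state_by_key_batch([("a", 1), ("c", 1), ("c", 1)], state, running_count)
# running_count ran for "a", "b", and "c", although "b" had no new data.
```

With a large key space and a small batch, most of those calls do no useful work, which is the cost the proposal below aims to remove.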

The high-level idea that I am proposing is:

Introduce a new API, ~~trackStateByKey~~ mapWithState, that allows the user to update per-key state and emit arbitrary records. A new API is necessary because its semantics will differ significantly from those of the existing updateStateByKey API. This API will have direct support for timeouts.
Internally, the system will keep the state data as a map/list within the partitions of the state RDDs. The new data RDDs will be partitioned appropriately, and for each key-value record, the system will look up the map/list in the state RDD partition and create a new map/list of updated state data. The new state RDD partition will be created from the updated data and, if necessary, the old data.
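
The two points above can be sketched together in plain Python. This is a model under stated assumptions, not the Spark implementation: `StateHandle`, `map_with_state_batch`, `running_sum`, and the `now`/`timeout` parameters are hypothetical names chosen for illustration, and one dict stands in for the map held in a single state RDD partition. The mapping function is invoked only for keys that have new data, it may both update state and emit a record, and idle keys are dropped after a timeout.

```python
_MISSING = object()  # sentinel meaning "no state for this key"


class StateHandle:
    """Hypothetical per-key handle given to the mapping function."""

    def __init__(self, value=_MISSING):
        self._value = value

    def exists(self):
        return self._value is not _MISSING

    def get(self):
        if not self.exists():
            raise KeyError("no state for this key")
        return self._value

    def update(self, new_value):
        self._value = new_value

    def remove(self):
        self._value = _MISSING


def map_with_state_batch(batch, store, mapping_func, now, timeout=None):
    """Model one batch. store maps key -> (state, last_update_time).

    Returns (emitted_records, new_store); the old store is left untouched.
    """
    new_store = dict(store)  # new map built from the old data
    emitted = []
    # Only keys that appear in the batch are looked up and updated.
    for key, value in batch:
        entry = new_store.get(key)
        handle = StateHandle(entry[0]) if entry is not None else StateHandle()
        emitted.append(mapping_func(key, value, handle))
        if handle.exists():
            new_store[key] = (handle.get(), now)
        else:
            new_store.pop(key, None)
    # Simplification: a naive sweep that silently drops idle keys.
    if timeout is not None:
        expired = [k for k, (_, last) in new_store.items() if now - last >= timeout]
        for key in expired:
            del new_store[key]
    return emitted, new_store


def running_sum(key, value, state):
    """Example mapping function: keep a per-key sum and emit (key, sum)."""
    total = (state.get() if state.exists() else 0) + value
    state.update(total)
    return (key, total)


out1, store1 = map_with_state_batch(
    [("a", 1), ("b", 2), ("a", 3)], {}, running_sum, now=0, timeout=10)
out2, store2 = map_with_state_batch(
    [("a", 5)], store1, running_sum, now=10, timeout=10)
```

In the second batch, `running_sum` is never called for "b"; that key is only removed by the expiry sweep. The sweep itself still walks the whole map here, so a real design would need some cheaper way to find idle keys.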

Here is the detailed design doc (outdated, to be updated). Please take a look and provide feedback as comments.
https://docs.google.com/document/d/1NoALLyd83zGs1hNGMm0Pc5YOVgiPpMHugGMk6COqxxE/edit#heading=h.ph3w0clkd4em

Issue Links

links to

[Github] Pull Request #9256 (tdas)

[Github] Pull Request #23223 (Ngone51)

People

Assignee: Tathagata Das
Reporter: Tathagata Das
Votes: 10
Watchers: 27

Dates

Created: 22/Jul/14 21:31
Updated: 05/Dec/18 14:51
Resolved: 04/May/16 21:33