[SPARK-3660] Initial RDD for updateStateByKey transformation - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.3.0
Component/s: DStreams
Labels: None

Description

How can the state of the updateStateByKey transformation be initialized?

I have word counts from a previous spark-submit run and want to load them in the next spark-submit job so that counting continues from those values.
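The following is a minimal, Spark-free sketch of the desired behavior, assuming the usual updateStateByKey contract: for every key, the update function receives that batch's new values and the key's previous state (if any). Local Scala collections stand in for DStreams and RDDs, and the names `WordCountState`, `updateCount`, `step`, and `run` are hypothetical. Passing the previous run's counts as `initial` lets counting continue from them instead of from zero.

```scala
// Hypothetical, Spark-free model of updateStateByKey semantics for word counts.
// Map[String, Int] stands in for the state RDD; Seq batches stand in for a DStream.
object WordCountState {
  // Same shape as a function passed to updateStateByKey:
  // the new values for a key in this batch, plus that key's previous state.
  def updateCount(newValues: Seq[Int], state: Option[Int]): Option[Int] =
    Some(newValues.sum + state.getOrElse(0))

  // Applies one batch of (word, 1) pairs to the current state.
  // Keys absent from the batch still go through updateCount with an empty Seq,
  // mirroring how updateStateByKey visits every key that already has state.
  def step(state: Map[String, Int], batch: Seq[(String, Int)]): Map[String, Int] = {
    val grouped: Map[String, Seq[Int]] =
      batch.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }
    val keys = state.keySet ++ grouped.keySet
    keys.flatMap { k =>
      updateCount(grouped.getOrElse(k, Seq.empty), state.get(k)).map(k -> _)
    }.toMap
  }

  // Runs all batches starting from `initial`, e.g. counts saved by a previous
  // spark-submit run; Map.empty models starting from scratch.
  def run(initial: Map[String, Int], batches: Seq[Seq[(String, Int)]]): Map[String, Int] =
    batches.foldLeft(initial)(step)
}
```

For example, `WordCountState.run(Map("spark" -> 3), Seq(Seq("spark" -> 1, "jira" -> 1)))` yields `Map("spark" -> 4, "jira" -> 1)`: the saved count of 3 is carried forward rather than lost.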

One proposal is to add the following argument to the updateStateByKey methods:
initial: Option[RDD[(K, S)]] = None

Because the argument defaults to None, this also maintains backward compatibility.
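To illustrate why a defaulted `Option` parameter keeps existing callers working, here is a hypothetical sketch of the proposed signature. It is neither the reporter's code nor Spark's API: `Map[K, S]` stands in for `RDD[(K, S)]`, batches are local `Seq`s, and `ProposedSignature` is an illustrative name.

```scala
// Hypothetical illustration of the proposed `initial: Option[...] = None` argument.
// Not Spark's API: Map[K, S] stands in for RDD[(K, S)], Seq batches for a DStream.
object ProposedSignature {
  def updateStateByKey[K, V, S](
      batches: Seq[Seq[(K, V)]],
      updateFunc: (Seq[V], Option[S]) => Option[S],
      initial: Option[Map[K, S]] = None // default keeps old call sites valid
  ): Map[K, S] = {
    // Without an initial state, start from empty, exactly as before the change.
    val start: Map[K, S] = initial.getOrElse(Map.empty[K, S])
    batches.foldLeft(start) { (state, batch) =>
      // Group this batch's values by key.
      val grouped = batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
      // Update every key that has state or new values; None drops the key.
      (state.keySet ++ grouped.keySet).flatMap { k =>
        updateFunc(grouped.getOrElse(k, Seq.empty), state.get(k)).map(k -> _)
      }.toMap
    }
  }
}
```

Calls that omit `initial` behave as before and start from empty state; passing `Some(previousCounts)` seeds the state. A separate overload that takes the initial RDD directly is another way to keep existing call sites compiling; this sketch follows the `Option`-with-default form proposed above.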

I have working code for this as well.

This thread started on the spark-user mailing list at:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-initialize-updateStateByKey-operation-td14772.html

Issue Links

links to

[Github] Pull Request #2665 (soumitrak)

People

Assignee: Unassigned

Reporter: Soumitra Kumar

Votes: 0

Watchers: 4

Dates

Created: 23/Sep/14 20:16

Updated: 12/Nov/14 20:26

Resolved: 12/Nov/14 20:26

Time Tracking

Estimated: 24h

Remaining: 24h

Logged: