Spark / SPARK-3660

Initial RDD for updateStateByKey transformation


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: DStreams
    • Labels: None

    Description

      How can the state of the updateStateByKey transformation be initialized?

      I have word counts from a previous spark-submit run and want to load them in the next spark-submit job, so that counting continues from those totals.

      One proposal is to add the following argument to the updateStateByKey methods:
      initial: Option[RDD[(K, S)]] = None

      Because the argument defaults to None, existing callers are unaffected, so backward compatibility is maintained.
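
      The intended behavior of the proposed argument can be sketched without Spark. The block below is only a plain-Scala model under stated assumptions: a Map stands in for RDD[(K, S)], a Seq of batches stands in for the DStream, and the object InitialStateSketch with its helper names is hypothetical. It is not the Spark API and not the patch attached to this issue.

```scala
// Plain-Scala model of the proposal. Assumptions: Map[K, S] stands in for
// RDD[(K, S)], Seq[Seq[(K, V)]] stands in for the stream of batches.
// All names here are hypothetical; this is not the Spark API.
object InitialStateSketch {

  // Folds the batches into per-key state, starting from `initial` if given.
  // Leaving `initial` at its default of None starts from empty state.
  def updateStateByKey[K, V, S](
      batches: Seq[Seq[(K, V)]],
      updateFunc: (Seq[V], Option[S]) => Option[S],
      initial: Option[Map[K, S]] = None): Map[K, S] = {
    batches.foldLeft(initial.getOrElse(Map.empty[K, S])) { (state, batch) =>
      // Group the new values of this batch by key.
      val grouped: Map[K, Seq[V]] =
        batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
      // Visit every key that has previous state or new values;
      // a key is dropped when updateFunc returns None.
      (state.keySet ++ grouped.keySet).flatMap { k =>
        updateFunc(grouped.getOrElse(k, Seq.empty), state.get(k)).map(k -> _)
      }.toMap
    }
  }

  // Running word count: add the new occurrences to the previous total.
  val addCounts: (Seq[Int], Option[Int]) => Option[Int] =
    (values, prev) => Some(prev.getOrElse(0) + values.sum)

  // Counts saved by an earlier run, and two new batches of (word, 1) pairs.
  val previous: Map[String, Int] = Map("spark" -> 10, "rdd" -> 3)
  val batches: Seq[Seq[(String, Int)]] =
    Seq(Seq("spark" -> 1, "stream" -> 1), Seq("spark" -> 1))

  val seeded: Map[String, Int] = updateStateByKey(batches, addCounts, Some(previous))
  val unseeded: Map[String, Int] = updateStateByKey(batches, addCounts)
}
```

      With the seed, counting continues from the earlier totals ("spark" ends at 12 and "rdd" keeps its 3); without it, "spark" ends at 2 and "rdd" is absent, which matches the current behavior of starting from empty state.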

      I also have working code for this.

      This thread started on spark-user list at:
      http://apache-spark-user-list.1001560.n3.nabble.com/How-to-initialize-updateStateByKey-operation-td14772.html

          People

            Assignee: Unassigned
            Reporter: Soumitra Kumar
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified
