SPARK-3660

Initial RDD for updateStateByKey transformation


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: DStreams
    • Labels: None

    Description

      How can the state of the updateStateByKey transformation be initialized?

      I have word counts from a previous spark-submit run and want to load them in the next spark-submit job, so that counting continues from those values.

      One proposal is to add the following argument to the updateStateByKey methods:
      initial: Option[RDD[(K, S)]] = None

      Because the argument defaults to None, existing callers are unaffected, so backward compatibility is maintained.
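
A minimal sketch of the semantics this proposal describes, written as a plain-Scala model with no Spark dependency. The object `InitialStateSketch` and its `updateStateByKey` helper are hypothetical stand-ins, not Spark's methods: a `Seq` of batches plays the role of the DStream, and a `Map[K, S]` plays the role of the proposed `RDD[(K, S)]`. It only illustrates how an optional initial state could seed the running word counts.

```scala
// Plain-Scala model of the proposed behaviour; nothing here is Spark API.
object InitialStateSketch {

  // Same shape as the update function a caller passes to updateStateByKey:
  // new values for a key in this batch, plus the previous state if any.
  def updateCount(newValues: Seq[Int], state: Option[Int]): Option[Int] =
    Some(state.getOrElse(0) + newValues.sum)

  // Hypothetical stand-in: folds the batches into per-key state, starting
  // from `initial` when it is supplied and from empty state otherwise.
  def updateStateByKey[K, V, S](
      batches: Seq[Seq[(K, V)]],
      updateFunc: (Seq[V], Option[S]) => Option[S],
      initial: Option[Map[K, S]] = None): Map[K, S] =
    batches.foldLeft(initial.getOrElse(Map.empty[K, S])) { (state, batch) =>
      // Group this batch's values by key.
      val grouped: Map[K, Seq[V]] =
        batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
      // Update every key that has prior state or appears in the batch;
      // a key is dropped when the update function returns None.
      val keys = state.keySet ++ grouped.keySet
      keys.flatMap { k =>
        updateFunc(grouped.getOrElse(k, Seq.empty[V]), state.get(k)).map(k -> _)
      }.toMap
    }

  def main(args: Array[String]): Unit = {
    // Counts assumed to be saved by an earlier run (illustrative values).
    val previous = Map("spark" -> 10, "rdd" -> 3)
    val batches  = Seq(Seq("spark" -> 1, "spark" -> 1), Seq("dstream" -> 1))

    // Seeded with the earlier counts.
    println(updateStateByKey[String, Int, Int](batches, updateCount, Some(previous)))
    // Existing call shape, no initial state.
    println(updateStateByKey[String, Int, Int](batches, updateCount))
  }
}
```

In this model the seeded call continues from the earlier totals (for example, "spark" goes from 10 to 12), while the call that omits `initial` behaves as it would today and starts every key from zero.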

      I also have working code for this.

      This thread started on the spark-user list at:
      http://apache-spark-user-list.1001560.n3.nabble.com/How-to-initialize-updateStateByKey-operation-td14772.html


          People

            Assignee: Unassigned
            Reporter: Soumitra Kumar
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: 24h
              Remaining: 24h
              Logged: Not Specified
