SPARK-7292

Provide operator to truncate lineage without persisting RDDs

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Component/s: Spark Core

    Description

      Checkpointing exists in Spark to truncate a lineage chain. I've heard requests from some users to allow truncation of lineage in a way that is "cheap" and doesn't serialize and persist the RDD. This is possible if the user is willing to forgo fault tolerance for that RDD (for instance, for shorter-running jobs or ones that use a small number of machines). It's pretty easy to allow this, so we should look into it for Spark 1.5.
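
      The trade-off described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code: the `Node` class and its `truncate_lineage` and `lose_data` methods are hypothetical names invented for illustration. It assumes that "cheap" truncation means keeping an in-memory copy of the data and dropping the link to the parent, so nothing is serialized or written to reliable storage.

```python
from typing import Callable, List, Optional


class Node:
    """Hypothetical stand-in for an RDD: data is recomputed from its parent on demand."""

    def __init__(self, compute: Callable[[List[int]], List[int]],
                 parent: Optional["Node"] = None, is_source: bool = False):
        self.compute = compute
        self.parent = parent
        self.is_source = is_source
        self.cached: Optional[List[int]] = None  # in-memory copy only, never persisted

    def map(self, f: Callable[[int], int]) -> "Node":
        # Each transformation adds one link to the lineage chain.
        return Node(lambda data: [f(x) for x in data], parent=self)

    def collect(self) -> List[int]:
        if self.cached is not None:
            return self.cached
        if self.is_source:
            return self.compute([])
        if self.parent is None:
            # Lineage was truncated and the in-memory copy is gone.
            raise RuntimeError("data lost and lineage truncated: cannot recompute")
        return self.compute(self.parent.collect())

    def lineage_depth(self) -> int:
        depth, node = 1, self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth

    def truncate_lineage(self) -> "Node":
        # Cheap truncation: materialize in memory, then forget the parent.
        self.cached = self.collect()
        self.parent = None
        return self

    def lose_data(self) -> None:
        # Simulates failure of the machine holding the in-memory copy.
        self.cached = None


source = Node(lambda _: [1, 2, 3], is_source=True)
rdd = source
for _ in range(50):
    rdd = rdd.map(lambda x: x + 1)

depth_before = rdd.lineage_depth()
rdd.truncate_lineage()
depth_after = rdd.lineage_depth()
result = rdd.collect()
```

      In this model the chain shrinks from 51 nodes to 1 after truncation, and `collect()` still returns the data. If `lose_data()` is called afterwards, `collect()` raises, because there is neither a persisted copy nor a lineage to recompute from. That is the fault tolerance the user would be giving up. In Spark itself, the existing reliable path is `SparkContext.setCheckpointDir` followed by `RDD.checkpoint()`, which does write the RDD to storage.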

          People

            Assignee: Andrew Or (andrewor14)
            Reporter: Patrick Wendell (pwendell)
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated:
              Resolved:
