Spark / SPARK-25302

ReducedWindowedDStream not using checkpoints for reduced RDDs

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1
    • Fix Version/s: None
    • Component/s: DStreams
    • Flags: Important

      Description

      When reduceByKeyAndWindow() is called with an inverse reduce function, it eventually creates a ReducedWindowedDStream. This class creates a reducedDStream but only persists it and never checkpoints it. As a result, the reduced RDDs are served from the cache but their lineage back to the input DStream is never cut, so the input RDDs stay cached for much longer than they are needed.
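
The lineage argument above can be illustrated with a small toy model. This is only a sketch and not Spark code: `ToyRDD`, `reachable_inputs`, and `inputs_pinned_by_window` are hypothetical names invented for this illustration, and the model assumes that a checkpointed node simply drops its parents. It shows that reduced RDDs which are only persisted still reference their input RDDs for the whole window, while checkpointed ones do not.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: a name, its parents, a checkpoint flag."""
    name: str
    parents: list = field(default_factory=list)
    checkpointed: bool = False


def reachable_inputs(rdd):
    """Return the names of input RDDs still reachable through lineage.

    Assumption of this toy model: a checkpointed node has its lineage cut,
    so nothing upstream of it is reachable any more.
    """
    if rdd.checkpointed:
        return set()
    if not rdd.parents:
        return {rdd.name}  # a node without parents is treated as an input
    found = set()
    for parent in rdd.parents:
        found |= reachable_inputs(parent)
    return found


def inputs_pinned_by_window(num_batches, window_len, checkpoint_reduced):
    """Build one reduced RDD per batch and report which inputs are still
    referenced by the reduced RDDs inside the current window."""
    reduced = [
        ToyRDD(f"reduced-{t}", [ToyRDD(f"input-{t}")], checkpointed=checkpoint_reduced)
        for t in range(num_batches)
    ]
    pinned = set()
    for rdd in reduced[-window_len:]:
        pinned |= reachable_inputs(rdd)
    return sorted(pinned)


persist_only = inputs_pinned_by_window(6, 3, checkpoint_reduced=False)
with_checkpoint = inputs_pinned_by_window(6, 3, checkpoint_reduced=True)
print(persist_only)     # ['input-3', 'input-4', 'input-5']
print(with_checkpoint)  # []
```

In this model each input is needed only once, to compute its reduced RDD, yet with persist-only behaviour every input inside the window remains reachable. How long Spark actually retains the cached input RDDs depends on its internal bookkeeping, which this sketch does not attempt to reproduce.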

            People

            • Assignee: Unassigned
            • Reporter: Nikunj Bansal (nikunj)
            • Shepherd: Tathagata Das
            • Votes: 0
            • Watchers: 2