  1. Spark
  2. SPARK-38812

When cleaning data, I want to split one RDD into two RDDs according to a data-cleaning rule

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.2.1
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      When I clean data, I filter one RDD against a single value (> or <) and produce two different sets: one is the file of erroneous data, the other is the file of error-free data.

      Currently I use filter, but that requires two separate Spark jobs (two DAG executions), which is too expensive.

      Specifically, I would like something like iterator.span(predicate) that returns a tuple (iter1, iter2).
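For reference, here is a minimal plain-Scala sketch (no Spark involved) of what a tuple-returning split looks like on an ordinary iterator. Note that `span` only splits at the first element that fails the predicate, so the "valid vs. invalid" split described here is closer to `partition`. The `isValid` rule and the sample values are hypothetical and chosen only for illustration.

```scala
// Plain Scala collections only; nothing here touches Spark.
object SplitDemo {
  // Hypothetical cleaning rule: a value is valid when it is below 10.
  def isValid(x: Int): Boolean = x < 10

  // Hypothetical sample data.
  val data: List[Int] = List(1, 5, 12, 7, 20)

  // span stops at the first element that fails the predicate, so a later
  // valid value (7) still lands in the second iterator.
  val (spanHead, spanRest) = data.iterator.span(isValid)
  val spanResult: (List[Int], List[Int]) = (spanHead.toList, spanRest.toList)
  // spanResult is (List(1, 5), List(12, 7, 20))

  // partition tests every element, giving a true valid/invalid split.
  val (valid, invalid) = data.iterator.partition(isValid)
  val partitionResult: (List[Int], List[Int]) = (valid.toList, invalid.toList)
  // partitionResult is (List(1, 5, 7), List(12, 20))
}
```

Both calls return a pair of iterators, which is the shape the request asks for; only `partition` classifies every record independently of its position.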

      In other words, a single rule-based cleaning pass would split one dataset into two datasets.

      I want the data to be computed once, not twice.
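One commonly used way to avoid recomputing the upstream lineage with the existing RDD API is to persist the input before applying the two filters. The sketch below is an illustration under assumptions, not something proposed in this issue: `SplitOnce`, `splitByRule`, `isValid`, and the sample data are hypothetical names and values. It still runs one job per output and scans the persisted data twice; what it avoids is re-running the work that produced the input.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object SplitOnce {
  // Hypothetical cleaning rule: a record is valid when it is below 10.
  def isValid(x: Int): Boolean = x < 10

  // Returns (valid, invalid). The input is persisted, so the first action
  // materializes it and the second action reads the stored partitions
  // instead of recomputing the upstream lineage.
  def splitByRule(input: RDD[Int]): (RDD[Int], RDD[Int]) = {
    val cached = input.persist(StorageLevel.MEMORY_AND_DISK)
    (cached.filter(isValid), cached.filter(x => !isValid(x)))
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setMaster("local[*]").setAppName("split-once")
    val sc = new SparkContext(conf)
    try {
      val raw = sc.parallelize(Seq(1, 5, 12, 7, 20))
      val (valid, invalid) = splitByRule(raw)
      println(valid.collect().sorted.mkString(","))   // prints 1,5,7
      println(invalid.collect().sorted.mkString(",")) // prints 12,20
    } finally {
      sc.stop()
    }
  }
}
```

In a longer-running application, the persisted RDD would normally be released with `unpersist()` once both outputs have been written.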

          People

            Assignee: Unassigned
            Reporter: sungaok gaokui
