Spark / SPARK-32341

Add a multiple-filter function to RDD


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: 2.4.6, 3.0.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      When I use Spark RDDs, I often read data from Kafka, and a Kafka stream can contain many kinds of data sets mixed together.

      I filter the RDD by Kafka key, so that I can build an Array[RDD] holding one RDD per topic.

      But with that approach each call to rdd.filter generates its own stage, so the same data is processed by many tasks, which costs too much time and is not necessary.
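The cost described above can be sketched with a toy model in plain Python (not actual Spark code): a source that counts how many full scans it performs, mimicking the way each `rdd.filter` plus action launches its own pass over the data.

```python
# Toy model (plain Python, not Spark): a source that counts full scans,
# mimicking one job per rdd.filter call on an uncached RDD.
class CountingSource:
    def __init__(self, records):
        self.records = records
        self.scans = 0

    def filter(self, predicate):
        # Each filter call re-reads every record, like a separate job.
        self.scans += 1
        return [r for r in self.records if predicate(r)]

src = CountingSource([("topicA", 1), ("topicB", 2), ("topicA", 3)])
per_topic = [src.filter(lambda kv, key=key: kv[0] == key)
             for key in ("topicA", "topicB")]
# src.scans is now 2: one full pass over the data per topic filter.
```

In real Spark, calling `rdd.cache()` (or `persist()`) before the per-key filters at least avoids recomputing the upstream lineage, but each filtered RDD still runs as its own job when acted upon.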

      I hope a multiple-filter function can be added: unlike rdd.filter, it would divide a mixed-data RDD into one RDD per data set and return an Array[RDD] in a single stage.

      A function like: Array[RDD] = rdd.multiplefilter(setcondition).
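The proposed semantics can be modeled in plain Python (names hypothetical, not a real Spark API): split a mixed collection into one bucket per predicate while scanning the input only once.

```python
# Hypothetical model of the proposed multiplefilter (plain Python,
# not Spark): one bucket per predicate, filled in a single pass.
def multiple_filter(records, predicates):
    buckets = [[] for _ in predicates]
    for record in records:  # exactly one scan of the input
        for i, predicate in enumerate(predicates):
            if predicate(record):
                buckets[i].append(record)
    return buckets

data = [("topicA", 1), ("topicB", 2), ("topicA", 3)]
by_topic = multiple_filter(
    data,
    [lambda kv: kv[0] == "topicA", lambda kv: kv[0] == "topicB"],
)
# by_topic == [[("topicA", 1), ("topicA", 3)], [("topicB", 2)]]
```

A record matching several predicates lands in several buckets, which matches what independent `filter` calls would produce today.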

       

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: sungaok gaokui
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: