[SPARK-38812] When cleaning data, split one RDD into two RDDs according to a data-cleaning rule - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Invalid
Affects Version/s: 3.2.1
Fix Version/s: None
Component/s: Spark Core
Labels: None

Description

When I clean data, I filter one RDD against a threshold value (> or <) and want to produce two different sets: one file of erroneous records and one file of error-free records.

Currently I use `filter`, but this requires two Spark DAG jobs (one per filter), which costs too much.
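A rough, stdlib-only Python model of why two `filter` calls cost two passes: the RDD lineage is treated as a lazy generator that every consumer re-evaluates, much as Spark recomputes an unpersisted RDD for each action. The names `expensive_parse`, `source` and `THRESHOLD` are hypothetical, and building a list stands in for what `RDD.persist()`/`cache()` does in Spark.

```python
calls = 0  # counts how often the (hypothetical) upstream step runs


def expensive_parse(line: str) -> int:
    """Hypothetical upstream transformation; records each invocation."""
    global calls
    calls += 1
    return int(line)


def source():
    # Stands in for an RDD lineage: each consumer re-runs the whole pipeline.
    return (expensive_parse(x) for x in ["5", "-3", "12", "-1"])


THRESHOLD = 0  # hypothetical cleaning rule: values below it are errors

# Two independent "jobs", like calling rdd.filter(...) twice without caching.
good = [v for v in source() if v >= THRESHOLD]
bad = [v for v in source() if v < THRESHOLD]
two_jobs_calls = calls  # upstream ran once per record per job

calls = 0
cached = list(source())  # analogous to persist(): materialize the lineage once
good_cached = [v for v in cached if v >= THRESHOLD]
bad_cached = [v for v in cached if v < THRESHOLD]
cached_calls = calls  # upstream ran once per record in total
```

In this model the two uncached filters run the upstream step 8 times for 4 records, while the materialized version runs it 4 times. Caching does not merge the two filters into one job; it only avoids recomputing the shared upstream work.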

What I want is something like `iterator.span(predicate)` that returns one tuple `(iter1, iter2)`.
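One point worth checking about the requested semantics: Scala's `Iterator.span(p)` returns the longest prefix whose elements satisfy `p` plus the remainder, so it splits at the first failing element. Separating every clean record from every erroneous one is what Scala's `partition(p)` does. The stdlib Python sketch below contrasts the two; `partition`, `span`, `is_clean` and the sample data are illustrative names, not Spark APIs.

```python
from itertools import dropwhile, takewhile
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def partition(items: Iterable[T], pred: Callable[[T], bool]) -> Tuple[List[T], List[T]]:
    """Route every element into one of two lists in a single pass."""
    matched: List[T] = []
    rest: List[T] = []
    for item in items:
        (matched if pred(item) else rest).append(item)
    return matched, rest


def span(items: List[T], pred: Callable[[T], bool]) -> Tuple[List[T], List[T]]:
    """Like Scala's span: longest prefix satisfying pred, then everything after it."""
    return list(takewhile(pred, items)), list(dropwhile(pred, items))


def is_clean(v: int) -> bool:
    # Hypothetical cleaning rule: negative values are errors.
    return v >= 0


data = [5, -3, 12, -1]
print(partition(data, is_clean))  # ([5, 12], [-3, -1])
print(span(data, is_clean))       # ([5], [-3, 12, -1])
```

A per-partition split like `partition` could run inside a single pass over each partition, but since a Spark transformation yields one RDD, getting two separate RDDs out of it would still need something like caching or tagging each record with a clean/error flag in one dataset.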

In a single data-cleaning pass, one dataset would be split into two datasets by one rule.

I want the data to be computed once, not twice.

People

Assignee: Unassigned

Reporter: gaokui

Votes: 0

Watchers: 1

Dates

Created: 07/Apr/22 01:17

Updated: 14/Jun/22 00:34

Resolved: 14/Jun/22 00:34