[SPARK-12338] Support dropping duplicated rows on selected columns in DataFrame in R style - ASF JIRA


Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 1.5.2
Fix Version/s: None
Component/s: SparkR
Labels:
- bulk-closed

Description

In R, unique() drops rows that are duplicated across all columns, and an idiom such as

df[!duplicated(df[,c('x1', 'x2')]),]

is used to drop rows that are duplicated on selected columns only. It would be better if SparkR supported duplicated() and allowed subsetting a DataFrame using the result of duplicated().
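The two behaviours described above can be illustrated in plain base R (not SparkR). This is a minimal sketch using a small hypothetical data frame `df`; the columns `x1`, `x2`, `y` and their values are made up for illustration. It shows the semantics a SparkR equivalent would need to mirror: duplicated() marks every row whose key already appeared earlier, so negating it keeps the first occurrence of each key.

```r
# Hypothetical sample data: rows 1 and 3 share the same (x1, x2) pair
df <- data.frame(x1 = c(1, 2, 1, 3),
                 x2 = c("a", "b", "a", "a"),
                 y  = c(10, 20, 30, 40))

# unique() compares all columns; every row differs in y,
# so no row is dropped here
all_unique <- unique(df)

# duplicated() on a subset of columns returns a logical vector:
# TRUE for each row whose (x1, x2) pair was already seen earlier
dup <- duplicated(df[, c("x1", "x2")])

# Keep only the first occurrence of each (x1, x2) pair (rows 1, 2 and 4)
deduped <- df[!dup, ]
print(deduped)
```

Note that duplicated() keeps the first occurrence by default; passing `fromLast = TRUE` keeps the last one instead.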


Issue Links

relates to

SPARK-12337 Implement dropDuplicates() method of DataFrame in SparkR (Resolved)


People

Assignee: Unassigned

Reporter: Sun Rui

Votes: 0

Watchers: 4

Dates

Created: 15/Dec/15 11:29

Updated: 21/May/19 04:32

Resolved: 21/May/19 04:32