[SPARK-12235] Enhance mutate() to support replace existing columns - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.5.2
Fix Version/s: 2.0.0
Component/s: SparkR
Labels: None

Description

mutate() in the dplyr package supports both adding new columns and replacing existing columns. The current implementation of mutate() in SparkR supports adding new columns only.

Also make the behavior of mutate() more consistent with that in dplyr:
1. Throw an error when there are duplicated column names in the DataFrame being mutated.
2. When there are duplicated column names among the columns specified by arguments, the last column of the same name takes effect.
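
The requested semantics can be sketched in plain Python. This is only an illustrative model, not the SparkR implementation: the frame is represented as an ordered list of (name, values) pairs, the function name and signature are hypothetical, and keeping a replaced column in its original position is an assumption of this sketch rather than something stated in this issue.

```python
def mutate(frame, assignments):
    """Model of the requested mutate() behavior.

    frame:       ordered list of (column_name, values) pairs (hypothetical representation)
    assignments: ordered list of (column_name, values) pairs to add or replace
    """
    # Rule 1: refuse to mutate a frame that already has duplicated column names.
    names = [name for name, _ in frame]
    duplicated = sorted({n for n in names if names.count(n) > 1})
    if duplicated:
        raise ValueError(f"duplicated column names in input: {duplicated}")

    # Rule 2: when a name is specified more than once, the last one takes effect.
    last = {}
    for name, values in assignments:
        last[name] = values

    # Replace existing columns in place (assumption), then append the new ones.
    result = [(name, last.pop(name, values)) for name, values in frame]
    result.extend(last.items())
    return result


frame = [("a", [1, 2]), ("b", [3, 4])]
out = mutate(frame, [("b", [0, 0]), ("c", [5, 6]), ("b", [9, 9])])
print(out)
```

Here "b" is specified twice, so the second assignment replaces the existing "b" column, and "c" is appended as a new column.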

Issue Links

duplicates

SPARK-10346 SparkR mutate and transform should replace column with same name to match R data.frame behavior

Resolved

is duplicated by

SPARK-10346 SparkR mutate and transform should replace column with same name to match R data.frame behavior

Resolved

is related to

SPARK-12225 Support adding or replacing multiple columns at once in DataFrame API

Resolved

links to

[Github] Pull Request #10220 (sun-rui)

People

Assignee: Sun Rui

Reporter: Sun Rui

Votes: 0

Watchers: 6

Dates

Created: 09/Dec/15 06:59

Updated: 28/Apr/16 16:39

Resolved: 28/Apr/16 16:34