Description
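Similar to pandas DataFrame.drop_duplicates (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html): return a new DataFrame with duplicate rows removed, optionally considering only a subset of the columns.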
def dropDuplicates(): DataFrame
def dropDuplicates(subset: Seq[String]): DataFrame
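To make the two overloads concrete, here is a minimal sketch of the intended semantics on plain Scala collections, not Spark. It assumes a hypothetical model in which a DataFrame is an ordered Seq of rows, each row a Map from column name to value; the names DropDuplicatesSketch and Row are illustrative only. It keeps the first occurrence of each key, like the pandas default keep='first'. A distributed implementation need not preserve input order.

```scala
import scala.collection.mutable

// Hypothetical model: a "DataFrame" is an ordered Seq of rows,
// each row a Map from column name to value.
object DropDuplicatesSketch {
  type Row = Map[String, String]

  // dropDuplicates(): every column takes part in the comparison.
  // Seq.distinct keeps the first occurrence of each row, in input order.
  def dropDuplicates(rows: Seq[Row]): Seq[Row] = rows.distinct

  // dropDuplicates(subset): keep the first row seen for each distinct
  // combination of values in the subset columns.
  def dropDuplicates(rows: Seq[Row], subset: Seq[String]): Seq[Row] = {
    val seen = mutable.HashSet.empty[Seq[Option[String]]]
    rows.filter { row =>
      val key = subset.map(c => row.get(c)) // a missing column becomes None
      seen.add(key)                          // true only if the key is new
    }
  }
}
```

For example, with rows (a, x), (a, x), (a, y), (b, y) over columns name and dept, dropDuplicates(rows) keeps three rows, while dropDuplicates(rows, Seq("name")) keeps only the first "a" row and the "b" row.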
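We can implement this as groupBy(cols).agg(first(...)): group on the subset columns (all columns when no subset is given) and take first(...) of every remaining column.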
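The groupBy/first rewrite can be modelled the same way. The sketch below is a hypothetical sequential stand-in for df.groupBy(cols).agg(first(c), ...) over every non-key column c, assuming every row carries every column listed in a given schema; GroupByFirstSketch and dropDuplicatesViaGroupBy are illustrative names, not Spark API. Spark documents first as non-deterministic after a shuffle, so which duplicate survives is not guaranteed there; in this sequential model it is always the first row of each group.

```scala
// Hypothetical sequential model of groupBy(cols).agg(first(...)).
// Assumes every row has a value for every column in `schema`.
object GroupByFirstSketch {
  type Row = Map[String, String]

  def dropDuplicatesViaGroupBy(schema: Seq[String], rows: Seq[Row], cols: Seq[String]): Seq[Row] = {
    val otherCols = schema.filterNot(cols.contains)
    def keyOf(row: Row): Seq[String] = cols.map(c => row(c))
    val groups = rows.groupBy(keyOf)        // key -> rows, input order kept within a group
    val keyOrder = rows.map(keyOf).distinct // emit groups in first-seen order
    keyOrder.map { key =>
      val firstRow = groups(key).head
      val keyPart = cols.zip(key)                         // the grouping columns
      val aggPart = otherCols.map(c => c -> firstRow(c))  // first(c) within the group
      (keyPart ++ aggPart).toMap
    }
  }
}
```

Because every first(c) here reads from the same row, each output row equals the first input row of its group, which is exactly the dropDuplicates result for that key.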
Issue Links
- relates to: SPARK-12337 Implement dropDuplicates() method of DataFrame in SparkR (Resolved)