Spark / SPARK-15807

Support varargs for dropDuplicates in Dataset/DataFrame

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

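      This issue adds varargs overloads of `dropDuplicates` to `Dataset`/`DataFrame`. Currently, `dropDuplicates` accepts column names only as a `Seq` or an `Array`, so passing them as separate arguments fails to compile, as the last call in the session below shows.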

      scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
      ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]
      
      scala> ds.dropDuplicates(Seq("_1", "_2"))
      res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: int]
      
      scala> ds.dropDuplicates("_1", "_2")
      <console>:26: error: overloaded method value dropDuplicates with alternatives:
        (colNames: Array[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] <and>
        (colNames: Seq[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] <and>
        ()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
       cannot be applied to (String, String)
             ds.dropDuplicates("_1", "_2")
                ^
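
For illustration, here is a minimal sketch of the overload shape this issue asks for. It does not use Spark: `Table` is a hypothetical stand-in for `Dataset`, and its row handling is an assumption made only so the example runs on its own. The varargs overload is written as `(col1: String, cols: String*)`; one plausible reason for that shape is that a bare `cols: String*` overload would have the same erased signature as the existing `Seq[String]` one.

```scala
// Hypothetical stand-in for Dataset/DataFrame; not Spark's implementation.
final case class Table(columns: Seq[String], rows: Seq[Seq[Any]]) {

  // Existing overload: deduplicate on all columns.
  def dropDuplicates(): Table = dropDuplicates(columns)

  // Existing overload: deduplicate on the given columns.
  // This toy version keeps the first row seen for each key.
  def dropDuplicates(colNames: Seq[String]): Table = {
    val idx = colNames.map(c => columns.indexOf(c))
    val seen = scala.collection.mutable.LinkedHashMap.empty[Seq[Any], Seq[Any]]
    rows.foreach { r =>
      val key = idx.map(i => r(i))
      if (!seen.contains(key)) seen(key) = r
    }
    copy(rows = seen.values.toSeq)
  }

  // Existing overload: Array variant delegates to the Seq variant.
  def dropDuplicates(colNames: Array[String]): Table =
    dropDuplicates(colNames.toSeq)

  // New overload: varargs variant delegates to the Seq variant.
  def dropDuplicates(col1: String, cols: String*): Table =
    dropDuplicates(col1 +: cols)
}

object DropDuplicatesSketch {
  def main(args: Array[String]): Unit = {
    // Same data as the session above.
    val t = Table(Seq("_1", "_2"), Seq(Seq("a", 1), Seq("b", 2), Seq("a", 2)))
    println(t.dropDuplicates("_1", "_2").rows.size) // 3: every (_1, _2) pair is unique
    println(t.dropDuplicates("_1").rows.size)       // 2: the two "a" rows collapse
  }
}
```

With an overload of this shape in place, `ds.dropDuplicates("_1", "_2")` resolves to the varargs method instead of producing the compile error shown above. Which row survives among duplicates is a property of this toy `Table` only.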
      

            People

              Assignee: Dongjoon Hyun (dongjoon)
              Reporter: Dongjoon Hyun (dongjoon)
              Votes: 0
              Watchers: 2
