SPARK-17866: Dataset.dropDuplicates (i.e., distinct) should not change the output of child plan


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • None
    • 2.1.0
    • Component/s: SQL
    • None

    Description

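      Dataset.dropDuplicates currently creates new Alias expressions, each with a new exprId, so the output attributes of the deduplicated plan no longer match those of its child plan. This causes a problem when we select columns through the original Dataset, as follows: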

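      // Assumes a SparkSession named `spark` is in scope (as in spark-shell).
      import spark.implicits._
      val ds = Seq(("a", 1), ("a", 2), ("b", 1), ("a", 1)).toDS()
      // ds("_2") no longer resolves against the deduplicated plan and causes an AnalysisException
      ds.dropDuplicates("_1").select(ds("_1").as[String], ds("_2").as[Int])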
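
      Since this issue was resolved as Won't Fix, one way to sidestep the problem is to resolve the columns against the deduplicated Dataset itself rather than against `ds`. The following is a minimal sketch under that assumption, not a fix taken from the issue; the names `DropDuplicatesSelect`, `selectAfterDedup`, and `deduped` are hypothetical and introduced only for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object DropDuplicatesSelect {
  // Hypothetical helper: deduplicates on "_1", then selects both columns
  // through the deduplicated Dataset instead of the original one.
  def selectAfterDedup(spark: SparkSession): Dataset[(String, Int)] = {
    import spark.implicits._

    val ds = Seq(("a", 1), ("a", 2), ("b", 1), ("a", 1)).toDS()

    // Bind the result of dropDuplicates to its own name ...
    val deduped = ds.dropDuplicates("_1")

    // ... so that deduped("_1") and deduped("_2") resolve against the
    // attributes the deduplicated plan actually outputs.
    deduped.select(deduped("_1").as[String], deduped("_2").as[Int])
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; spark-shell already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-17866-sketch")
      .getOrCreate()

    selectAfterDedup(spark).show()
    spark.stop()
  }
}
```

      The result contains one row per distinct value of `_1`. Which `_2` value is kept for a duplicated key is not guaranteed, so the sketch makes no claim about it.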
      

            People

              Assignee: L. C. Hsieh (viirya)
              Reporter: L. C. Hsieh (viirya)
              Votes: 0
              Watchers: 4
