Spark / SPARK-15528

conv function returns inconsistent result for the same data

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 1.6.2, 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      When using F.conv to convert a column from a hexadecimal string to an integer, the results are inconsistent: running the same query twice on the same data gives different distinct counts.

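      import org.apache.spark.sql.{functions => F} // assumed: F aliases the functions object
      // Convert the hex string column from base 16 to base 10.
      val col = F.conv(df("some_col"), 16, 10)
      // Run the identical aggregation twice on the same DataFrame.
      val a = df.select(F.countDistinct("some_col"), F.countDistinct(col)).collect()
      val b = df.select(F.countDistinct("some_col"), F.countDistinct(col)).collect()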

      returns:
      a: Array[org.apache.spark.sql.Row] = Array([59776,1941936])
      b: Array[org.apache.spark.sql.Row] = Array([59776,1965154])

      P.S.
      "some_col" is an MD5 hash of a string column, computed with F.md5.
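
      One way to read the numbers above: a deterministic conversion can never yield more distinct outputs than it has distinct inputs, so 1941936 distinct converted values from 59776 distinct inputs is itself a symptom, separate from the two runs disagreeing. The following is a minimal sketch of that sanity check in plain Scala. It uses BigInt as a stand-in for the conversion and is not Spark's conv implementation. The object name, the helper names, and the sample values are all hypothetical.

      ```scala
      // Hypothetical sketch: BigInt stands in for the conversion; this is NOT Spark's conv.
      object ConvInvariantSketch {
        // Deterministic hex -> decimal string conversion (arbitrary precision).
        def hexToDecimal(hex: String): String = BigInt(hex, 16).toString(10)

        // Returns (distinct inputs, distinct converted outputs),
        // mirroring the two countDistinct columns in the report.
        def distinctCounts(values: Seq[String]): (Int, Int) =
          (values.distinct.size, values.map(hexToDecimal).distinct.size)

        def main(args: Array[String]): Unit = {
          // Made-up sample values, not data from the report.
          val sample = Seq("ff", "FF", "0a", "10", "ff")
          val (in1, out1) = distinctCounts(sample)
          val (in2, out2) = distinctCounts(sample)

          // A pure function cannot produce more distinct outputs than inputs.
          assert(out1 <= in1)
          // Repeating the computation on the same data must give the same counts.
          assert((in1, out1) == (in2, out2))

          println(s"distinct inputs = $in1, distinct outputs = $out1")
        }
      }
      ```

      Applying the same two checks to the collected rows a and b would flag the reported result on both counts.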

          People

            Assignee: Takeshi Yamamuro (maropu)
            Reporter: Lior Regev (lioron)
            Votes: 0
            Watchers: 4
