Spark / SPARK-15528

conv function returns inconsistent result for the same data


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 1.6.2, 2.0.0
    • Component/s: SQL
    • Labels: None

      Description

      When F.conv is used to convert a column of hexadecimal strings to base 10, the results are inconsistent: running the same aggregation twice on the same data returns different values.

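      import org.apache.spark.sql.{functions => F}

      // Convert the hex strings to base 10, then run the same aggregation twice
      val col = F.conv(df("some_col"), 16, 10)
      val a = df.select(F.countDistinct("some_col"), F.countDistinct(col)).collect()
      val b = df.select(F.countDistinct("some_col"), F.countDistinct(col)).collect()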

      The two identical queries return:
      a: Array[org.apache.spark.sql.Row] = Array([59776,1941936])
      b: Array[org.apache.spark.sql.Row] = Array([59776,1965154])

      P.S.
      "some_col" is an MD5 hash of a string column, computed with F.md5.
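      The counts above show nondeterminism on their own. conv is meant to be a deterministic function of its input, so it can map the 59776 distinct hashes to at most 59776 distinct values, yet both reported counts are far larger and differ between runs. A minimal sketch of that invariant in plain Scala, assuming no Spark dependency: md5Hex only mimics the shape of F.md5 output, and hexToDecimal is a hypothetical reference conversion built on BigInt. It is not Spark's conv implementation and only illustrates the counting argument.

```scala
import java.security.MessageDigest

object ConvDistinctCheck {
  // Lowercase hex MD5 digest (32 characters), similar in shape to F.md5 output
  def md5Hex(s: String): String =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes("UTF-8"))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Hypothetical reference conversion: parse base 16, render base 10 (arbitrary precision)
  def hexToDecimal(hex: String): String = BigInt(hex, 16).toString(10)

  def main(args: Array[String]): Unit = {
    // 1000 rows built from 100 source strings, so hashes repeat as in a real column
    val hashes = (1 to 1000).map(i => md5Hex(s"value-${i % 100}"))
    val distinctInputs = hashes.distinct.size
    val distinctOutputs = hashes.map(hexToDecimal).distinct.size
    // A deterministic function maps n distinct inputs to at most n distinct outputs
    assert(distinctOutputs <= distinctInputs)
    println(s"$distinctInputs $distinctOutputs")
  }
}
```

      Running it prints "100 100": 100 distinct source strings yield 100 distinct hashes and 100 distinct decimal values. Applied to the reported column, the same bound would require the second count to be at most 59776, which neither run satisfies.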


            People

            • Assignee: Takeshi Yamamuro (maropu)
            • Reporter: Lior Regev (lioron)
            • Votes: 0
            • Watchers: 5
