Description
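When using F.conv to convert a column from a hexadecimal string to a decimal integer, the results are inconsistent: running the same query twice returns different distinct counts for the converted column, and both counts are far larger than the distinct count of the source column.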
import org.apache.spark.sql.{functions => F}
val col = F.conv(df("some_col"), 16, 10)
val a = df.select(F.countDistinct("some_col"), F.countDistinct(col)).collect()
val b = df.select(F.countDistinct("some_col"), F.countDistinct(col)).collect()
returns:
a: Array[org.apache.spark.sql.Row] = Array([59776,1941936])
b: Array[org.apache.spark.sql.Row] = Array([59776,1965154])
P.S.
"some_col" is an MD5 hash of a string column, computed with F.md5, so each value is a 32-character hexadecimal string.
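
To make the report easier to try out, here is a minimal self-contained sketch of the same query. The data is a stand-in, not the original dataset: it assumes a local SparkSession and builds "some_col" from the MD5 of 100,000 generated ids (the session settings, the app name "conv-repro" and the row count are all assumptions for illustration). It may or may not trigger the inconsistency on a given Spark version or cluster setup.

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

// Assumption: a local session; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("conv-repro")
  .getOrCreate()

// Hypothetical stand-in data: MD5 hashes of 100,000 distinct ids rendered as strings.
val df = spark.range(100000L)
  .select(F.md5(F.col("id").cast("string")).as("some_col"))

// Same conversion as in the report: base 16 to base 10.
val converted = F.conv(df("some_col"), 16, 10)

// Run the identical aggregation twice so the two results can be compared.
val query = df.select(F.countDistinct("some_col"), F.countDistinct(converted))
val a = query.collect()
val b = query.collect()

println(a.mkString(", "))
println(b.mkString(", "))
```

If conv behaves deterministically, a and b should be identical, and the second count in each row should not exceed the first, since a deterministic function cannot produce more distinct outputs than it has distinct inputs.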