Description
sha2(input, bit_length) returns incorrect results when bit_length == 224.
This bug seems to have been present since the sha2 expression was introduced in 1.5.0.
Repro in spark-shell:
spark.sql("SELECT sha2('abc', 224)").show()
Spark currently returns a garbled string consisting of invalid UTF-8:
#\t}"4�"�B�w��U�*��你���l��
The expected return value is:
23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7
This appears to happen because MessageDigest.digest() returns raw digest bytes that are meant to be interpreted as a number (for example a BigInt), not as a string. The output of MessageDigest.digest() must therefore first be interpreted as a BigInt and then converted to a hex string.
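The conversion described above can be sketched outside Spark as follows. This is only an illustration of the expected behavior, not the actual fix: the object name Sha224Hex and the helpers toHex and sha224Hex are hypothetical, and the input is assumed to be UTF-8 encoded. It encodes each byte as two hex digits rather than going through a BigInt, because a plain BigInt-to-hex conversion would drop leading zeros whenever the digest starts with a zero byte.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper object, for illustration only (not part of Spark).
object Sha224Hex {
  // Encode every byte as exactly two lowercase hex digits,
  // so digests that begin with 0x00 keep their leading zeros.
  def toHex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString

  // Compute SHA-224 over the UTF-8 bytes of the input (an assumption
  // about the encoding) and return the digest as a hex string.
  def sha224Hex(input: String): String = {
    val md = MessageDigest.getInstance("SHA-224")
    toHex(md.digest(input.getBytes(StandardCharsets.UTF_8)))
  }
}
```

With this sketch, Sha224Hex.sha224Hex("abc") yields the expected value shown above, 23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7.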