Spark / SPARK-36836

"sha2" expression with bit_length of 224 returns incorrect results


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 3.0.0, 3.1.0, 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

      sha2(input, bit_length) returns incorrect results when bit_length == 224.

       

      This bug seems to have been present since the sha2 expression was introduced in 1.5.0.

       

      Repro in spark shell:

      spark.sql("SELECT sha2('abc', 224)").show()

       

      Spark currently returns a garbled string consisting of invalid UTF-8:

       #\t}"4�"�B�w��U�*��你���l��

      The expected return value is: 

      23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7

       

      This appears to happen because MessageDigest.digest() returns raw digest bytes, which are meant to be read as a number (for example a BigInt) rather than decoded directly as a string. The output of MessageDigest.digest() must therefore first be interpreted as a BigInt and then rendered as a hex string.
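
      The conversion described above can be sketched in plain Java. This is an illustration only, not Spark's actual fix: the class name Sha224Hex and the method sha224Hex are hypothetical, and the sketch assumes a JDK that provides the "SHA-224" algorithm (Java 8 or later). Converting through a BigInteger alone would drop leading zero digits, so the result is zero-padded to two hex characters per digest byte.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of Spark.
final class Sha224Hex {

    // Hashes the UTF-8 bytes of `input` with SHA-224 and returns lowercase hex.
    static String sha224Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-224");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // signum = 1 makes BigInteger treat the bytes as an unsigned magnitude.
            BigInteger value = new BigInteger(1, digest);
            // Zero-pad to 2 hex characters per byte (56 for SHA-224),
            // because the BigInteger conversion drops leading zeros.
            return String.format("%0" + (digest.length * 2) + "x", value);
        } catch (NoSuchAlgorithmException e) {
            // Assumption: SHA-224 is available on the running JDK.
            throw new IllegalStateException("SHA-224 not available", e);
        }
    }

    public static void main(String[] args) {
        // Prints 23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7
        System.out.println(sha224Hex("abc"));
    }
}
```

      Decoding the same digest bytes directly as UTF-8 text, instead of hex-encoding them, produces the kind of garbled output shown in the repro.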

          People

            Assignee: Richard Chen (richardc-db)
            Reporter: Richard Chen (richardc-db)
            Votes: 0
            Watchers: 3
