  1. Spark
  2. SPARK-47307

Spark 3.3 produces invalid base64

Details

    Description

      SPARK-37820, introduced in Spark 3.3, changes the output of base64. The change itself is fine, but it shouldn't happen between minor versions.

      Spark 3.2
      >>> spark.sql(f"""SELECT base64('{'a' * 58}') AS base64""").collect()[0][0]
      'YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYQ=='
      

      Note the different output in Spark 3.3: a \r\n line break is inserted after the 76th character of the encoded string.

      Spark 3.3
      >>> spark.sql(f"""SELECT base64('{'a' * 58}') AS base64""").collect()[0][0]
      'YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh\r\nYQ=='
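
The difference can be reproduced without Spark. The sketch below assumes the 3.3 output corresponds to MIME-style wrapping (76-character lines joined with \r\n), which matches the string above but is not confirmed here against the Spark source. The helper names are hypothetical and exist only for this illustration.

```python
import base64


def encode_unchunked(data: bytes) -> str:
    """Plain base64 with no line breaks (matches the Spark 3.2 output above)."""
    return base64.b64encode(data).decode("ascii")


def encode_mime_chunked(data: bytes, line_length: int = 76) -> str:
    """Base64 split into fixed-width lines joined by CRLF.

    Assumption: this mirrors the Spark 3.3 output shown in this issue.
    """
    encoded = encode_unchunked(data)
    lines = [
        encoded[i:i + line_length]
        for i in range(0, len(encoded), line_length)
    ]
    return "\r\n".join(lines)


payload = b"a" * 58  # same input as the queries above
spark_32_style = encode_unchunked(payload)
spark_33_style = encode_mime_chunked(payload)

print(repr(spark_32_style))
print(repr(spark_33_style))
```

58 input bytes encode to 80 base64 characters, so the break falls after the first 76 and leaves the final 4 characters (YQ==) on their own line.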
      

      The Spark 3.2 output decodes fine with the base64 tool on my machine, but the Spark 3.3 output does not:

      $ pbpaste | base64 --decode
      aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa%
      
      $ pbpaste | base64 --decode
      base64: stdin: (null): error decoding base64 input stream
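
As a possible consumer-side workaround, the line breaks can be stripped before decoding. This is a minimal sketch using Python's standard library, where validate=True stands in for a strict decoder like the one above. The function names are hypothetical.

```python
import base64
import binascii

# The Spark 3.3 output from this issue: 76 characters, CRLF, then the rest.
chunked = "YWFh" * 19 + "\r\nYQ=="


def decode_strict(encoded: str) -> bytes:
    """Reject any character outside the base64 alphabet, including CR and LF."""
    return base64.b64decode(encoded, validate=True)


def decode_after_stripping(encoded: str) -> bytes:
    """Remove all whitespace first, then decode strictly."""
    return base64.b64decode("".join(encoded.split()), validate=True)


try:
    decode_strict(chunked)
    strict_ok = True
except binascii.Error:
    strict_ok = False  # strict decoding fails on the embedded \r\n

decoded = decode_after_stripping(chunked)
print(strict_ok, decoded)
```

Inside Spark, the same stripping could presumably be done by wrapping the call in regexp_replace to drop \r and \n from the result. That variant is untested here.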
      

            People

              Assignee: Unassigned
              Reporter: Willi Raschkowski (rshkv)