Spark / SPARK-31916

StringConcat can overflow `length`, leads to StringIndexOutOfBoundsException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.4, 3.0.0
    • Fix Version/s: 3.0.1, 3.1.0
    • Component/s: SQL
    • Labels:
      None

    Description

      We have query plans that, through multiple transformations, can grow extremely long. These would eventually throw OutOfMemoryError (https://issues.apache.org/jira/browse/SPARK-26103 and the related https://issues.apache.org/jira/browse/SPARK-25380).

      We backported the changes from https://github.com/apache/spark/pull/23169 into our distribution of Spark, which is based on 2.4.4, and tried the newly added `spark.sql.maxPlanStringLength` setting. While this works in some cases, large query plans can still fail because of `StringConcat` in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala.

      The following unit test reproduces the issue; it still fails on the master branch of Spark:

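        import org.apache.spark.sql.catalyst.util.StringUtils.StringConcat

        test("StringConcat doesn't overflow on many inputs") {
          val concat = new StringConcat(maxLength = 100)
          // Each append adds 11 to the internal Int `length`, so it wraps negative after
          // roughly 195 million appends, long before this loop completes
          0.to(Integer.MAX_VALUE).foreach { _ =>
            concat.append("hello world")
          }
          assert(concat.toString.length === 100)
        }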
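
      Assuming, as this report describes, that `length` grows by the full length of every string passed to `append`, the test loop only needs enough iterations to push that sum past `Int.MaxValue`. A small standalone Scala sketch of the arithmetic (`AppendWrapPoint` and `appendsUntilWrap` are illustrative names, not Spark APIs):

```scala
object AppendWrapPoint {
  // Smallest number of appends of an sLen-character string whose summed length exceeds Int.MaxValue
  def appendsUntilWrap(sLen: Int): Long = {
    require(sLen > 0, "string length must be positive")
    Int.MaxValue.toLong / sLen + 1
  }

  def main(args: Array[String]): Unit = {
    val sLen = "hello world".length // 11
    val n = appendsUntilWrap(sLen)
    val total = n * sLen // exact total, computed in Long
    // An Int accumulator holds this same total reduced modulo 2^32, i.e. total.toInt
    println(s"appends=$n total=$total asInt=${total.toInt}")
    // appends=195225787 total=2147483657 asInt=-2147483639
  }
}
```

      For "hello world" the sum first wraps on append 195,225,787, so a loop bound of a little over `Integer.MAX_VALUE / 11` should be enough to reach the failing path, rather than iterating all the way to `Integer.MAX_VALUE`.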

      Looking at the `append` method here: https://github.com/apache/spark/blob/fc6af9d900ec6f6a1cbe8f987857a69e6ef600d1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala#L118-L128

      It seems that regardless of whether the string to be appended is added fully to the internal buffer, added as a substring to reach `maxLength`, or not added at all, the internal `length` field is incremented by the length of `s`. Eventually this overflows an `Int` and causes L123 to call `substring` with a negative index.
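
      To make the failure mode concrete, here is a minimal standalone Scala model of the `append` logic as described in this report. It is a simplification, not Spark's `StringConcat`: `BoundedConcat`, `NaiveConcat`, `CappedConcat` and `ConcatOverflowDemo` are hypothetical names, and `CappedConcat` illustrates only one possible remedy (accumulate in `Long` and saturate at `maxLength`), not necessarily the fix that was merged. In the naive model, once `length` wraps negative, `atLimit` becomes false again and `maxLength - length` itself overflows, so `substring` receives a negative end index.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of the append logic described in this report (not Spark code)
abstract class BoundedConcat(val maxLength: Int) {
  private val strings = new ArrayBuffer[String]
  private var length: Int = 0 // running total of every appended string's length

  // How `length` is updated after each append; the two variants below differ only here
  protected def nextLength(current: Int, sLen: Int): Int

  def atLimit: Boolean = length >= maxLength

  def append(s: String): Unit = {
    if (s != null) {
      val sLen = s.length
      if (!atLimit) {
        // In the naive variant this subtraction overflows once `length` has wrapped negative
        val available = maxLength - length
        val toAppend = if (available >= sLen) s else s.substring(0, available)
        strings += toAppend
      }
      length = nextLength(length, sLen)
    }
  }

  override def toString: String = strings.mkString
}

// Adds the full length on every call, even when nothing was stored, so the Int can wrap
class NaiveConcat(limit: Int) extends BoundedConcat(limit) {
  protected def nextLength(current: Int, sLen: Int): Int = current + sLen
}

// One possible remedy: add in Long and saturate at maxLength, keeping length in [0, maxLength]
class CappedConcat(limit: Int) extends BoundedConcat(limit) {
  protected def nextLength(current: Int, sLen: Int): Int =
    math.min(current.toLong + sLen, maxLength.toLong).toInt
}

object ConcatOverflowDemo {
  // Appends `s` up to maxAppends times; returns the 1-based index of the append that throws
  def firstFailure(c: BoundedConcat, s: String, maxAppends: Long): Option[Long] = {
    var done = 0L
    try {
      while (done < maxAppends) {
        c.append(s)
        done += 1
      }
      None
    } catch {
      case _: StringIndexOutOfBoundsException => Some(done + 1)
    }
  }

  def main(args: Array[String]): Unit = {
    val appends = 195225790L // a few appends past the point where 11 * n exceeds Int.MaxValue
    println(firstFailure(new NaiveConcat(100), "hello world", appends)) // Some(195225788)
    val capped = new CappedConcat(100)
    println(firstFailure(capped, "hello world", appends)) // None
    println(capped.toString.length) // 100
  }
}
```

      With `maxLength = 100` and "hello world", the naive variant throws `StringIndexOutOfBoundsException` on append 195,225,788 (the call after the sum wraps), while the capped variant keeps `length` within `[0, maxLength]` and returns exactly 100 characters. Saturating at `maxLength` is sufficient in this model because `length` is only read by `atLimit` and the `available` computation.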

    People

    • Assignee: Dilip Biswal (dkbiswal)
    • Reporter: Jeffrey Stokes (jstokes)
    • Votes: 0
    • Watchers: 4
