  1. Spark
  2. SPARK-8868

SqlSerializer2 can go into infinite loop when row consists only of NullType columns

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.5.0
    • Fix Version/s: 1.4.1, 1.5.0
    • Component/s: SQL
    • Labels:
      None
      Description

      The following SQL query will cause an infinite loop in SqlSerializer2 code:

      val df = sqlContext.sql("select null where 1 = 1")
      df.unionAll(df).sort("_c0").collect()
      

      The same problem occurs if we add more null literals, but it does not occur as long as there is a column of any other type (e.g. select 1, null where 1 == 1).

      I think what's happening here is that if a row consists only of NullType columns (that is, columns of null literals, not columns of other types which happen to contain only null values), SqlSerializer2 ends up writing and reading no data for rows of this type. Because the deserialization stream never tries to read any data but can still return an empty row, DeserializationStream.asIterator goes into an infinite loop: no read ever happens, so nothing triggers the EOF exception that would end the iteration.
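
      The mechanism described above can be illustrated without Spark. The following is only a minimal sketch under stated assumptions, not the actual SqlSerializer2 code: `EofLoopSketch`, `RowReader`, `rows`, and `streamOf` are hypothetical names, and a row is modeled as a fixed number of bytes. A width of zero stands in for a schema made up only of NullType columns. Iteration ends only when a read throws EOFException, so zero-width rows never end it; the example caps the iterator with `take` so it terminates.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, EOFException}

object EofLoopSketch {
  // Hypothetical stand-in for a row deserializer: each row occupies `bytesPerRow` bytes.
  // bytesPerRow = 0 models a schema that contains only NullType columns.
  final class RowReader(in: DataInputStream, bytesPerRow: Int) {
    def readRow(): Array[Byte] = {
      val buf = new Array[Byte](bytesPerRow)
      // readFully throws EOFException only if it needs bytes that are not there;
      // for a zero-length buffer it returns without touching the stream.
      in.readFully(buf)
      buf
    }
  }

  // Iterator that stops only when a read throws EOFException
  // (an assumption modeled on the behavior described in this report).
  def rows(reader: RowReader): Iterator[Array[Byte]] =
    Iterator
      .continually {
        try Some(reader.readRow())
        catch { case _: EOFException => None }
      }
      .takeWhile(_.isDefined)
      .map(_.get)

  def streamOf(bytes: Array[Byte]): DataInputStream =
    new DataInputStream(new ByteArrayInputStream(bytes))

  // Two 2-byte rows; the third read hits EOF and ends the iteration.
  def normalCount: Int =
    rows(new RowReader(streamOf(Array[Byte](1, 2, 3, 4)), 2)).size

  // Zero-width rows on an empty stream: no read occurs, so EOF is never raised.
  // Without the `take(limit)` cap this would never finish.
  def cappedCount(limit: Int): Int =
    rows(new RowReader(streamOf(new Array[Byte](0)), 0)).take(limit).size
}
```

      Here `EofLoopSketch.normalCount` finishes after two rows, while `EofLoopSketch.cappedCount(n)` yields as many empty rows as the cap allows even though the stream is empty, which mirrors the endless stream of empty rows described in this report.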

            People

            • Assignee: Yin Huai (yhuai)
            • Reporter: Josh Rosen (joshrosen)
            • Shepherd: Josh Rosen
            • Votes: 0
            • Watchers: 3