Spark / SPARK-33641

Invalidate new char-like types in public APIs that produce incorrect results


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: SQL
    • Labels:
      None

      Description

      1. udf

      scala> spark.udf.register("abcd", () => "12345", org.apache.spark.sql.types.VarcharType(2))
      
      scala> spark.sql("select abcd()").show
      scala.MatchError: CharType(2) (of class org.apache.spark.sql.types.VarcharType)
        at org.apache.spark.sql.catalyst.encoders.RowEncoder$.externalDataTypeFor(RowEncoder.scala:215)
        at org.apache.spark.sql.catalyst.encoders.RowEncoder$.externalDataTypeForInput(RowEncoder.scala:212)
        at org.apache.spark.sql.catalyst.expressions.objects.ValidateExternalType.<init>(objects.scala:1741)
        at org.apache.spark.sql.catalyst.encoders.RowEncoder$.$anonfun$serializerFor$3(RowEncoder.scala:175)
        at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
        at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
        at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
        at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:198)
        at org.apache.spark.sql.catalyst.encoders.RowEncoder$.serializerFor(RowEncoder.scala:171)
        at org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:66)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:768)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:611)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:768)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:606)
        ... 47 elided
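
      For comparison, the same zero-arg UDF registered with a plain StringType works, which suggests the char-like types should be rejected (or converted) at registration time instead of failing later inside RowEncoder. A minimal sketch, assuming an active spark-shell session (the UDF name abcd_str is illustrative):

      ```scala
      // Sketch: same UDF as above, but registered with StringType instead of VarcharType(2).
      spark.udf.register("abcd_str", () => "12345", org.apache.spark.sql.types.StringType)
      // This path succeeds and shows a one-row result containing 12345.
      spark.sql("select abcd_str()").show()
      ```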
      

      2. spark.createDataFrame

      scala> spark.createDataFrame(spark.read.text("README.md").rdd, new org.apache.spark.sql.types.StructType().add("c", "char(1)")).show
      +--------------------+
      |                   c|
      +--------------------+
      |      # Apache Spark|
      |                    |
      |Spark is a unifie...|
      |high-level APIs i...|
      |supports general ...|
      |rich set of highe...|
      |MLlib for machine...|
      |and Structured St...|
      |                    |
      |<https://spark.ap...|
      |                    |
      |[![Jenkins Build]...|
      |[![AppVeyor Build...|
      |[![PySpark Covera...|
      |                    |
      |                    |
      |## Online Documen...|
      |                    |
      |You can find the ...|
      |guide, on the [pr...|
      +--------------------+
      only showing top 20 rows
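
      Note that the declared char(1) length is silently dropped rather than enforced: every row above keeps its full text. A minimal sketch of the same behavior with in-memory data, assuming an active spark-shell session (the RDD contents and column name are illustrative):

      ```scala
      import org.apache.spark.sql.Row
      import org.apache.spark.sql.types.StructType

      // Sketch: a value longer than the declared char(1) length passes through unchanged.
      val rdd = spark.sparkContext.parallelize(Seq(Row("abc"), Row("x")))
      val df = spark.createDataFrame(rdd, new StructType().add("c", "char(1)"))
      df.show()  // "abc" is displayed intact, neither truncated nor rejected
      ```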
      

      3. reader.schema

      ```
      scala> spark.read.schema("a varchar(2)").text("./README.md").show(100)
      +--------------------+
      |                   a|
      +--------------------+
      |      # Apache Spark|
      |                    |
      |Spark is a unifie...|
      |high-level APIs i...|
      |supports general ...|
      +--------------------+
      ```
      4. etc


      People

    • Assignee: Kent Yao
    • Reporter: Kent Yao
    • Votes: 0
    • Watchers: 3
