Spark / SPARK-27591

A bug in UnivocityParser prevents using UDT


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.2
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      I am trying to define a UserDefinedType in Spark 2.4.1 that is backed by String but is distinct from StringType. It looks like either there is a bug in Spark or I am doing something incorrectly.

      I define my type as follows:

      class MyType extends UserDefinedType[MyValue] {
        override def sqlType: DataType = StringType
        ...
      }
      
      @SQLUserDefinedType(udt = classOf[MyType])
      case class MyValue
      

      I expect values to be read and stored as String, with only the SQL type being custom. In fact, Spark cannot read the string at all:

      java.lang.ClassCastException: org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$makeConverter$11 cannot be cast to org.apache.spark.unsafe.types.UTF8String
          at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
          at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:195)
          at org.apache.spark.sql.catalyst.expressions.JoinedRow.getUTF8String(JoinedRow.scala:102)
      

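      The problem is in UnivocityParser.makeConverter: in the UDT case it returns a (String => (String => Any)) function instead of a (String => Any) function. Applying the converter to a field therefore yields another function rather than the parsed value, which is the anonymous function named in the ClassCastException above. See UnivocityParser:184: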

      case udt: UserDefinedType[_] => (datum: String) =>
        makeConverter(name, udt.sqlType, nullable, options)
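
The mismatch can be reproduced without Spark. The following is a minimal sketch, not Spark's code: `DataType`, `StringType`, `IntegerType`, `MiniUDT`, and `ConverterSketch` are simplified stand-ins invented for illustration. It assumes only that the UDT branch wraps the recursive call in a lambda, as in the snippet above, and shows one possible fix: returning the converter for the underlying `sqlType` directly.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's classes.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
// Hypothetical miniature of a UDT: it only carries its underlying sqlType.
final case class MiniUDT(sqlType: DataType) extends DataType

object ConverterSketch {
  type ValueConverter = String => Any

  // Same shape as the reported branch: the lambda's result is itself a converter.
  // This type-checks because (String => Any) is also an Any.
  def buggyConverter(dataType: DataType): ValueConverter = dataType match {
    case StringType   => (datum: String) => datum
    case IntegerType  => (datum: String) => datum.toInt
    case udt: MiniUDT => (datum: String) => buggyConverter(udt.sqlType)
  }

  // One possible fix: delegate to the converter of the underlying type
  // instead of wrapping it in another function.
  def fixedConverter(dataType: DataType): ValueConverter = dataType match {
    case StringType   => (datum: String) => datum
    case IntegerType  => (datum: String) => datum.toInt
    case udt: MiniUDT => fixedConverter(udt.sqlType)
  }
}
```

With `MiniUDT(StringType)`, `buggyConverter` applied to a field returns a function object, while `fixedConverter` returns the field's string value. The compiler accepts the buggy branch because the declared result type is `Any`, so the error only surfaces at runtime as a cast failure.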
      
      


          People

            Assignee: Artem Kalchenko (drKalko)
            Reporter: Artem Kalchenko (drKalko)