[ZEPPELIN-1411] UDF with pyspark not working - object has no attribute 'parseDataType' - ASF JIRA


Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.6.1
Fix Version/s: 0.6.2, 0.7.0
Component/s: Interpreters
Labels: None

Description

The following UDF example fails in the Zeppelin 0.6.1 pyspark interpreter.

from pyspark.sql.types import StringType
from pyspark.sql.functions import udf

maturity_udf = udf(lambda age: "adult" if age >= 18 else "child", StringType())  # The error is raised here.

# sqlContext is predefined by Zeppelin's pyspark interpreter.
df = sqlContext.createDataFrame([{'name': 'Alice', 'age': 1}])
df.withColumn("maturity", maturity_udf(df.age))

The error is raised while constructing the UDF, on this line:

maturity_udf = udf(lambda age: "adult" if age >=18 else "child", StringType())
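That line only builds the UDF; no row has been processed yet, so the lambda itself never runs before the error. As a quick sanity check that the UDF logic is not at fault, the same function can be exercised in plain Python without Spark. This is a minimal sketch; the name `maturity` is hypothetical.

```python
def maturity(age):
    """Same logic as the lambda passed to udf() in the report (hypothetical name)."""
    return "adult" if age >= 18 else "child"


# Ages around the threshold; the report's only row has age 1.
results = {age: maturity(age) for age in (1, 17, 18, 42)}
print(results)
```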

Every UDF example I tried fails with the same stack trace.
Stack trace

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-64075962331083004.py", line 266, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-64075962331083004.py", line 259, in <module>
    exec(code)
  File "<stdin>", line 3, in <module>
  File "/home/sjames/zeppelin/zeppelin-0.6.1-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/functions.py", line 1789, in udf
    return UserDefinedFunction(f, returnType)
  File "/home/sjames/zeppelin/zeppelin-0.6.1-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/functions.py", line 1751, in __init__
    self._judf = self._create_judf(name)
  File "/home/sjames/zeppelin/zeppelin-0.6.1-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/functions.py", line 1758, in _create_judf
    jdt = ctx._ssql_ctx.parseDataType(self.returnType.json())
AttributeError: 'JavaMember' object has no attribute 'parseDataType'
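The last line of the trace says that `ctx._ssql_ctx` evaluated to a py4j `JavaMember` rather than to the Java SQLContext that `_create_judf` expects. A py4j `JavaObject` answers almost any attribute lookup with a `JavaMember` (a handle to a Java method that may not exist) instead of raising `AttributeError`, so looking up an attribute the object does not really have only fails one step later. The sketch below simulates that behaviour with hypothetical stand-in classes, so it needs neither py4j nor Spark; it illustrates how this error message arises and is not a statement of the confirmed root cause.

```python
class JavaMember:
    """Hypothetical stand-in for py4j's JavaMember: a handle to a Java method."""

    def __init__(self, name):
        self.name = name


class JavaObject:
    """Hypothetical stand-in for py4j's JavaObject. Any attribute that is not
    found normally is answered with a JavaMember instead of AttributeError."""

    def __getattr__(self, name):
        return JavaMember(name)


def reproduce():
    java_obj = JavaObject()
    # 'some_missing_attr' is a hypothetical name; this lookup does NOT fail.
    ssql_ctx = java_obj.some_missing_attr
    try:
        # The failure surfaces one step later, as in _create_judf.
        # '"string"' is the JSON that StringType().json() produces in pyspark.
        ssql_ctx.parseDataType('"string"')
    except AttributeError as exc:
        return type(ssql_ctx).__name__, str(exc)
    return type(ssql_ctx).__name__, None


print(reproduce())
# ('JavaMember', "'JavaMember' object has no attribute 'parseDataType'")
```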

A similar error is also reported in https://forums.aws.amazon.com/thread.jspa?messageID=739815&tstart=0


Issue Links

blocks: ZEPPELIN-1347 Release 0.6.2 (Resolved)
is duplicated by: ZEPPELIN-1442 UDF can not be found due to 2 instances of SparkSession is created (Resolved)
relates to: ZEPPELIN-1453 Spark Interpreter Isolation "scoped" - Classloading Issues (Resolved)
links to: GitHub Pull Request #1404


People

Assignee: Jeff Zhang
Reporter: Sojan James
Votes: 1
Watchers: 5

Dates

Created: 05/Sep/16 09:44
Updated: 21/Sep/16 15:16
Resolved: 21/Sep/16 15:16