Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Fixed
- 0.6.1
- None
Description
The following UDF example doesn't work.
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
maturity_udf = udf(lambda age: "adult" if age >= 18 else "child", StringType())  ## Error is from here.
df = sqlContext.createDataFrame([{'name': 'Alice', 'age': 1}])
df.withColumn("maturity", maturity_udf(df.age))
The error arises from this line:
maturity_udf = udf(lambda age: "adult" if age >=18 else "child", StringType())
I tried several examples with UDF and they all result in the same stack trace.
Stack trace
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-64075962331083004.py", line 266, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-64075962331083004.py", line 259, in <module>
    exec(code)
  File "<stdin>", line 3, in <module>
  File "/home/sjames/zeppelin/zeppelin-0.6.1-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/functions.py", line 1789, in udf
    return UserDefinedFunction(f, returnType)
  File "/home/sjames/zeppelin/zeppelin-0.6.1-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/functions.py", line 1751, in __init__
    self._judf = self._create_judf(name)
  File "/home/sjames/zeppelin/zeppelin-0.6.1-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/functions.py", line 1758, in _create_judf
    jdt = ctx._ssql_ctx.parseDataType(self.returnType.json())
AttributeError: 'JavaMember' object has no attribute 'parseDataType'
A similar error is also reported at https://forums.aws.amazon.com/thread.jspa?messageID=739815&tstart=0
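The stack trace shows udf() failing when it calls ctx._ssql_ctx.parseDataType(...). A quick way to narrow this down might be to inspect what the context's _ssql_ctx attribute actually is before calling udf(). The helper below is a minimal sketch, not part of Zeppelin or PySpark: the name check_sql_context is hypothetical, and it only assumes that the context object exposes the _ssql_ctx attribute seen in the trace.

```python
def check_sql_context(ctx):
    """Hypothetical diagnostic helper (not a Zeppelin or PySpark API).

    Returns (type_name, usable), where type_name is the class name of
    ctx._ssql_ctx and usable says whether it exposes parseDataType,
    the method that udf() calls according to the stack trace.
    """
    # Assumption: the context stores its backing object in _ssql_ctx,
    # as the traceback line `ctx._ssql_ctx.parseDataType(...)` suggests.
    backing = getattr(ctx, "_ssql_ctx", None)
    type_name = type(backing).__name__
    usable = hasattr(backing, "parseDataType")
    return type_name, usable
```

In a %pyspark paragraph this could be called as check_sql_context(sqlContext). A result whose type name is JavaMember with usable set to False would be consistent with the AttributeError reported here; any other result would suggest the problem lies elsewhere.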
Issue Links
- blocks: ZEPPELIN-1347 Release 0.6.2 (Resolved)
- is duplicated by: ZEPPELIN-1442 UDF can not be found due to 2 instances of SparkSession is created (Resolved)
- relates to: ZEPPELIN-1453 Spark Interpreter Isolation "scoped" - Classloading Issues (Resolved)