Resolution: Not A Problem
Affects Version/s: 2.0.0
Fix Version/s: None
Component/s: Spark Core
Databricks Cloud / Spark 2.0.0
Longer-running processes might run analytics or contact external services from UDFs. The response might not be just a single field, but a structure of information. When breaking that structure out into columns, it is critical that the query is optimized correctly.
- Create some sample data.
- Create a UDF that returns multiple attributes.
- Run UDF over some data.
- Create new columns from the multiple attributes.
- Observe run time.
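The steps above can be sketched as follows. This is a minimal reproduction sketch, not code from the report: the object name, the `Result` case class, the sleep-based "expensive" UDF, and the column names are all hypothetical stand-ins, and it assumes a local Spark 2.0 session.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object OverOptimizedUdfRepro {
  // Hypothetical struct of attributes returned by the UDF
  case class Result(a: Int, b: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Some sample data
    val data = Seq(1, 2, 3).toDF("id")

    // A UDF that returns multiple attributes; the sleep stands in for
    // an expensive analytics step or external service call
    val expensive = udf { (id: Int) =>
      Thread.sleep(1000)
      Result(id * 2, s"row-$id")
    }

    // Run the UDF over the data
    val withStruct = data.withColumn("r", expensive($"id"))

    // Create new columns from the multiple attributes; selecting each
    // struct field can cause the UDF expression to be evaluated again,
    // so run time grows with the number of fields extracted
    withStruct.select($"id", $"r.a", $"r.b").show()
  }
}
```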
Observed: the UDF is executed multiple times per row.
Expected: the UDF should be executed only once per row.
Workaround: cache the Dataset after UDF execution.
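A sketch of the caching workaround, continuing the hypothetical reproduction above (`withStruct` holds the DataFrame produced by the UDF):

```scala
// Caching materializes the UDF's output, so later projections of the
// struct fields read the cached rows instead of re-evaluating the UDF
val cached = withStruct.cache()
cached.count() // force materialization; the UDF runs once per row here

// Extracting the attributes no longer re-invokes the UDF
cached.select($"id", $"r.a", $"r.b").show()
```

Any action that materializes the cache (here `count()`) works; without it, caching is lazy and the first query still pays the full cost.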
For code and more details, see over_optimized_udf.html