Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:
      None
    • Environment:
      Databricks Cloud / Spark 2.0.0

Description

      Background

      Longer-running processes might run analytics or contact external services from UDFs. The response might not be just a single field, but instead a structure of information. When breaking this information out into separate columns, it is critical that the query is optimized correctly.

      Steps to Reproduce

      1. Create some sample data.
      2. Create a UDF that returns multiple attributes.
      3. Run UDF over some data.
      4. Create new columns from the multiple attributes.
      5. Observe the run time (see the sketch after this list).
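
      A minimal sketch of these steps in Scala follows. The data, schema, and UDF names are illustrative assumptions for a local Spark 2.0 session; they are not taken from the report or the attached notebook.

        // Hypothetical reproduction: a struct-returning UDF whose side effect
        // (the println) makes repeated evaluation visible.
        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.functions.udf

        case class Result(score: Double, label: String)

        object OverOptimizedUdf {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder()
              .appName("udf-repro")
              .master("local[*]")
              .getOrCreate()
            import spark.implicits._

            // 1. Create some sample data.
            val df = Seq("a", "b", "c").toDF("id")

            // 2. A UDF that returns multiple attributes as a struct.
            val expensive = udf { (id: String) =>
              println(s"UDF called for $id") // stands in for an expensive or external call
              Result(id.hashCode.toDouble, id.toUpperCase)
            }

            // 3. Run the UDF over the data, then 4. create new columns from the struct fields.
            val withStruct = df.withColumn("res", expensive($"id"))
            val broken = withStruct
              .withColumn("score", $"res.score")
              .withColumn("label", $"res.label")

            // 5. Observe: "UDF called" prints more than once per input row, because each
            //    projected field carries its own copy of the UDF expression in the plan.
            broken.show()
          }
        }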

      Actual Results

      The UDF is executed multiple times per row.

      Expected Results

      The UDF should only be executed once per row.

      Workaround

      Cache the Dataset after UDF execution.
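
      A sketch of that workaround, continuing the hypothetical names from the reproduction above: caching and materializing the Dataset right after the UDF runs keeps the struct from being recomputed when its fields are broken out.

        // Workaround sketch: cache the UDF output before projecting its fields.
        val cached = df.withColumn("res", expensive($"id")).cache()
        cached.count() // materialize the cache; the UDF runs here, once per row

        val extracted = cached
          .withColumn("score", $"res.score")
          .withColumn("label", $"res.label")

        extracted.show() // reads the struct from the cache instead of re-running the UDF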

      Details

      For code and more details, see over_optimized_udf.html

People

    • Assignee: Unassigned
    • Reporter: jeisinge Jacob Eisinger
    • Votes: 0
    • Watchers: 5

Dates

    • Created:
    • Updated:
    • Resolved: