Spark / SPARK-24935

Problem with Executing Hive UDF's from Spark 2.2 Onwards

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.3.1
    • Fix Version/s: 2.3.4, 2.4.3, 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      A user of the sketches-hive library (https://github.com/DataSketches/sketches-hive) reported an issue with the HLL Sketch Hive UDAF that appears to be a bug in Spark or Hive. Their code runs fine on 2.1 but fails from 2.2 onwards. For more details, see the discussion on the sketches-user list:
      https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/sketches-user/GmH4-OlHP9g/MW-J7Hg4BwAJ

      On further debugging, we found that from 2.2 onwards, Spark's Hive UDAF support provides partial aggregation and no longer supports complete-mode aggregation (see https://issues.apache.org/jira/browse/SPARK-19060 and https://issues.apache.org/jira/browse/SPARK-18186). As a result, the merge method is called where the update method was expected (https://github.com/DataSketches/sketches-hive/blob/master/src/main/java/com/yahoo/sketches/hive/hll/SketchEvaluator.java#L56), and it throws the exception described in the forum thread above.
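
      To make the difference between the two execution paths concrete, here is a minimal, self-contained sketch. It is a toy model and not the sketches-hive or Spark code: ToyEvaluator, CompleteOnlySum and MergeAwareSum are hypothetical names, and a plain sum stands in for the HLL sketch. The method names (iterate, terminatePartial, merge, terminate) and the mode names in the comments follow Hive's GenericUDAFEvaluator conventions. The assumption illustrated is that an evaluator whose merge path is not usable for a given mode works when only iterate and terminate are called, but fails once the engine always splits the work into a partial phase and a final merge phase.

```java
import java.util.ArrayList;
import java.util.List;

public class AggregationModesDemo {

    /** Toy stand-in for a Hive UDAF evaluator (hypothetical, not the Hive API). */
    interface ToyEvaluator {
        void iterate(long[] buffer, long value);   // consume one raw input row ("update")
        long terminatePartial(long[] buffer);      // emit an intermediate result
        void merge(long[] buffer, long partial);   // consume one intermediate result
        long terminate(long[] buffer);             // emit the final result
    }

    /** Hypothetical evaluator written with only complete-mode execution in mind. */
    static class CompleteOnlySum implements ToyEvaluator {
        public void iterate(long[] buffer, long value) { buffer[0] += value; }
        public long terminatePartial(long[] buffer) { return buffer[0]; }
        public void merge(long[] buffer, long partial) {
            throw new UnsupportedOperationException("merge was not expected to be called");
        }
        public long terminate(long[] buffer) { return buffer[0]; }
    }

    /** Same evaluator, but able to combine intermediate results. */
    static class MergeAwareSum extends CompleteOnlySum {
        @Override
        public void merge(long[] buffer, long partial) { buffer[0] += partial; }
    }

    /** COMPLETE mode: iterate over every row, then terminate. merge is never called. */
    static long runComplete(ToyEvaluator evaluator, long[] rows) {
        long[] buffer = new long[1];
        for (long row : rows) {
            evaluator.iterate(buffer, row);
        }
        return evaluator.terminate(buffer);
    }

    /** PARTIAL1 per partition (iterate + terminatePartial), then FINAL (merge + terminate). */
    static long runPartialThenFinal(ToyEvaluator evaluator, long[][] partitions) {
        List<Long> partials = new ArrayList<>();
        for (long[] partition : partitions) {
            long[] buffer = new long[1];
            for (long row : partition) {
                evaluator.iterate(buffer, row);
            }
            partials.add(evaluator.terminatePartial(buffer));
        }
        long[] finalBuffer = new long[1];
        for (long partial : partials) {
            evaluator.merge(finalBuffer, partial);
        }
        return evaluator.terminate(finalBuffer);
    }

    public static void main(String[] args) {
        long[] rows = {1, 2, 3, 4};
        long[][] partitions = {{1, 2}, {3, 4}};

        System.out.println("complete mode: " + runComplete(new CompleteOnlySum(), rows));
        System.out.println("partial + final, merge-aware: "
                + runPartialThenFinal(new MergeAwareSum(), partitions));
        try {
            runPartialThenFinal(new CompleteOnlySum(), partitions);
        } catch (UnsupportedOperationException e) {
            System.out.println("partial + final, complete-only: " + e.getMessage());
        }
    }
}
```

      In this model the same evaluator that succeeds under runComplete fails under runPartialThenFinal, which mirrors the reported change in behavior between 2.1 and 2.2 only in outline; the actual exception and its cause are in the linked forum thread.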

          People

            Assignee: Parth Gandhi (pgandhi)
            Reporter: Parth Gandhi (pgandhi)
            Votes: 0
            Watchers: 8
