IMPALA / IMPALA-2382

Impala cannot use Java-based UDFs that return standard Java types such as Float or String (this works in Hive)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.2.4, Impala 2.3.0
    • Fix Version/s: Impala 2.5.0
    • Component/s: Frontend

    Description

      Impala cannot use Java-based UDFs whose evaluate() method returns a standard Java type such as java.lang.Float or java.lang.String instead of a Hadoop Writable. For example, the text-mining UDFs at https://github.com/rueedlinger/hive-udf/ include a function, Distance (ch.yax.hive.udf.text.Distance), that calculates the Levenshtein distance and returns a Float.
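      For illustration, here is a minimal sketch of the kind of UDF this report is about: an evaluate() method that returns a boxed java.lang.Float rather than org.apache.hadoop.io.FloatWritable. The class name LevenshteinUdfSketch is hypothetical and this is not the source of ch.yax.hive.udf.text.Distance. A real Hive simple UDF would also extend org.apache.hadoop.hive.ql.exec.UDF; that is omitted here so the sketch compiles with the JDK alone. The real Distance function is registered with three STRING arguments (the first one, 'L' in the example, is not explained in this report), so the two-argument signature below is a simplification.

```java
// Hypothetical sketch: NOT the source of ch.yax.hive.udf.text.Distance.
// A real Hive simple UDF would declare "extends org.apache.hadoop.hive.ql.exec.UDF";
// that is omitted so this compiles with the JDK alone.
public class LevenshteinUdfSketch {

    // Hive finds evaluate() by reflection. Returning java.lang.Float
    // (instead of FloatWritable) is the pattern this issue is about.
    public Float evaluate(String a, String b) {
        if (a == null || b == null) {
            return null; // SQL NULL in, SQL NULL out
        }
        // Two-row dynamic-programming Levenshtein distance.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            // The row just computed becomes "prev" for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return (float) prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(new LevenshteinUdfSketch().evaluate("test", "testing"));
    }
}
```

      With "extends UDF" added and the Hive jars on the classpath, this is the return-value convention that, according to this report, works in Hive but fails in Impala 2.2.4 and 2.3.0.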

      [xxx.yyy.zzz.com:21000] > CREATE FUNCTION levdistance(STRING,STRING,STRING) RETURNS FLOAT LOCATION 'hdfs://datalab-uatb/project/p11/inc/private/jars/hive-udf-textmining-1.0-SNAPSHOT-jar-with-dependencies.jar' SYMBOL='ch.yax.hive.udf.text.Distance';
      Query: create FUNCTION levdistance(STRING,STRING,STRING) RETURNS FLOAT LOCATION 'hdfs://datalab-uatb/project/p11/inc/private/jars/hive-udf-textmining-1.0-SNAPSHOT-jar-with-dependencies.jar' SYMBOL='ch.yax.hive.udf.text.Distance'

      Fetched 0 row(s) in 0.77s

      [xxx.yyy.zzz.com:21000] > select levdistance('L','test','testing');
      Query: select levdistance('L','test','testing')
      ---------------------------------------------
      default.levdistance('l', 'test', 'testing')
      ---------------------------------------------
      NULL
      ---------------------------------------------
      WARNINGS: UDF WARNING: Hive UDF path=hdfs://datalab-uatb/project/p11/inc/private/jars/hive-udf-textmining-1.0-SNAPSHOT-jar-with-dependencies.jar class=ch.yax.hive.udf.text.Distance failed due to: ImpalaRuntimeException: UDF::evaluate() ran into a problem.
      CAUSED BY: ClassCastException: java.lang.Float cannot be cast to org.apache.hadoop.io.FloatWritable
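      The ClassCastException shows that Impala expected evaluate() to return org.apache.hadoop.io.FloatWritable for a FLOAT function and cast the result without checking its type. As a rough, hypothetical sketch of how a UDF executor can accept both conventions, the code below looks up evaluate() by reflection and converts the result based on its runtime class: a boxed java.lang.Float is used directly, and a FloatWritable is unwrapped through its get() method, called reflectively so the sketch compiles without Hadoop. The class names FloatResultConverter and BoxedFloatUdf are invented for illustration; this is not the code of Impala's fix for IMPALA-2382.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical sketch, not Impala's actual executor code: convert the value
// returned by a UDF's evaluate() into a float, accepting either a boxed
// java.lang.Float or Hadoop's org.apache.hadoop.io.FloatWritable.
public class FloatResultConverter {

    // Tiny stand-in UDF that returns a boxed java.lang.Float.
    public static class BoxedFloatUdf {
        public Float evaluate(String s) {
            return s == null ? null : (float) s.length();
        }
    }

    // Converts one evaluate() result; null stays null (SQL NULL).
    public static Float toFloat(Object result) {
        if (result == null) {
            return null;
        }
        if (result instanceof Float) {
            return (Float) result; // standard Java type: use as-is
        }
        // Match FloatWritable by class name so the sketch compiles without
        // Hadoop; FloatWritable.get() returns the wrapped primitive float.
        if (result.getClass().getName().equals("org.apache.hadoop.io.FloatWritable")) {
            try {
                return (Float) result.getClass().getMethod("get").invoke(result);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        // Reject anything else explicitly instead of casting blindly.
        throw new ClassCastException(result.getClass().getName()
                + " is not a supported FLOAT result type");
    }

    // Looks up evaluate() by name and argument types, invokes it, and converts
    // the result. Assumes every argument is a STRING, as in levdistance().
    public static Float callEvaluate(Object udf, String... args) {
        Class<?>[] argTypes = new Class<?>[args.length];
        Arrays.fill(argTypes, String.class);
        try {
            Method evaluate = udf.getClass().getMethod("evaluate", argTypes);
            return toFloat(evaluate.invoke(udf, (Object[]) args));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callEvaluate(new BoxedFloatUdf(), "testing"));
    }
}
```

      Dispatching on the result's runtime class keeps Writable-returning UDFs working while also accepting the boxed Java types that Hive already handles. A production executor would presumably resolve the conversion once per function from evaluate()'s declared return type rather than once per row.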

      The only workaround is to rewrite the UDF so that it returns Hadoop Writable types (here, org.apache.hadoop.io.FloatWritable). This is not ideal: rewriting and rebuilding the UDF is time-consuming and needs extra resources, whereas the unmodified UDF already works in Hive.


          People

            Assignee: Dimitris Tsirogiannis (dtsirogiannis)
            Reporter: Mala Chikka Kempanna (mala_ck)
            Votes: 0
            Watchers: 6
