IMPALA-4586

Values of non-deterministic UDFs are cached in backend

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend

      Description

      A recent commit (identified below) increases the severity of a pre-existing problem: UDFs are always assumed to be deterministic, so UDFs with only constant arguments could be cached or constant folded. In most cases in Impala 2.7.0 this had no visible effect, e.g. both f and g in f() + g() were still re-evaluated for each input row.

      The commit below added caching of constant arguments to ScalarFnCall expressions (used for UDFs, builtin functions and various operators), so f() and g() are no longer re-evaluated for each input row.

        commit 10fa472fa6aa036be02748ae54daed1722449c68
        Author: Tim Armstrong <tarmstrong@cloudera.com>
        Date:   Wed Oct 26 10:55:23 2016 -0700
      
            IMPALA-4302,IMPALA-2379: constant expr arg fixes
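
To illustrate why this matters for non-deterministic UDFs, here is a minimal Python sketch of the two evaluation strategies. It is not Impala code: `make_udf`, `eval_per_row` and `eval_cached` are hypothetical names, and a simple counter stands in for a UDF whose result changes on every call.

```python
import itertools


def make_udf():
    """Hypothetical non-deterministic UDF: every call returns a new value."""
    counter = itertools.count(1)
    return lambda: next(counter)


def eval_per_row(udf, rows):
    # Behaviour described for most cases in 2.7.0: re-evaluate for every row.
    return [udf() for _ in rows]


def eval_cached(udf, rows):
    # Behaviour after constant-argument caching: evaluate once, reuse the value.
    cached_value = udf()
    return [cached_value for _ in rows]


rows = ["a", "b", "c"]
per_row = eval_per_row(make_udf(), rows)
cached = eval_cached(make_udf(), rows)
print(per_row)  # [1, 2, 3]
print(cached)   # [1, 1, 1]
```

With caching, every row sees the value from the first call, which is only correct if the function is deterministic.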
      

      The ideal solution is to provide syntax for UDF declarations that specifies whether the UDF is deterministic. As a short-term workaround, we could add a query option that assumes that all UDFs are non-deterministic.
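
The two proposals can be sketched together as follows. This is a hedged illustration only, not Impala's syntax or implementation: the `deterministic` flag, the `assume_udfs_nondeterministic` option and the `Udf`, `can_cache` and `evaluate` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Udf:
    fn: Callable[[], int]
    deterministic: bool  # hypothetical flag set when the UDF is declared


def can_cache(udf: Udf, assume_udfs_nondeterministic: bool = False) -> bool:
    # Cache only if the UDF is declared deterministic and the (hypothetical)
    # query option does not force every UDF to be treated as non-deterministic.
    return udf.deterministic and not assume_udfs_nondeterministic


def evaluate(udf: Udf, num_rows: int, assume_udfs_nondeterministic: bool = False):
    if can_cache(udf, assume_udfs_nondeterministic):
        value = udf.fn()  # evaluated once, reused for all rows
        return [value] * num_rows
    return [udf.fn() for _ in range(num_rows)]  # evaluated once per row


calls = {"n": 0}


def counting_fn() -> int:
    # Stand-in UDF body that records how many times it has been called.
    calls["n"] += 1
    return calls["n"]


declared_det = Udf(fn=counting_fn, deterministic=True)
declared_nondet = Udf(fn=counting_fn, deterministic=False)

cached_result = evaluate(declared_det, 3)          # one call
calls_after_cached = calls["n"]
per_row_result = evaluate(declared_nondet, 3)      # three calls
option_result = evaluate(declared_det, 3, assume_udfs_nondeterministic=True)
```

The query option is the coarser tool: it disables caching for every UDF, including ones that are in fact deterministic, whereas a per-UDF declaration keeps the optimisation where it is safe.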

            People

            • Assignee: Tim Armstrong
            • Reporter: Tim Armstrong