Hive / HIVE-1016

Ability to access DistributedCache from UDFs

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None
    • Tags:
      UDF DistributedCache

      Description

      There have been several requests on the mailing list for
      information about how to access the DistributedCache from UDFs, e.g.:

      http://www.mail-archive.com/hive-user@hadoop.apache.org/msg01650.html
      http://www.mail-archive.com/hive-user@hadoop.apache.org/msg01926.html

      While responses to these emails suggested several workarounds, the only correct
      way of accessing the distributed cache is via the static methods of Hadoop's
      DistributedCache class, and all of these methods require that the JobConf be passed
      in as a parameter. Hence, giving UDFs access to the distributed cache
      reduces to giving UDFs access to the JobConf.

      I propose the following changes to GenericUDF/UDAF/UDTF:

      • Add an exec_init(Configuration conf) method that is called during Operator initialization at runtime.
      • Change the name of the "initialize" method to "compile_init" to make it clear that this method is called at compile-time.

        Attachments

        1. HIVE-1016.1.patch.txt
          9 kB
          Carl Steinbach
        2. HIVE-1016.r1471197.patch.txt
          9 kB
          Nicolas Lalevée

              People

              • Assignee:
                cwsteinbach Carl Steinbach
                Reporter:
                cwsteinbach Carl Steinbach
              • Votes:
                3
                Watchers:
                11
