Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-1016

Ability to access DistributedCache from UDFs

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • None
    • None
    • Query Processor
    • None
    • UDF DistributedCache

    Description

      There have been several requests on the mailing list for
      information about how to access the DistributedCache from UDFs, e.g.:

      http://www.mail-archive.com/hive-user@hadoop.apache.org/msg01650.html
      http://www.mail-archive.com/hive-user@hadoop.apache.org/msg01926.html

      While responses to these emails suggested several workarounds, the only correct
      way of accessing the distributed cache is via the static methods of Hadoop's
      DistributedCache class, and all of these methods require that the JobConf be passed
      in as a parameter. Hence, giving UDFs access to the distributed cache
      reduces to giving UDFs access to the JobConf.

      I propose the following changes to GenericUDF/UDAF/UDTF:

      • Add an exec_init(Configuration conf) method that is called during Operator initialization at runtime.
      • Change the name of the "initialize" method to "compile_init" to make it clear that this method is called at compile-time.

      Attachments

        1. HIVE-1016.1.patch.txt
          9 kB
          Carl Steinbach
        2. HIVE-1016.r1471197.patch.txt
          9 kB
          Nicolas Lalevée

        Issue Links

          Activity

            People

              cwsteinbach Carl Steinbach
              cwsteinbach Carl Steinbach
              Votes:
              3 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: