  1. Pig
  2. PIG-1752

UDFs should be able to indicate files to load in the distributed cache


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: impl
    • Labels: None
    • Release Note:
      A new method is added to EvalFunc:
      public List<String> getCacheFiles();

      A UDF can override this method to return a list of HDFS files that need to be shipped to the distributed cache. Inside the EvalFunc, the user can assume that these files already exist in the distributed cache.

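      For example:
      import java.io.BufferedReader;
      import java.io.FileReader;
      import java.io.IOException;
      import java.util.ArrayList;
      import java.util.List;
      import org.apache.pig.EvalFunc;
      import org.apache.pig.data.Tuple;

      public class Udfcachetest extends EvalFunc<String> {

          // Returns the first line of the cached file, opened through its symlink.
          public String exec(Tuple input) throws IOException {
              try (BufferedReader d = new BufferedReader(new FileReader("./smallfile"))) {
                  return d.readLine();
              }
          }

          // HDFS file to ship; the name after '#' is the symlink the task sees.
          public List<String> getCacheFiles() {
              List<String> list = new ArrayList<String>(1);
              list.add("/user/pig/tests/data/small#smallfile");
              return list;
          }
      }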

      a = load '1.txt';
      b = foreach a generate Udfcachetest(*);
      dump b;
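
The entry returned by getCacheFiles() in the example packs two things into one string: the HDFS path to ship and, after the '#', the name under which the task opens the file. The sketch below only illustrates that split. CacheFileSpec is a hypothetical helper, not part of Pig or Hadoop, and it assumes the usual distributed-cache convention that the fragment becomes a symlink in the task's working directory.

```java
import java.net.URI;

// Hypothetical helper (not a Pig API): splits a getCacheFiles() entry of the
// form "<hdfs path>#<link name>" into its two parts.
final class CacheFileSpec {
    final String hdfsPath; // file to ship, e.g. /user/pig/tests/data/small
    final String linkName; // name seen by the task, or null if there is no '#'

    private CacheFileSpec(String hdfsPath, String linkName) {
        this.hdfsPath = hdfsPath;
        this.linkName = linkName;
    }

    // java.net.URI already separates the path from the '#fragment'.
    static CacheFileSpec parse(String entry) {
        URI uri = URI.create(entry);
        return new CacheFileSpec(uri.getPath(), uri.getFragment());
    }

    // Relative path a UDF would open inside exec(), assuming the link is
    // created in the task's working directory. Returns null without a link name.
    String localPath() {
        return linkName == null ? null : "./" + linkName;
    }

    public static void main(String[] args) {
        CacheFileSpec spec = parse("/user/pig/tests/data/small#smallfile");
        System.out.println(spec.hdfsPath + " -> " + spec.localPath());
    }
}
```

Read this way, the "./smallfile" opened in exec() and the "#smallfile" suffix in getCacheFiles() have to agree; changing one without the other would leave the UDF looking for a file that is not there.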

    Description

      Currently there is no way for a UDF to load a file into the distributed cache.

      Attachments

        1. PIG-1752.patch
          7 kB
          Alan Gates

            People

              Assignee: Alan Gates
              Reporter: Alan Gates
              Votes: 0
              Watchers: 2
