PIG-1752

UDFs should be able to indicate files to load in the distributed cache

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: impl
    • Labels:
      None
    • Release Note:
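      This change adds a new method to EvalFunc:
      public List<String> getCacheFiles();

      A UDF can override this method to return a list of HDFS files that need to be shipped to the distributed cache. Inside the EvalFunc, the UDF can assume that these files already exist in the distributed cache. In each entry, the name after # is the name under which the file appears in the task's working directory, which is why the example below reads ./smallfile.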

      For example:
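      import java.io.BufferedReader;
      import java.io.FileReader;
      import java.io.IOException;
      import java.util.ArrayList;
      import java.util.List;

      import org.apache.pig.EvalFunc;
      import org.apache.pig.data.Tuple;

      public class Udfcachetest extends EvalFunc<String> {

          @Override
          public String exec(Tuple input) throws IOException {
              // "smallfile" is the name given after '#' in getCacheFiles(); close the reader after each call.
              try (BufferedReader d = new BufferedReader(new FileReader("./smallfile"))) {
                  return d.readLine();
              }
          }

          @Override
          public List<String> getCacheFiles() {
              // HDFS path, then '#', then the name the file gets in the task's working directory.
              List<String> list = new ArrayList<String>(1);
              list.add("/user/pig/tests/data/small#smallfile");
              return list;
          }
      }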

      a = load '1.txt';
      b = foreach a generate Udfcachetest(*);
      dump b;
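
The example above reopens ./smallfile for every input tuple. A UDF that is called many times may prefer to read the cached file once and reuse the result. The following is only a minimal sketch of that idea in plain Java, with no Pig dependency; the class CachedFirstLine and its methods are hypothetical names introduced here for illustration and are not part of Pig or of this patch. It also assumes that every cache entry carries an explicit #name suffix, as in the release note.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

/**
 * Hypothetical helper (not part of Pig): reads the first line of a
 * localized cache file once and reuses it on later calls.
 */
final class CachedFirstLine {
    private final String localPath;
    private String firstLine;
    private boolean loaded;

    CachedFirstLine(String localPath) {
        this.localPath = localPath;
    }

    /**
     * Extracts the local name from an entry such as "/hdfs/path#name".
     * Assumption: the entry always has a non-empty "#name" suffix.
     */
    static String symlinkName(String cacheFileEntry) {
        int hash = cacheFileEntry.indexOf('#');
        if (hash < 0 || hash == cacheFileEntry.length() - 1) {
            throw new IllegalArgumentException(
                "expected <hdfs path>#<name>: " + cacheFileEntry);
        }
        return cacheFileEntry.substring(hash + 1);
    }

    /** Returns the first line, touching the disk only on the first call. */
    String get() throws IOException {
        if (!loaded) {
            try (BufferedReader reader =
                     new BufferedReader(new FileReader(localPath))) {
                firstLine = reader.readLine(); // null if the file is empty
            }
            loaded = true;
        }
        return firstLine;
    }
}
```

Under these assumptions, a UDF could keep a field such as new CachedFirstLine("./" + CachedFirstLine.symlinkName("/user/pig/tests/data/small#smallfile")) and call get() from exec(), so the file is read once per UDF instance rather than once per tuple.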

      Description

      Currently there is no way for a UDF to load a file into the distributed cache.

            People

            • Assignee: Alan Gates
            • Reporter: Alan Gates
            • Votes: 0
            • Watchers: 2
