Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
I have a utility function util.INSETFROMFILE() that I pass a file name during initialization.
define inQuerySet util.INSETFROMFILE(analysis/queries); A = load 'logs' using PigStorage() as ( date int, query chararray ); B = filter A by inQuerySet(query);
This provides a computationally inexpensive way to effect map-side joins for small sets plus functions of this style provide the ability to encapsulate more complex matching rules.
For rapid development and debugging purposes, I want this code to run without modification on both my local file system when I do pig -exectype local and on HDFS.
Pig needs to provide an API for UDFs which allow them to either:
1) "know" when they are in local or HDFS mode and let them open and read from files as appropriate
2) just provide a file name and read statements and have pig transparently manage local or HDFS opens and reads for the UDF
UDFs need to read configuration information off the filesystem and it simplifies the process if one can just flip the switch of -exectype local.