Pig / PIG-756

UDFs should have API for transparently opening and reading files from HDFS or from local file system with only relative path


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

    Description

      I have a utility function, util.INSETFROMFILE(), to which I pass a file name during initialization.

      define inQuerySet util.INSETFROMFILE('analysis/queries');
      A = load 'logs' using PigStorage() as ( date int, query chararray );
      B = filter A by inQuerySet(query);
      

      This provides a computationally inexpensive way to effect map-side joins for small sets. Functions of this style also make it possible to encapsulate more complex matching rules.

      For rapid development and debugging, I want this code to run without modification both on my local file system, when I run pig -exectype local, and on HDFS.

      Pig needs to provide an API for UDFs that allows them to either:

      1) "know" whether they are running in local or HDFS mode, so that they can open and read files as appropriate
      2) just provide a file name and read statements, and have Pig transparently manage local or HDFS opens and reads for the UDF

      UDFs need to read configuration information from the file system, and the process is simpler if switching between the two modes only requires passing -exectype local.
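
      A minimal sketch of what option 2 could look like, written in plain Java so it runs without Pig or Hadoop on the classpath. All names here (FileOpener, FileOpener.local, InSetFromFile) are hypothetical and are not part of Pig's API; this is an illustration of the request, not the fix that was eventually committed. The set-membership function only sees a relative path and an opener, so the same code could be handed a local opener or, presumably, one backed by Hadoop's FileSystem.open (not shown).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical abstraction: opens a relative path on whichever file system the job uses. */
interface FileOpener {
    InputStream open(String relativePath) throws IOException;

    /** Local-mode opener: resolves relative paths against a base directory. */
    static FileOpener local(Path baseDir) {
        return rel -> Files.newInputStream(baseDir.resolve(rel));
    }
}

/** Hypothetical stand-in for util.INSETFROMFILE: membership test against lines of a file. */
class InSetFromFile {
    private final String fileName;
    private final FileOpener opener;
    private Set<String> members; // loaded lazily on first lookup

    InSetFromFile(String fileName, FileOpener opener) {
        this.fileName = fileName;
        this.opener = opener;
    }

    /** Returns true if value matches a non-empty, trimmed line of the file. */
    boolean contains(String value) throws IOException {
        if (members == null) {
            members = load();
        }
        return value != null && members.contains(value);
    }

    private Set<String> load() throws IOException {
        Set<String> set = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(opener.open(fileName), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    set.add(trimmed);
                }
            }
        }
        return set;
    }
}
```

      With this shape, the UDF never inspects the execution mode itself. Whatever constructs it (in this sketch, the caller; in the request above, Pig) decides which opener to supply, so new InSetFromFile("analysis/queries", FileOpener.local(baseDir)) and an HDFS-backed equivalent would share the same lookup code.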


          People

            Assignee: Unassigned
            Reporter: David Ciemiewicz (ciemo)
            Votes: 1
            Watchers: 2
