[PIG-48] LoadFunc API is too limiting - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.2.0
Component/s: None
Labels: None

Description

Currently the LoadFunc API assumes that you are pulling data from a Hadoop filesystem and that Pig has already found the file and split it. I would like a lower-level API that hands me the information so that I can find the data and do the split myself. For instance, this is a very inconvenient way to load data from an RSS URL:

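-- Register the jar that contains the GetFeed UDF.
register /Users/samp/Projects/pigrss/out/getfeed-all.jar
define getFeed com.sampullara.pig.storage.GetFeed();
-- Workaround: load the URLs as ordinary data, then fetch each feed inside an EvalFunc.
URL = LOAD 'url' using PigStorage() as (url);
A = FOREACH URL GENERATE FLATTEN(getFeed(url));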

Here GetFeed is an EvalFunc because there was no way to do this as a LoadFunc. While we are at it, we could also add the ability to create a literal Tuple in the Pig language.
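
To make the request concrete, here is a rough sketch of what a lower-level loader contract could look like. It is not Pig's API: the names Split, RawLoader, UrlSplit, FeedLoader and loadAll are all hypothetical, the comma-separated location string is an assumed convention, and the read step returns placeholder data instead of fetching a real feed. The only point is that the loader, not the framework, decides how the location is located and split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LowLevelLoaderSketch {

    /** Hypothetical: one unit of work chosen by the loader, not by the framework. */
    interface Split {
        String describe();
    }

    /** Hypothetical lower-level loader: it receives the raw LOAD argument untouched. */
    interface RawLoader<S extends Split> {
        /** The loader decides how to locate and partition the data. */
        List<S> split(String location);

        /** The loader turns one split into tuples (modelled here as lists of strings). */
        List<List<String>> read(S split);
    }

    /** A split that is simply one feed URL. */
    static final class UrlSplit implements Split {
        final String url;

        UrlSplit(String url) {
            this.url = url;
        }

        @Override
        public String describe() {
            return url;
        }
    }

    /** Assumed convention: the location is a comma-separated list of feed URLs. */
    static final class FeedLoader implements RawLoader<UrlSplit> {
        @Override
        public List<UrlSplit> split(String location) {
            List<UrlSplit> splits = new ArrayList<>();
            for (String part : location.split(",")) {
                String url = part.trim();
                if (!url.isEmpty()) {
                    splits.add(new UrlSplit(url));
                }
            }
            return splits;
        }

        @Override
        public List<List<String>> read(UrlSplit split) {
            // Placeholder: a real loader would fetch and parse the feed here.
            List<List<String>> tuples = new ArrayList<>();
            tuples.add(Arrays.asList(split.url, "placeholder-entry"));
            return tuples;
        }
    }

    /** Drives the loader the way a framework might: split first, then read each split. */
    static List<List<String>> loadAll(String location) {
        FeedLoader loader = new FeedLoader();
        List<List<String>> out = new ArrayList<>();
        for (UrlSplit s : loader.split(location)) {
            out.addAll(loader.read(s));
        }
        return out;
    }

    public static void main(String[] args) {
        String location = "http://example.com/a.rss, http://example.com/b.rss";
        for (List<String> tuple : loadAll(location)) {
            System.out.println(tuple);
        }
    }
}
```

With a contract along these lines, the RSS case would no longer need the intermediate file of URLs or the EvalFunc: the feed URLs could be passed directly as the load location.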

People

Assignee: Unassigned

Reporter: Sam Pullara

Votes: 0

Watchers: 0

Dates

Created: 07/Dec/07 17:35

Updated: 25/Mar/10 00:12

Resolved: 26/Jan/09 19:51