Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
From Tom White:
I;m having a problem loading data from multiple paths in Pig. What I'm trying to do is to load data from a range of dates, so I would like to specify an input of two globbed paths:
x = LOAD '2008/05/
{26,27,28,29,30,31},2008/06/{1,2}'Pig doesn't seem to like this though as it's trying to interpret it as a single path. The best I can do it to use UNION:
x1 = LOAD '2008/05/{26,27,28,29,30,31}
'
x2 = LOAD '2008/06/
'
x = UNION x1, x2
The downside to this is that I want to parameterize my paths, and having separate script for each number of paths in the input is cumbersome.
Is there a better way of doing this? Are there any plans to support multiple paths, and/or PathFilters?
Attachments
Issue Links
- depends upon
-
PIG-279 Local Pig does not parse globs
- Resolved
-
HADOOP-3498 File globbing alternation should be able to span path components
- Closed
- is cloned by
-
PIG-569 Inconsistency with Hadoop in Pig load statements involving globs with subdirectories
- Closed