Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Duplicate
-
None
-
None
-
None
Description
When executing a STORE command, Pig creates the output location before the storage function gets called. This causes problems with storage functions that have logic to determine the output location. See this thread:
http://www.mail-archive.com/pig-user%40hadoop.apache.org/msg01538.html
For example, when making a request like this:
STORE A INTO '/my/home/output' USING MultiStorage('/my/home/output','0', 'none', '\t');
Pig creates a file '/my/home/output' and then an exception is thrown when MultiStorage tries to make a directory under '/my/home/output'. The workaround is to instead specify a dummy location as the first path like so:
STORE A INTO '/my/home/output/temp' USING MultiStorage('/my/home/output','0', 'none', '\t');
Two changes should be made:
1. The path specified in the INTO clause should be available to the storage function so it doesn't need to be duplicated.
2. The creation of the output paths should be delegated to the storage function.
Attachments
Issue Links
- is part of
-
PIG-966 Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
-
- Closed
-