PIG-1891

Enable StoreFunc to make intelligent decision based on job success or failure

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
    • Release Note:
      This adds a new method, cleanupOnSuccess, to the StoreFunc interface, and thus will cause backward compatibility issues for users who directly implement this interface. Most store functions implement StoreFuncImpl, which will shield them from this issue as it implements the new method.

      Description

      We are in the process of using PIG for various data processing and component integration. Here is where we feel pig storage funcs lack:

      They are not aware of whether the overall job has succeeded. This creates a problem for storage funcs that need to "upload" results into another system:

      DB, FTP, another file system etc.

      I looked at the DBStorage in the piggybank (http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup) and what I see is essentially a mechanism which for each task does the following:

      1. Creates a recordwriter (in this case open connection to db)
      2. Open transaction.
      3. Writes records into a batch
      4. Executes commit or rollback depending if the task was successful.

      While this approach works great at the task level, it does not work at all at the job level.

      If certain tasks succeed but the overall job fails, partial records get uploaded into the DB.
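The four-step, per-task pattern described above can be sketched with plain JDBC (the table and column names here are made up for illustration; see the linked DBStorage source for the real code):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class TaskLevelDbWriter {
    // Steps 1-4: each task opens a transaction, batches its records, and
    // commits or rolls back based on its own result. Nothing here can see
    // the overall job outcome, which is exactly the gap this issue raises:
    // a committed task's rows stay in the DB even if the job later fails.
    static void writeTask(Connection conn, List<String> rows,
                          boolean taskSucceeded) throws SQLException {
        conn.setAutoCommit(false);                          // step 2: open transaction
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO results (val) VALUES (?)")) {
            for (String row : rows) {                       // step 3: batch the records
                ps.setString(1, row);
                ps.addBatch();
            }
            ps.executeBatch();
        }
        if (taskSucceeded) {
            conn.commit();                                  // step 4: commit ...
        } else {
            conn.rollback();                                // ... or roll back
        }
    }
}
```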

      Any ideas on the workaround?

      Our current workaround is fairly ugly: we created a Java wrapper that launches Pig jobs and then uploads to the DB once Pig's job is successful. While the approach works, it's not really integrated into Pig.

      Attachments

      1. PIG-1891-3.patch
        10 kB
        Eli Reisman
      2. PIG-1891-2.patch
        9 kB
        Eli Reisman
      3. PIG-1891-1.patch
        9 kB
        Eli Reisman


          Activity

          Eli Reisman added a comment -

          I'd be happy to try to wrap the calls in try/catch on a new JIRA if you like. If you guys would rather have someone else take a swipe at this, or there's more to the fix and it needs a new JIRA by someone who can describe the problem/fix better than I can, that's fine with me too.

          Alan Gates added a comment -

          This is my bad. I figured most people extended StoreFunc rather than implemented StoreFuncInterface. I can do a patch to fix it.

          Bill Graham added a comment -

          Eli, FYI: in other places in the code we surround the call to a newly introduced method with a catch of NoSuchMethodError. That would help in this case.

          Dmitriy V. Ryaboy added a comment -

          I'm just grouchy 'cause I can't move a class of jobs till we fix a StoreFunc in Elephant-Bird. Appreciate you doing the work, this is a good feature! It's actually documented as an incompatible feature (in the release notes on this ticket, and by having the patch listed under "incompatible changes" in CHANGES.txt). So procedurally speaking, it's fine.

          We can probably have a guard around this issue by checking if the class has a declared method "cleanupOnSuccess", which would restore backwards compatibility.
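The guard described here could look roughly like this. The class names below are simplified stand-ins for the real Pig types, purely for illustration: the idea is to call the new hook only when the user's concrete class actually declares it.

```java
import java.lang.reflect.Method;

// Stand-ins: OldBase plays the pre-0.11 StoreFunc. LegacyStore was written
// against the old API; ModernStore opts in to the new hook.
abstract class OldBase { }
class LegacyStore extends OldBase { }
class ModernStore extends OldBase {
    public void cleanupOnSuccess(String location) { /* job-level commit */ }
}

class CleanupGuard {
    // Walk the concrete class's hierarchy (stopping at the framework base)
    // and invoke cleanupOnSuccess only if some user class declares it.
    static boolean invokeIfDeclared(OldBase func, String location) {
        for (Class<?> c = func.getClass(); c != null && c != OldBase.class;
             c = c.getSuperclass()) {
            try {
                Method m = c.getDeclaredMethod("cleanupOnSuccess", String.class);
                m.invoke(func, location);
                return true;
            } catch (NoSuchMethodException ignored) {
                // not declared here; keep looking up the hierarchy
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);  // declared but not invokable
            }
        }
        return false;  // old-style store function: silently skip the new hook
    }
}
```

Bill's alternative from the previous comment is simpler still: call the method directly and wrap the call site in a `catch (NoSuchMethodError e)` so binaries compiled against the old interface don't blow up.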

          Eli Reisman added a comment -

          Good point. I was assigned this JIRA to get introduced to Pig originally,
          and did not know the controversy adding this functionality would generate!
          I am fine with whatever fix or documentation you guys feel is relevant.


          Dmitriy V. Ryaboy added a comment -

          Alan, Eli – this is not a backwards compatible change. Anything that implements StoreFuncInterface but does not extend StoreFunc now breaks. At the very least, this fact should be explicitly documented in the release notes (better yet, not having this method wouldn't kill things).

          Eli Reisman added a comment -

          Thanks for your patience! I hope to dig into juicier slices of ham very soon!

          Alan Gates added a comment -

          Patch checked in. Thanks Eli.

          Eli Reisman added a comment -

          Now when I run my local machine tests with 'ant test-commit' on PIG-1891-3.patch + trunk, I get this error (and only this error):

          Testcase: testNumSamples took 22.016 sec
          FAILED
          expected:<47> but was:<42>
          junit.framework.AssertionFailedError: expected:<47> but was:<42>
          at org.apache.pig.test.TestPoissonSampleLoader.testNumSamples(TestPoissonSampleLoader.java:125)

          I did not alter the number of allowed instantiations in the TestLoadStoreFuncLifeCycle test for loads, just stores, so perhaps this set off a ripple effect of other problems; it's odd that the failure is in a loader. But I am unsure if this is directly related to this patch or an existing problem you guys know about, so I thought I'd post here before hunting it down. Thanks again!

          Eli Reisman added a comment -

          This alters the test to allow up to 4 instantiations as Alan mentioned. If there's a better solution to this issue, let me know. Thanks again!

          Eli Reisman added a comment -

          I can try to avoid the re-instantiation if you like, or bump the test value, whatever is best. And you're comfortable the other test issue is something else? This passed the test suite for me, but that was a while back, and I'm not extremely knowledgeable on all the areas of the code I'm touching here. Hope to be soon!

          Alan Gates added a comment -

          Never mind on TestMacroExpansion. I see that is failing in trunk as well.

          Alan Gates added a comment -

          This adds a failure in TestLoadStoreFuncLifeCycle and TestMacroExpansion.

          In TestLoadStoreFuncLifeCycle the failure is because it re-instantiates the store function again. Julien had put in tests to make sure the number of instantiation stays down. After talking with him he said he thought this patch was fine, so you can bump up the instantiation number it checks for from 3 to 4.

          I'm not clear what's driving the failure in TestMacroExpansion.

          I'll run the e2e tests as well as post results.

          Alan Gates added a comment -

          Looks reasonable. I'll run the tests on it.

          Alex Rovner added a comment -

          I also want to mention that this additional call will be useful in HCatalog. As of now there is some workaround to get the same behavior.

          Alex Rovner added a comment -

          Thanks Eli. Looks pretty good to me.

          Alan – Do you have any comments?

          Eli Reisman added a comment -

          Hey Alan, what do you think of this?

          It restores cleanupOnFailureImpl (why is this exposed in the interface at all, btw?) and does not attempt to implement cleanupOnSuccess, just adds it where relevant. This way users can implement it themselves if they need it in their StoreFunc.

          Also: would you look at the way it is wired into PigServer#launchPlan()? I'm giving it the same args that cleanupOnFailure() gets, but I'm not certain this is the information a user would want it to receive. I expect that if they do implement cleanupOnSuccess, these args will provide the data to delete? In the DB example in this thread, will the data already have been successfully loaded to the DB by the user code, and this merely has to erase unneeded files the data was stored in during processing steps after the fact? Or would cleanupOnSuccess include the 'load to database' and 'erase leftover files' code together?

          Anyway, thanks, let me know if this is what we need or I'm on the right track, thanks again.

          Eli Reisman added a comment -

          I'll take a look at where the framework should call it. It's been a while, but as I recall the cleanupImpl is called from within the same old cleanupFailure that was already there, still in place. I moved the code to cleanupImpl so I could also call it from cleanupSuccess, as the function was the same; only the context of the call differs. I suppose when people override these methods there might be more differences. I'll take a look at the code today and try to have another patch up ASAP. Thanks again; if there's anything else I've overlooked, please let me know.

          Alan Gates added a comment -

          I don't see where cleanupOnSuccess is invoked by the system, so I assume the purpose of this patch is to propose the change to the interface, not to actually implement the functionality yet. On this assumption, the patch looks ok except for one issue: StoreFunc is a public stable class. You can't change the name of publicly available methods. Changing cleanupOnFailure to cleanupImpl breaks backwards compatibility.

          Jakob Homan added a comment -

          This looks good to me. +1 on the patch, for what it's worth. This is what we're looking for. Bill Graham, how does this look to you?

          Eli Reisman added a comment -

          A first attempt at the cleanupOnSuccess() solution proposed in the comment thread. And a first attempt at contributing to Pig.

          Bill Graham added a comment -

          This would be a useful feature and I've wanted to have it in the past. I don't think this really warrants yet another interface though. StoreFuncInterface already has cleanupOnFailure so it makes sense for it to have something like onSuccess as well. It could be an empty method in StoreFunc, which I'd expect more people currently extend.
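That shape can be sketched as follows. These are simplified stand-ins, not the real Pig signatures: the point is that the interface gains the hook while the abstract base most users extend absorbs it as a no-op.

```java
import java.io.IOException;

// The interface gains the new hook alongside the existing failure hook.
interface StoreFuncInterface {
    void cleanupOnFailure(String location) throws IOException;
    void cleanupOnSuccess(String location) throws IOException;  // new hook
}

// The abstract base most store functions extend provides a no-op default,
// so only classes implementing the interface directly have to change.
abstract class StoreFunc implements StoreFuncInterface {
    @Override
    public void cleanupOnFailure(String location) throws IOException {
        // default: remove partial output at 'location' (elided in this sketch)
    }

    @Override
    public void cleanupOnSuccess(String location) throws IOException {
        // intentionally empty: existing subclasses compile and run unchanged
    }
}
```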

          Stan Rosenberg added a comment -

          I wanted to chime in. I agree with Jakob, except I'd name this callback 'postProcess'. It's a very easy change to sneak in without any impact on existing code.

          a) Introduce a new interface, say 'StoreFuncOnSuccess', with the method 'void postProcess(String location, Job job) throws IOException;'

          b) Modify PigServer.launchPlan to invoke 'postProcess'.

          I think this feature is a must have for complex storage UDFs.
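Steps (a) and (b) above can be sketched like this, with `Job` stubbed out since the real one would be Hadoop's `org.apache.hadoop.mapreduce.Job` (all names here follow the proposal, but the wiring is illustrative):

```java
import java.io.IOException;

class Job { }  // stand-in for org.apache.hadoop.mapreduce.Job

// (a) The opt-in interface: existing store funcs are untouched.
interface StoreFuncOnSuccess {
    void postProcess(String location, Job job) throws IOException;
}

class LaunchPlanHook {
    // (b) Roughly what PigServer.launchPlan would do after a successful job:
    // fire the callback only for store funcs that opted in.
    static boolean fireOnSuccess(Object storeFunc, String location, Job job)
            throws IOException {
        if (storeFunc instanceof StoreFuncOnSuccess) {
            ((StoreFuncOnSuccess) storeFunc).postProcess(location, job);
            return true;
        }
        return false;  // plain store func: nothing to call
    }
}
```

The `instanceof` dispatch is what makes this change backward compatible: no existing class implements the new interface, so no existing behavior changes.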

          Alex Rovner added a comment -

          Correct. Just like we have cleanupOnFailure, we should have a cleanupOnSuccess.

          Daniel Dai added a comment -

          Hi, Jakob,
          Just make sure you realize there is a cleanupOnFailure in StoreFunc. So you want a symmetric hook for when the job succeeds? OutputFormat's cleanupJob does not help?

          Jakob Homan added a comment -

          We're facing a similar issue and would also vote to add this functionality. A cleanupOnSuccess method seems like the most reasonable place to keep code like this.

          Alan Gates added a comment -

          When we redesigned the load and store interfaces in 0.7 we made a design decision to not duplicate Hadoop functionality, but to be as thin a layer as possible. Of course where there are things everyone will want to do, it makes sense to make those easier and deal with a little duplication. My sense is that this is not one of those cases. But if we see many others voting for this feature, I could be convinced that this would make sense. I will leave this JIRA open for now to see how others vote. Though I will change the priority to minor.

          I will also forward this information to Corrine (who writes our docs). She may want to include it in her section on store functions.

          Alex Rovner added a comment -

          Alan,

          After a bit of investigation: even though what you have described can be achieved through the OutputCommitter, it still seems it would be much easier if the store func had a "commit" method that is called once the job is final. This would significantly simplify writing a store func.

          Currently, if you take the OutputCommitter approach, you would have to somehow make the committer aware of what you want to do upon commit. This would mean that if you want to create an SFTP store and a DB store, you would need to create your own StoreFunc, OutputFormat, RecordWriter, and OutputCommitter. Seems a bit of an overkill for such a simple task?

          Alex

          Alex Rovner added a comment -

          Alan,

          Thank you for the information. From what you are describing, it's exactly what we are looking for. We are going to try that approach and will let you know how it goes.
          Thanks!

          Alan Gates added a comment -

          It sounds like what you want is a way for the storage function to inject code into OutputCommitter.cleanupJob. (See http://hadoop.apache.org/common/docs/r0.20.2/api/index.html for details. This is a final task that Hadoop runs after all reduces have finished.)

          At this point since this is already offered by Hadoop's OutputFormat we have left these things there, rather than mimic the interface in Pig. So the way to do this would be to have the OutputFormat you are using return an OutputCommitter that would do the commit (or whatever) in cleanupJob. You do not have to write a whole new OutputFormat for this. You can extend whatever OutputFormat you are using and the associated OutputCommitter it returns. Your extended OutputFormat should return your OutputCommitter in getOutputCommitter. Your OutputCommitter should only change cleanupJob, which should call super.cleanupJob and then do whatever you want to do.
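The extension Alan describes has this shape. To keep the sketch self-contained, the Hadoop types are replaced by minimal stand-ins; the real `cleanupJob(JobContext)` lives on `org.apache.hadoop.mapreduce.OutputCommitter`:

```java
import java.io.IOException;

// Minimal stand-ins for the Hadoop types involved.
class JobContext { }
abstract class OutputCommitter {
    public void cleanupJob(JobContext ctx) throws IOException { }
}
class BaseCommitter extends OutputCommitter { }  // whatever your OutputFormat returns today

// Extend the committer you already use: let it finish its normal cleanup,
// then run the job-level action (DB commit, FTP upload, ...) exactly once,
// after all reduces have finished.
class CommittingCommitter extends BaseCommitter {
    private final Runnable jobLevelCommit;

    CommittingCommitter(Runnable jobLevelCommit) {
        this.jobLevelCommit = jobLevelCommit;
    }

    @Override
    public void cleanupJob(JobContext ctx) throws IOException {
        super.cleanupJob(ctx);   // the wrapped committer's usual work first
        jobLevelCommit.run();    // then the job-level commit
    }
}
```

Your extended OutputFormat would then hand back this committer from getOutputCommitter; nothing else in the store function has to change.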


            People

            • Assignee: Eli Reisman
            • Reporter: Alex Rovner
            • Votes: 0
            • Watchers: 10
