Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
-
Patch Available
-
Reviewed
Description
On Thu, Oct 15, 2015 at 4:06 AM, Saggi Neumann <saggi@xplenty.com> wrote:
You may also check these for ideas. It would be good to have them
implemented:
https://wiki.apache.org/pig/PigErrorHandlingInScripts
https://issues.apache.org/jira/browse/PIG-2620
–
Saggi Neumann
Co-founder and CTO, Xplenty
M: +972-544-546102
On Thu, Oct 15, 2015 at 12:17 AM, Siddhi Mehta <smehtauser@gmail.com> wrote:
> Hello Everyone,
>
> Just wanted to follow up on the my earlier post and see if there are any
> thoughts around the same.
> I was planning to take a stab to implement the same.
>
> The approach I was planning to use for the same is
> 1. Make the storer that wants error handling capability implement an
> interface(ErrorHandlingStoreFunc).
> 2. Using this interface the storer can define if the thresholds for
> error.Each store func can determine what the threshold should be.For
> example HbaseStorage can have a different threshold from ParquetStorage.
> 3. Whenever the storer gets created in
>
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc()
> we intercept the called and give it a wrappedStoreFunc
> 4. Every put next calls now gets delegated to the actual storer via the
> delegate and we can listen in for error on putNext() and take care of the
> allowing the error if within threshold or re throwing from there.
> 5. The client can get information about the threshold value from the
> counters to know if there was any data dropped.
>
> Thougts?
>
> Thanks,
> Siddhi
>
>
> On Mon, Oct 12, 2015 at 1:49 PM, Siddhi Mehta <smehtauser@gmail.com>
> wrote:
>
> > Hey Guys,
> >
> > Currently a Pig job fails when one record out of the billions records
> > fails on STORE.
> > This is not always desirable behavior when you are dealing with millions
> > of records and only few fail.
> > In certain use-cases its desirable to know how many such errors and have
> > an accounting for the same.
> > Is there a configurable limits that we can set for pig so that we can
> > allow a threshold for bad records on STORE similar to the lines of the
> JIRA
> > for LOAD PIG-3059 <https://issues.apache.org/jira/browse/PIG-3059>
> >
> > Thanks,
> > Siddhi
> >
>
Attachments
Attachments
Issue Links
- blocks
-
PIG-4719 Documentation for PIG-4704: Customizable Error Handling for Storers in Pig
- Closed