Details
Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Description
Skipping corrupted rows in Sqoop
What is the proposed strategy for handling corrupted rows during a batch transfer?
Probably one of the following:
1. Skip/ignore bad records and continue processing the good ones.
2. Bail out as soon as a bad record is encountered.
3. Tolerate bad rows up to a configurable threshold.
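The three options can be expressed as a single configurable policy. What follows is a minimal sketch written in Python for brevity, not Sqoop's Java code. The names `BadRecordPolicy`, `TooManyBadRecords`, `transfer`, and `max_bad_rows` are hypothetical. It also assumes that the record parser signals a corrupted row by raising `ValueError`.

```python
from enum import Enum
from typing import Callable, Iterable, List, Tuple, TypeVar

Raw = TypeVar("Raw")
Row = TypeVar("Row")


class BadRecordPolicy(Enum):
    SKIP = "skip"            # option 1: drop bad rows and keep going
    FAIL_FAST = "fail_fast"  # option 2: abort on the first bad row
    THRESHOLD = "threshold"  # option 3: abort once bad rows exceed max_bad_rows


class TooManyBadRecords(Exception):
    """Raised when the configured policy decides the transfer must stop."""


def transfer(
    rows: Iterable[Raw],
    parse: Callable[[Raw], Row],
    policy: BadRecordPolicy,
    max_bad_rows: float = 0,
) -> Tuple[List[Row], int]:
    """Parse every row, applying the bad-record policy.

    Returns the successfully parsed rows and the number of bad rows skipped.
    Assumes (hypothetically) that `parse` raises ValueError on a corrupted row.
    """
    good: List[Row] = []
    bad = 0
    for index, raw in enumerate(rows):
        try:
            good.append(parse(raw))
        except ValueError as err:
            bad += 1
            if policy is BadRecordPolicy.FAIL_FAST:
                raise TooManyBadRecords(f"bad record at row {index}: {raw!r}") from err
            if policy is BadRecordPolicy.THRESHOLD and bad > max_bad_rows:
                raise TooManyBadRecords(
                    f"{bad} bad records exceed the limit of {max_bad_rows}"
                ) from err
            # SKIP, or THRESHOLD still within its limit: drop the row and continue.
    return good, bad


if __name__ == "__main__":
    sample = ["1", "2", "oops", "4"]
    print(transfer(sample, int, BadRecordPolicy.SKIP))  # ([1, 2, 4], 1)
    print(transfer(sample, int, BadRecordPolicy.THRESHOLD, max_bad_rows=1))  # ([1, 2, 4], 1)
```

Option 3 generalizes the other two. A threshold of zero behaves like bailing out on the first bad record, and an unbounded threshold (for example `float("inf")`) behaves like skipping.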
From Anand Iyer:
Sqoop is the most obvious place for the functionality discussed in this thread. But at some point, we should start thinking about adding ... functionality such as policy-driven SLAs and data validation ...
This means we want to be able to define not just failure handling but also more elaborate strategies for Sqoop data validation, metrics that expose the state of a transfer, etc.
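The transfer metrics and data validation mentioned above could take several shapes. One possibility is sketched below as a hypothetical Python example. It keeps counters that expose progress while a transfer runs, and it adds a simple row-count reconciliation against counts reported by the source and the target. `TransferMetrics`, `record`, `progress`, and `validate_row_counts` are illustrative names, not an existing Sqoop API. How the source and target counts are obtained is left as an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TransferMetrics:
    """Hypothetical counters a transfer could expose while it runs."""

    rows_read: int = 0
    rows_written: int = 0
    rows_skipped: int = 0

    def record(self, written: bool) -> None:
        # Every row read is either written to the target or skipped as bad.
        self.rows_read += 1
        if written:
            self.rows_written += 1
        else:
            self.rows_skipped += 1

    def progress(self) -> str:
        return (
            f"read={self.rows_read} written={self.rows_written} "
            f"skipped={self.rows_skipped}"
        )


def validate_row_counts(
    metrics: TransferMetrics, source_count: int, target_count: int
) -> List[str]:
    """Compare the transfer's own counters with counts reported by each side.

    Returns a list of human-readable problems; an empty list means the counts agree.
    """
    problems: List[str] = []
    if metrics.rows_read != source_count:
        problems.append(
            f"read {metrics.rows_read} rows but the source reports {source_count}"
        )
    if metrics.rows_written != target_count:
        problems.append(
            f"wrote {metrics.rows_written} rows but the target reports {target_count}"
        )
    return problems


if __name__ == "__main__":
    metrics = TransferMetrics()
    for ok in [True, True, False, True]:
        metrics.record(written=ok)
    print(metrics.progress())                  # read=4 written=3 skipped=1
    print(validate_row_counts(metrics, 4, 3))  # []
    print(validate_row_counts(metrics, 5, 3))  # ['read 4 rows but the source reports 5']
```

A row-count check like this only catches missing or extra rows. Validating the content of the rows themselves would need a richer strategy, which is the broader point of the comment above.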