Affects Version/s: 1.4.2
Fix Version/s: 1.4.3
Attempt to add an extensible validation framework to Sqoop. Adds an optional CLI option: --validate
There are 3 basic interfaces:
ValidationThreshold - Determines whether the error margin between the source and the target is acceptable: Absolute, Percentage Tolerant, etc.
The default implementation is AbsoluteValidationThreshold, which ensures that the row counts from the source and the target are the same.
ValidationFailureHandler - Responsible for handling failures: log an error/warning, abort, etc. Default implementation logs a warning message to the configured logger.
Validator - Drives the validation logic by delegating the decision to ValidationThreshold and the failure handling to ValidationFailureHandler. The default implementation is RowCountValidator, which compares the row counts of the source and the target.
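The delegation among the three interfaces could look roughly like the sketch below. It is only an illustration: the interface names come from the description above, but every method signature, the nested "Sketch" class names, and the use of java.util.logging as the "configured logger" are assumptions, not the actual Sqoop code.

```java
// Hypothetical sketch of the three roles; signatures are assumed, not Sqoop's real API.
final class ValidationSketch {

    // Decides whether the difference between source and target is acceptable.
    interface ValidationThreshold {
        boolean isAcceptable(long sourceRows, long targetRows);
    }

    // Reacts to a failed validation (log, abort, ...).
    interface ValidationFailureHandler {
        void onFailure(String message);
    }

    // Drives validation by delegating to the two interfaces above.
    interface Validator {
        boolean validate(long sourceRows, long targetRows);
    }

    // Assumed stand-in for the absolute threshold: counts must match exactly.
    static final class AbsoluteThresholdSketch implements ValidationThreshold {
        public boolean isAcceptable(long sourceRows, long targetRows) {
            return sourceRows == targetRows;
        }
    }

    // Assumed stand-in for the default handler: logs a warning and carries on.
    static final class LogWarningHandlerSketch implements ValidationFailureHandler {
        private static final java.util.logging.Logger LOG =
                java.util.logging.Logger.getLogger("validation-sketch");

        public void onFailure(String message) {
            LOG.warning(message);
        }
    }

    // Assumed stand-in for the row count validator.
    static final class RowCountValidatorSketch implements Validator {
        private final ValidationThreshold threshold;
        private final ValidationFailureHandler handler;

        RowCountValidatorSketch(ValidationThreshold threshold,
                                ValidationFailureHandler handler) {
            this.threshold = threshold;
            this.handler = handler;
        }

        public boolean validate(long sourceRows, long targetRows) {
            if (threshold.isAcceptable(sourceRows, targetRows)) {
                return true;
            }
            // The validator does not decide what a failure means; the handler does.
            handler.onFailure("Row count mismatch: source=" + sourceRows
                    + ", target=" + targetRows);
            return false;
        }
    }

    public static void main(String[] args) {
        Validator validator = new RowCountValidatorSketch(
                new AbsoluteThresholdSketch(), new LogWarningHandlerSketch());
        System.out.println(validator.validate(100, 100));
        System.out.println(validator.validate(100, 99));
    }
}
```

Keeping the threshold and the failure handler behind separate interfaces means either one can be swapped without touching the validator itself.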
You can extend these interfaces with more specific implementations and override the defaults through the Sqoop configuration.
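As an example of such an extension, a percentage-tolerant threshold might be written as below. This is a minimal sketch under assumptions: the interface signature, the class name, and the 5% tolerance are hypothetical, and the configuration needed to register a custom class with Sqoop is not shown because it is not specified here.

```java
// Hypothetical extension example; the interface signature is assumed, not Sqoop's real API.
final class CustomThresholdExample {

    // Assumed shape of the threshold interface, declared here so the snippet compiles alone.
    interface ValidationThreshold {
        boolean isAcceptable(long sourceRows, long targetRows);
    }

    // Accepts the target count if it is within a given percentage of the source count.
    static final class PercentageTolerantThreshold implements ValidationThreshold {
        private final double tolerancePercent;

        PercentageTolerantThreshold(double tolerancePercent) {
            if (tolerancePercent < 0) {
                throw new IllegalArgumentException("tolerance must be non-negative");
            }
            this.tolerancePercent = tolerancePercent;
        }

        public boolean isAcceptable(long sourceRows, long targetRows) {
            if (sourceRows == 0) {
                // Avoid dividing by zero: an empty source only matches an empty target.
                return targetRows == 0;
            }
            double diffPercent = 100.0 * Math.abs(sourceRows - targetRows) / sourceRows;
            return diffPercent <= tolerancePercent;
        }
    }

    public static void main(String[] args) {
        ValidationThreshold threshold = new PercentageTolerantThreshold(5.0);
        System.out.println(threshold.isAcceptable(100, 95));
        System.out.println(threshold.isAcceptable(100, 94));
    }
}
```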