Details
-
New Feature
-
Status: Resolved
-
Minor
-
Resolution: Won't Fix
-
None
-
None
-
None
-
Patch Available
Description
Adding three additional server side filtering options when loading data with HBaseStorage:
- specified cf:col does not exist
-null cf:col - specified cf:col must exist
-notnull cf:col - specified cf:col contains the given value
-val cf:col=value
These are meant to replace (and optimize by reducing data transfer) the frequent paradigm in pig of loading data and immediately filtering for a specific condition. For example
data = load 'hbase://mytable' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*') as (cf:map[]) ;
data_with_value = filter data by cf#'col' = 'value' ;
Can be replaced with:
data_with_value = load 'hbase://mytable' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', '-val cf:col=value') as (cf:map[]) ;