Pig / PIG-3961

Adding HBaseStorage cell value filters

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • None
    • 0.15.0
    • None
    • None
    • Patch Available

    Description

      This patch adds three server-side filtering options for loading data with HBaseStorage:

      1. The specified cf:col must not exist:
        -null cf:col
      2. The specified cf:col must exist:
        -notnull cf:col
      3. The specified cf:col contains the given value:
        -val cf:col=value

      These options are meant to replace the common Pig pattern of loading data and immediately filtering on a specific condition. Filtering on the server side reduces the amount of data transferred. For example:

      data = load 'hbase://mytable' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*') as (cf:map[]) ;
      data_with_value = filter data by cf#'col' == 'value' ;

      Can be replaced with:

      data_with_value = load 'hbase://mytable' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', '-val cf:col=value') as (cf:map[]) ;
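
      To make the intended semantics of the three options concrete, here is a minimal sketch in plain Python. It is only a toy model and not the patch itself: rows are dictionaries keyed by "cf:col", the helper names (parse_option, row_passes, scan) are hypothetical, and -val is assumed to mean an exact match, as the equality filter in the example above suggests.

```python
# Toy model of the proposed HBaseStorage options. Nothing here talks to
# HBase or Pig; rows are plain dicts keyed by "family:qualifier".

def parse_option(option):
    """Split an option such as '-val cf:col=value' into (flag, column, value)."""
    flag, _, rest = option.partition(" ")
    column, _, value = rest.partition("=")
    # Only -val carries a value; the other flags name a column only.
    return flag, column, (value if flag == "-val" else None)


def row_passes(row, option):
    """Return True if the row satisfies the option (assumed semantics)."""
    flag, column, value = parse_option(option)
    if flag == "-null":
        return column not in row          # column must be absent
    if flag == "-notnull":
        return column in row              # column must be present
    if flag == "-val":
        return row.get(column) == value   # assumption: exact match
    raise ValueError("unknown option: " + option)


def scan(table, option=None):
    """Model a load: rows failing the option are never returned to the caller."""
    return [row for row in table if option is None or row_passes(row, option)]


TABLE = [
    {"cf:col": "value", "cf:other": "x"},
    {"cf:col": "something else"},
    {"cf:other": "y"},
]

# Load everything, then filter (the pattern the issue wants to replace).
loaded_then_filtered = [r for r in scan(TABLE) if r.get("cf:col") == "value"]

# Filter during the load (the proposed replacement).
filtered_at_load = scan(TABLE, "-val cf:col=value")
```

      Both approaches yield the same rows in this model. The difference the issue is after is that, with the option passed at load time, the non-matching rows would never leave the server.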

      Attachments

        1. filters-patch.v2.diff
          12 kB
          Mike Welch

          People

            Assignee: Mike Welch (mjwelch)
            Reporter: Mike Welch (mjwelch)
            Votes: 1
            Watchers: 3
