Pig / PIG-2107

When using Pig with HBaseStorage, Pig filters should use the HBase row-key index to limit the working set.


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      A LOAD using HBaseStorage accepts filter arguments that you can use to limit the working set of a MapReduce job.
      For example:
      blah = LOAD 'hbase://test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:field1', '-loadKey -gte foo1 -lte foo1');

      It would be really useful if the same restriction could also be applied to FILTER statements within Pig, so that a filter on the row key, e.g.
      blah2 = FILTER blah BY key == 'foo1';
      or
      blah2 = FILTER blah BY key > 'foo1' AND key < 'foo2';

      would actually limit what is retrieved from HBase, giving Pig a smaller working set on which to run the MapReduce job.
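A minimal sketch of the translation this would need, assuming the FILTER condition on the row key has already been reduced to a conjunction (AND) of (operator, value) pairs. The helpers `range_from_conjunction` and `to_hbase_options` are hypothetical and not part of Pig; they intersect the predicates into one key range and render it as the `-gt`/`-gte`/`-lt`/`-lte` options HBaseStorage already accepts. How Pig's planner would hand the predicates to the loader is not shown and is left open. Keys are compared as Python strings, which matches HBase's byte ordering only for ASCII keys.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical form of one row-key predicate from a FILTER, e.g. (">", "foo1").
# Only AND-ed predicates are handled; OR and NOT are out of scope here.
Predicate = Tuple[str, str]


@dataclass
class KeyRange:
    """Row-key interval; None means unbounded on that side."""
    lower: Optional[str] = None
    lower_inclusive: bool = True
    upper: Optional[str] = None
    upper_inclusive: bool = True

    def tighten_lower(self, value: str, inclusive: bool) -> None:
        # Keep the larger lower bound; at equal values, exclusive is tighter.
        if (self.lower is None or value > self.lower
                or (value == self.lower and not inclusive)):
            self.lower, self.lower_inclusive = value, inclusive

    def tighten_upper(self, value: str, inclusive: bool) -> None:
        # Keep the smaller upper bound; at equal values, exclusive is tighter.
        if (self.upper is None or value < self.upper
                or (value == self.upper and not inclusive)):
            self.upper, self.upper_inclusive = value, inclusive

    def is_empty(self) -> bool:
        # True when no key can satisfy both bounds (nothing needs to be scanned).
        if self.lower is None or self.upper is None:
            return False
        if self.lower > self.upper:
            return True
        return self.lower == self.upper and not (
            self.lower_inclusive and self.upper_inclusive)


def range_from_conjunction(preds: List[Predicate]) -> KeyRange:
    """Intersect AND-ed key predicates into one range; raise if one can't be pushed down."""
    r = KeyRange()
    for op, value in preds:
        if op == "=":
            r.tighten_lower(value, True)
            r.tighten_upper(value, True)
        elif op in (">", ">="):
            r.tighten_lower(value, op == ">=")
        elif op in ("<", "<="):
            r.tighten_upper(value, op == "<=")
        else:
            # e.g. "!=" is not a single contiguous key range.
            raise ValueError(f"cannot push down key predicate {op!r}")
    return r


def to_hbase_options(r: KeyRange) -> str:
    """Render the range as an HBaseStorage options string."""
    parts = ["-loadKey"]
    if r.lower is not None:
        parts += ["-gte" if r.lower_inclusive else "-gt", r.lower]
    if r.upper is not None:
        parts += ["-lte" if r.upper_inclusive else "-lt", r.upper]
    return " ".join(parts)
```

For `key == 'foo1'`, `to_hbase_options(range_from_conjunction([("=", "foo1")]))` gives `-loadKey -gte foo1 -lte foo1`, the same options as the LOAD example above; `key > 'foo1' AND key < 'foo2'` gives `-loadKey -gt foo1 -lt foo2`. Predicates that raise here would simply stay as an ordinary FILTER applied after loading.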


            People

            • Assignee: Unassigned
            • Reporter: Albert Sunwoo (asunwoo)
            • Votes: 0
            • Watchers: 1
