Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
-
Reviewed
Description
/**
- Build a bloom filter for use later in Bloom. This UDF is intended to run
- in a group all job. For example:
- define bb BuildBloom('jenkins', '100', '0.1');
- A = load 'foo' as (x, y);
- B = group A all;
- C = foreach B generate BuildBloom(A.x);
- store C into 'mybloom';
- The bloom filter can be on multiple keys by passing more than one field
- (or the entire bag) to BuildBloom.
- The resulting file can then be used in a Bloom filter as:
- define bloom Bloom(mybloom);
- A = load 'foo' as (x, y);
- B = load 'bar' as (z);
- C = filter B by Bloom(z);
- D = join C by z, A by x;
- It uses
{@link org.apache.hadoop.util.bloom.BloomFilter}
.
*/
Pig script inside above doc strings doesn't work