Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Bloom filters can themselves become very large when the row count is high, and aggregating them in reducers can be even more expensive. For example, a bloom filter for 100M rows can be as large as 170MB. Aggregating 100 such filters in a reducer could end up taking 17GB of memory.
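As a rough check of the figures above, the standard bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits, can be evaluated directly. The sketch below assumes a false-positive probability of 0.001, which is not stated in this issue, and assumes all 100 filters are held in memory at once. The function name is hypothetical and not part of any library. Under these assumptions a single filter for 100M rows comes out at roughly 170MB.

```python
import math


def bloom_filter_bytes(expected_entries: int, fpp: float) -> int:
    """Approximate size in bytes of an optimally sized bloom filter.

    Uses the textbook formula m = -n * ln(p) / (ln 2)^2 for the bit count.
    This is a back-of-the-envelope estimate, not any engine's exact layout.
    """
    num_bits = math.ceil(-expected_entries * math.log(fpp) / (math.log(2) ** 2))
    return math.ceil(num_bits / 8)


MIB = 1024 * 1024
GIB = 1024 * MIB

rows = 100_000_000   # 100M rows, as in the description
fpp = 0.001          # ASSUMPTION: false-positive probability, not given in the issue
num_filters = 100    # filters aggregated in one reducer, as in the description

per_filter = bloom_filter_bytes(rows, fpp)
total = num_filters * per_filter  # worst case: all filters resident at once

print(f"one filter:   {per_filter / MIB:.0f} MiB")
print(f"{num_filters} filters: {total / GIB:.1f} GiB")
```

The total scales linearly with both the row count and the number of filters held at the same time, which is why the reducer-side cost grows so quickly.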