Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
This is a follow-up to https://issues.apache.org/jira/browse/MADLIB-1200
In minibatch_preprocessor, we made buffer_size an optional parameter. If it is not set, a default value is assigned. The current considerations for choosing that default are:
- Within a segment, each cell has a 1 GB limit, so we cannot pack so many rows into one super row that it exceeds the limit.
- Across segments, data should be distributed as evenly as possible to avoid data skew, so that GPDB can work more efficiently.
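
To make the two considerations concrete, here is a minimal sketch of how a default could be derived. It is not the formula MADlib uses; the function name default_buffer_size, its parameters, and the rounding strategy are assumptions made purely for illustration. The idea is to cap the rows per super row by the cell limit, then round the number of buffers up to a multiple of the segment count so that each segment can receive roughly the same number of buffers.

```python
ONE_GB = 1024 ** 3  # per-cell size limit mentioned above


def ceil_div(a, b):
    """Integer ceiling division that avoids floating point."""
    return -(-a // b)


def default_buffer_size(num_rows, row_bytes, num_segments,
                        cell_limit_bytes=ONE_GB):
    """Hypothetical default for buffer_size (rows per super row).

    num_rows      -- total rows in the source table
    row_bytes     -- estimated packed size of one row, in bytes
    num_segments  -- number of segments in the cluster
    """
    if num_rows <= 0 or row_bytes <= 0 or num_segments <= 0:
        raise ValueError("all inputs must be positive")

    # Consideration 1: a super row must fit in one cell.
    max_rows_per_buffer = cell_limit_bytes // row_bytes
    if max_rows_per_buffer < 1:
        raise ValueError("a single row already exceeds the cell limit")

    # Fewest buffers that still respect the cell limit.
    min_buffers = ceil_div(num_rows, max_rows_per_buffer)

    # Consideration 2: round the buffer count up to a multiple of the
    # segment count so buffers can be spread evenly across segments.
    num_buffers = ceil_div(min_buffers, num_segments) * num_segments

    # Rows per buffer; never larger than max_rows_per_buffer because
    # num_buffers >= min_buffers.
    return ceil_div(num_rows, num_buffers)
```

For example, with 1000 rows of about 100 bytes each, 3 segments, and an artificially small 10,000-byte limit, the sketch targets 12 buffers and returns a buffer_size of 84, which stays under the 100-row cap. For small tables the actual number of buffers produced can be lower than the target, so the balance across segments is only approximate.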