  1. Apache MADlib
  2. MADLIB-1224

Select default buffer size for mini-batch preprocessor

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.14
    • Component/s: Module: Utilities
    • Labels: None

    Description

      This is a follow-up to https://issues.apache.org/jira/browse/MADLIB-1200

      In minibatch_preprocessor, we made buffer_size an optional parameter. If it is not set, a default value is assigned. The current considerations for choosing that default are:

      1. Within a segment, each cell has a 1 GB limit, so we cannot pack so many rows into one super row that the super row exceeds that limit.
      2. Across segments, data should be distributed as evenly as possible to avoid data skew, so that GPDB can work more efficiently.
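
      The two considerations above can be combined into a simple heuristic. The following is only a minimal sketch, not the logic that was actually implemented for this issue: the function name `default_buffer_size`, its parameters, and the idea of rounding the buffer count up to a multiple of the segment count are all assumptions made for illustration. It assumes a fixed, known number of bytes per packed row.

```python
ONE_GB = 1 << 30  # per-cell size limit from consideration 1


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large counts."""
    return -(-a // b)


def default_buffer_size(num_rows, num_segments, bytes_per_row, cell_limit=ONE_GB):
    """Hypothetical heuristic for a default buffer_size (rows per super row).

    Assumption: every packed row occupies roughly bytes_per_row bytes.
    """
    if num_rows <= 0 or num_segments <= 0 or bytes_per_row <= 0:
        raise ValueError("all inputs must be positive")
    if bytes_per_row > cell_limit:
        raise ValueError("a single row already exceeds the cell limit")

    # Consideration 1: the most rows that fit in one cell.
    max_rows_per_buffer = cell_limit // bytes_per_row

    # Fewest buffers that keep every buffer under the cell limit.
    min_buffers = ceil_div(num_rows, max_rows_per_buffer)

    # Consideration 2: round the buffer count up to a multiple of the
    # segment count so that buffers can be spread evenly across segments.
    num_buffers = ceil_div(min_buffers, num_segments) * num_segments

    # Rows per buffer; never larger than max_rows_per_buffer because
    # num_buffers >= min_buffers.
    return ceil_div(num_rows, num_buffers)


if __name__ == "__main__":
    # 1,000,000 rows of ~8,000 bytes each on a 4-segment cluster.
    print(default_buffer_size(1_000_000, 4, 8_000))
```

      Rounding the buffer count up rather than down trades slightly smaller buffers for an even split across segments while staying under the cell limit. In practice a safety margin below 1 GB would probably be wanted as well.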

          People

            Assignee: Jingyi Mei (jingyimei)
            Reporter: Jingyi Mei (jingyimei)