Uploaded image for project: 'SystemDS'
  1. SystemDS
  2. SYSTEMDS-1837

Unary aggregate w/ corrections output to large physical blocks

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • SystemML 0.15
    • None
    • None

    Description

      Many unary aggregate operations store corrections in additional columns or rows. For example, rowSums(X) uses a two-column output to store sums and corrections. In CP, we drop these corrections immediately after the operations, while in MR and Spark these corrections are dropped after final aggregation. The issue is that the MatrixBlock::dropLastRowsOrColums does not actually drop the correction but simply shifts all values in the right starting positions. Hence, the physical output is actually larger than what the memory estimates represent. This leads to unnecessary large memory consumption during subsequent operations and in the buffer pool, which can lead to OOMs. This task aims to fix MatrixBlock::dropLastRowsOrColums.

      In a subsequent task, we could also modify all unary aggregates to never allocate the multi-column/row output when executed in CP. However, this requires custom code paths for the different backends.

      Attachments

        Activity

          People

            mboehm7 Matthias Boehm
            mboehm7 Matthias Boehm
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: