SystemDS / SYSTEMDS-2169

Spark nary cbind/rbind with broadcasts


Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None

    Description

      The introduction of nary cbind and rbind in SYSTEMML-1986 added support for operations like E = cbind(A,B,C,D), which concatenates the matrices A, B, C, D column-wise without the intermediates required by traditional binary cbind operations (cbind(cbind(cbind(A,B),C),D)). SystemML also provides rewrites that automatically collapse chains of cbind or rbind operations into their nary counterparts.
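      To make the difference concrete, here is a minimal plain-Java sketch on dense in-memory arrays. It is not SystemDS's MatrixBlock API: the class NaryCbind and its methods are hypothetical, and the inputs are assumed to be rectangular, non-empty, and of equal row count. The nary variant computes the total width once, allocates the output once, and copies each input directly to its final column offset; chaining the binary variant materializes every intermediate.

```java
// Hypothetical sketch (not SystemDS code): nary vs chained binary cbind
// on dense row-major double[][] inputs with equal, non-zero row counts.
final class NaryCbind {
    private NaryCbind() {}

    // Binary cbind: allocates a fresh rows x (colsA + colsB) result.
    static double[][] binaryCbind(double[][] a, double[][] b) {
        return naryCbind(a, b);
    }

    // Nary cbind: sums all widths first, allocates the output once, and
    // copies every input row straight to its final column offset.
    static double[][] naryCbind(double[][]... inputs) {
        int rows = inputs[0].length;
        int totalCols = 0;
        for (double[][] in : inputs) {
            if (in.length != rows)
                throw new IllegalArgumentException("cbind requires equal row counts");
            totalCols += in[0].length;
        }
        double[][] out = new double[rows][totalCols];
        int offset = 0;
        for (double[][] in : inputs) {
            int width = in[0].length;
            for (int i = 0; i < rows; i++)
                System.arraycopy(in[i], 0, out[i], offset, width);
            offset += width;
        }
        return out;
    }
}
```

      For four inputs, computing cbind(cbind(cbind(A,B),C),D) with binaryCbind allocates and fills two intermediate matrices before the final result, whereas naryCbind(A, B, C, D) writes each output cell exactly once.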

      However, for distributed Spark operations, the binary cbind is still much better optimized than the nary operation, which only provides a general-case implementation based on repartition joins.
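      The cost of the general case shows up at the block level: in a cbind, each input's columns are shifted by the total width of the preceding inputs, so an input block can straddle two output blocks, and a single output block can combine pieces of several inputs. The plain-Java sketch below (no Spark) illustrates the kind of index arithmetic a join-based approach involves. It assumes 0-based block indexes and a fixed block size blen; the names BlockShift, Fragment, shift, and plan are hypothetical, and this is not the actual BuiltinNarySPInstruction code. A distributed implementation would group the shifted fragments by output block index, which is the shuffle that broadcasting small inputs could avoid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the block-index arithmetic behind a shuffle-based
// nary cbind (0-based indexes, fixed block size blen; not SystemDS code).
final class BlockShift {
    private BlockShift() {}

    // A piece of one input block that lands inside a single output block.
    static final class Fragment {
        final long outColBlock; // output column-block index
        final int srcColStart;  // first column inside the input block
        final int dstColStart;  // first column inside the output block
        final int length;       // number of columns copied

        Fragment(long outColBlock, int srcColStart, int dstColStart, int length) {
            this.outColBlock = outColBlock;
            this.srcColStart = srcColStart;
            this.dstColStart = dstColStart;
            this.length = length;
        }
    }

    // Splits input block inColBlock (blockCols wide) of an input whose first
    // column sits at global column colOffset of the cbind result. If colOffset
    // is not a multiple of blen, the block straddles two output blocks.
    static List<Fragment> shift(long colOffset, long inColBlock, int blockCols, int blen) {
        List<Fragment> out = new ArrayList<>();
        long start = colOffset + inColBlock * blen;
        int src = 0;
        while (src < blockCols) {
            long global = start + src;
            int dst = (int) (global % blen);
            int len = Math.min(blockCols - src, blen - dst);
            out.add(new Fragment(global / blen, src, dst, len));
            src += len;
        }
        return out;
    }

    // For one block row: which input blocks contribute to each output column
    // block, i.e., the key a shuffle-based implementation would group on.
    static Map<Long, List<String>> plan(long[] inputCols, int blen) {
        Map<Long, List<String>> byOutBlock = new TreeMap<>();
        long offset = 0;
        for (int k = 0; k < inputCols.length; k++) {
            long numBlocks = (inputCols[k] + blen - 1) / blen;
            for (long b = 0; b < numBlocks; b++) {
                int width = (int) Math.min(blen, inputCols[k] - b * blen);
                for (Fragment f : shift(offset, b, width, blen)) {
                    byOutBlock.computeIfAbsent(f.outColBlock, key -> new ArrayList<>())
                              .add("in" + k + "[" + b + "]");
                }
            }
            offset += inputCols[k];
        }
        return byOutBlock;
    }
}
```

      For two inputs with 1,500 and 1,000 columns and blen = 1000, plan returns {0=[in0[0]], 1=[in0[1], in1[0]], 2=[in1[0]]}: output block 1 combines fragments of both inputs, so blocks from different inputs have to meet on the same key.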

      This task aims to address this by extending BuiltinNarySPInstruction at runtime level (i.e., within processInstruction). Given the unlimited number of inputs, this runtime approach seems more appropriate than dedicated physical operators at compiler level. In detail, we need to evaluate whether a subset of the inputs fits into the broadcast budget and, if so, provide an alternative code path for nary cbind/rbind operations with broadcast joins.
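      The budget check could look roughly like the following plain-Java sketch. Everything in it is an assumption for illustration: the class BroadcastSelection and its methods are hypothetical, the size estimates and budget are plain parameters rather than SystemDS's own memory estimates, and the policy (keep the largest input distributed, then add the remaining inputs smallest-first while their cumulative size fits) is one simple heuristic, not the policy of SpoofSPInstruction or any existing SystemDS code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch (not SystemDS code): choose which nary cbind/rbind
// inputs to broadcast under a byte budget.
final class BroadcastSelection {
    private BroadcastSelection() {}

    // Returns the indexes of inputs to broadcast, in ascending order.
    // Assumed policy: the largest input always stays distributed; the others
    // are added smallest-first while the cumulative size stays within budget.
    static int[] selectBroadcasts(long[] sizeBytes, long budgetBytes) {
        Integer[] order = new Integer[sizeBytes.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingLong(i -> sizeBytes[i]));

        List<Integer> chosen = new ArrayList<>();
        long used = 0;
        // order[order.length - 1] is the largest input: keep it distributed.
        for (int j = 0; j < order.length - 1; j++) {
            long s = sizeBytes[order[j]];
            if (used + s > budgetBytes) break; // sorted ascending: nothing later fits
            used += s;
            chosen.add(order[j]);
        }
        return chosen.stream().mapToInt(Integer::intValue).sorted().toArray();
    }

    // Hypothetical code-path decision derived from the selection above.
    static String codePath(int numInputs, int numBroadcast) {
        if (numBroadcast == 0) return "repartition-join";
        if (numBroadcast == numInputs - 1) return "broadcast-only";
        return "mixed";
    }
}
```

      With input sizes {400, 50, 100, 300} and a budget of 200, inputs 1 and 2 are selected (a mixed plan); with a budget of 1000, all three smaller inputs fit and a broadcast-only path would apply. Smallest-first maximizes the number of broadcast inputs rather than the number of broadcast bytes, which is a design choice, not a requirement.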

      Note that distributed codegen operations share this characteristic of an unlimited number of inputs and already leverage broadcast variables when possible. Hence, we can probably use an approach similar to the one in SpoofSPInstruction.


People

    • Assignee: Unassigned
    • Reporter: Matthias Boehm (mboehm7)
    • Votes: 0
    • Watchers: 2
