  1. SystemDS
  2. SYSTEMDS-3607

Split Balanced Internal


Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • SystemDS 3.2
    • None

    Description

      This task is to create a new internal split-balanced builtin that optimizes the existing splitBalanced builtin.

      A list of optimizations initially proposed:

      1. Avoid having to remove empty elements afterwards, which saves a double allocation.
      2. Avoid materializing the combined MatrixBlock of X and Y. Instead, sort them internally and use the index produced by the sort to select the elements for the balanced split.
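
The two optimizations above can be illustrated with a small, language-neutral sketch. This is not the SystemDS implementation (which would operate on MatrixBlock in Java); the function names, the `split_ratio` parameter, and the floor-based rounding are assumptions made for illustration. It sorts row indices by label only, walks the runs of equal labels, and gathers rows straight into outputs of the final size, so no combined X/Y block and no remove-empty pass is needed.

```python
from itertools import groupby
from math import floor


def split_balanced_indices(y, split_ratio=0.7):
    """Return (train_idx, test_idx); each class keeps about split_ratio of its rows in train.

    Hypothetical helper for illustration only.
    """
    # Sort row indices by label; X is never copied or combined with Y.
    order = sorted(range(len(y)), key=lambda i: y[i])
    train_idx, test_idx = [], []
    # Each run of equal labels in the sorted index corresponds to one class.
    for _, run in groupby(order, key=lambda i: y[i]):
        rows = list(run)
        n_train = floor(len(rows) * split_ratio)  # assumed rounding rule
        train_idx.extend(rows[:n_train])
        test_idx.extend(rows[n_train:])
    return train_idx, test_idx


def take_rows(m, idx):
    # Gather the selected rows directly; the output has exactly len(idx) rows.
    return [m[i] for i in idx]


X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
Y = [2, 1, 2, 1, 2, 1]
train_idx, test_idx = split_balanced_indices(Y, split_ratio=0.5)
X_train, Y_train = take_rows(X, train_idx), take_rows(Y, train_idx)
X_test, Y_test = take_rows(X, test_idx), take_rows(Y, test_idx)
```

Because only the index is sorted, the cost of the sort is independent of the number of columns in X, and each output is allocated once at its final size.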

      Additionally, this task could look into our normal random split, which also calls remove empty, and move it into the same internal (overloaded) random split instruction.
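
A random split can follow the same select-by-index pattern. The sketch below is an assumption-laden illustration, not the existing SystemDS split builtin: `random_split_indices`, its `split_ratio` and `seed` parameters, and the truncating rounding are hypothetical. It draws a permutation of the row indices and cuts it once, so both outputs can be allocated at their final size without a remove-empty step.

```python
import random


def random_split_indices(n_rows, split_ratio=0.7, seed=None):
    """Return (train_idx, test_idx) from one random permutation of the row indices.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    perm = list(range(n_rows))
    rng.shuffle(perm)
    n_train = int(n_rows * split_ratio)  # assumed rounding rule
    # Sorting each part keeps the original row order within train and test.
    return sorted(perm[:n_train]), sorted(perm[n_train:])


rand_train_idx, rand_test_idx = random_split_indices(10, split_ratio=0.5, seed=13)
```

The returned index lists can then be used to gather rows of X and Y directly, in the same way as for the balanced case.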

      An example script to optimize (the purpose of this script is to generate a challenging dataset to classify):

      # Generate integer-valued features
      x = rand(rows=$1, cols=$2, min=$3, max=$4, seed=13)
      x = ceil(x)

      source("nn/layers/tanh.dml") as tanh

      xs = scale(x, TRUE, TRUE)
      # Random weights of a four-layer network used to derive the labels
      L1 = rand(rows=$2, cols=100, min=-1, max=1, seed=14)
      L2 = rand(rows=100, cols=50, min=-1, max=1, seed=15)
      L3 = rand(rows=50, cols=25, min=-1, max=1, seed=16)
      L4 = rand(rows=25, cols=10, min=-1, max=1, seed=18)

      # Forward pass; the index of the largest output is the class label
      x1 = tanh::forward(x %*% L1)
      x2 = tanh::forward(x1 %*% L2)
      x3 = tanh::forward(x2 %*% L3)
      x4 = x3 %*% L4

      y = rowIndexMax(x4)

      # Count the rows per class
      yt = table(y, 1)
      print("Class Distribution")
      print(toString(t(yt)))

      # Balanced train/test split, written to the given output paths
      [x, y, xt, yt] = splitBalanced(X=x,Y=y)
      write(x, $5, format=$9)
      write(y, $6, format=$9)
      write(xt, $7, format=$9)
      write(yt, $8, format=$9)

          People

            Assignee: Unassigned
            Reporter: Sebastian Baunsgaard (baunsgaard)