Spark / SPARK-30641 ML algs blockify input vectors / SPARK-32061

potential regression if use memoryUsage instead of numRows

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: ML, PySpark
    • Labels: None

      Description

      1, `memoryUsage` may be set improperly, for example too small to store a single instance;

      2, blockify+GMM reuses two matrices whose shapes depend on the current blockSize:

      @transient private lazy val auxiliaryProbMat = DenseMatrix.zeros(blockSize, k)
      @transient private lazy val auxiliaryPDFMat = DenseMatrix.zeros(blockSize, numFeatures) 

      When implementing blockify+GMM, I found that not pre-allocating those matrices causes a serious regression (maybe 3~4x slower; I forgot the exact numbers);
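      To illustrate why a matrix shaped by blockSize stops fitting once blocks are cut by memory, here is a minimal sketch. It is not Spark's blockify code: `BlockRowCounts`, `rowsPerBlockByCount`, `rowsPerBlockByMemory` and the byte sizes are hypothetical, and the rule that a block always takes at least one instance (even when that instance alone exceeds the budget) is my assumption.

```scala
// Hypothetical sketch; names and the grouping rule are assumptions, not Spark's API.
object BlockRowCounts {

  // Blocking by row count: every block except possibly the last has `numRows` rows.
  def rowsPerBlockByCount(instanceBytes: Seq[Long], numRows: Int): List[Int] =
    instanceBytes.grouped(numRows).map(_.size).toList

  // Blocking by a byte budget: greedily fill a block until the next instance
  // would exceed `maxBytes`. Assumption: a block always holds at least one
  // instance, even if that instance alone is larger than the budget.
  def rowsPerBlockByMemory(instanceBytes: Seq[Long], maxBytes: Long): List[Int] = {
    val counts = scala.collection.mutable.ListBuffer.empty[Int]
    var rows = 0
    var used = 0L
    instanceBytes.foreach { bytes =>
      if (rows > 0 && used + bytes > maxBytes) {
        counts += rows // close the current block
        rows = 0
        used = 0L
      }
      rows += 1
      used += bytes
    }
    if (rows > 0) counts += rows // close the trailing block
    counts.toList
  }
}
```

      With a fixed row count the block height is known up front, so a (blockSize, k) matrix can be allocated once. With a byte budget the height depends on the instances (dense vs. sparse, for example), and a budget smaller than one instance degenerates to one row per block under the assumption above.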

      3, in MLP, three pre-allocated objects are also related to numRows:

      if (ones == null || ones.length != delta.cols) ones = BDV.ones[Double](delta.cols)
      
      // TODO: allocate outputs as one big array and then create BDMs from it
      if (outputs == null || outputs(0).cols != currentBatchSize) {
      ...
      
      // TODO: allocate deltas as one big array and then create BDMs from it
      if (deltas == null || deltas(0).cols != currentBatchSize) {
        deltas = new Array[BDM[Double]](layerModels.length)
      ... 

      I am not very familiar with the implementation of MLP and could not find any documentation about this pre-allocation. But I guess there may be a regression if we disable this pre-allocation, since those objects look relatively big.
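      The guard pattern quoted above (reallocate only when the cached size differs from the current batch size) can be sketched as follows to count how often it fires. This is not the MLP code: `ScratchBuffer`, `AllocationCount` and the row-count sequences are hypothetical, and whether the extra allocations actually cause a measurable regression would still need benchmarking.

```scala
// Hypothetical sketch of a size-guarded scratch buffer; not Spark's MLP code.
final class ScratchBuffer(cols: Int) {
  private var buf: Array[Double] = null
  var allocations = 0

  // Return a rows x cols scratch array, reallocating only when the size changes.
  def forRows(rows: Int): Array[Double] = {
    if (buf == null || buf.length != rows * cols) {
      buf = new Array[Double](rows * cols)
      allocations += 1
    }
    buf
  }
}

object AllocationCount {
  // Count how many allocations a sequence of block row counts triggers.
  def countAllocations(rowCounts: Seq[Int], cols: Int): Int = {
    val scratch = new ScratchBuffer(cols)
    rowCounts.foreach(r => scratch.forRows(r))
    scratch.allocations
  }
}
```

      When every block has the same number of rows, the guard fires once, plus once more for a shorter trailing block. When consecutive blocks differ in row count, it fires on every change, which is the case I am worried about.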

       

       

            People

            • Assignee: Unassigned
            • Reporter: podongfeng zhengruifeng
