Spark / SPARK-19746

LogisticAggregator is inefficient in indexing


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: ML
    • Labels: None

    Description

      The following code occurs in the `LogisticAggregator.add` method, which is on a performance-critical path:

      val localCoefficients = bcCoefficients.value  // broadcast coefficients, a Spark Vector
      features.foreachActive { (index, value) =>
        val stdValue = value / localFeaturesStd(index)
        var j = 0
        while (j < numClasses) {
          // Vector.apply is called once per (feature, class) pair
          margins(j) += localCoefficients(index * numClasses + j) * stdValue
          j += 1
        }
      }

      `localCoefficients(index * numClasses + j)` calls the `apply` method on `Vector`, which dispatches to `asBreeze(index * numClasses + j)`: this wraps the vector in a new Breeze vector on every call and only then indexes into it. Inside this nested loop that is very inefficient and creates a lot of unnecessary garbage; we can avoid it by indexing the underlying array directly.
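      The suggested fix can be sketched in self-contained Scala. Spark's `Vector`, the broadcast, and `foreachActive` are not used here: `WrappingVector` and `Wrapper` are hypothetical stand-ins whose `apply` allocates a throwaway object per lookup, loosely mimicking the `asBreeze` path, and the active features are passed as parallel `activeIndices`/`activeValues` arrays. `marginsSlow` follows the original loop; `marginsFast` fetches the backing array once and indexes it directly. This is an illustration under those assumptions, not the patch that resolved the issue.

```scala
// Hypothetical stand-ins, NOT Spark's API.
final class Wrapper(values: Array[Double]) {
  def apply(i: Int): Double = values(i)
}

final class WrappingVector(val values: Array[Double]) {
  // Allocates a fresh Wrapper on every lookup, loosely mimicking
  // Spark's Vector.apply(i) going through asBreeze(i).
  def apply(i: Int): Double = {
    val view = new Wrapper(values)
    view(i)
  }
}

object MarginSketch {
  // Feature-major layout: the coefficient for (feature `index`, class `j`)
  // is stored at position index * numClasses + j.

  /** Original pattern: every coefficient lookup goes through WrappingVector.apply. */
  def marginsSlow(
      coefficients: WrappingVector,
      activeIndices: Array[Int],
      activeValues: Array[Double],
      featuresStd: Array[Double],
      numClasses: Int): Array[Double] = {
    val margins = new Array[Double](numClasses)
    for (k <- activeIndices.indices) {
      val index = activeIndices(k)
      val stdValue = activeValues(k) / featuresStd(index)
      var j = 0
      while (j < numClasses) {
        margins(j) += coefficients(index * numClasses + j) * stdValue
        j += 1
      }
    }
    margins
  }

  /** Suggested pattern: grab the backing array once, then index it directly. */
  def marginsFast(
      coefficients: WrappingVector,
      activeIndices: Array[Int],
      activeValues: Array[Double],
      featuresStd: Array[Double],
      numClasses: Int): Array[Double] = {
    val coef: Array[Double] = coefficients.values // no per-lookup allocation
    val margins = new Array[Double](numClasses)
    for (k <- activeIndices.indices) {
      val index = activeIndices(k)
      val stdValue = activeValues(k) / featuresStd(index)
      var j = 0
      while (j < numClasses) {
        margins(j) += coef(index * numClasses + j) * stdValue
        j += 1
      }
    }
    margins
  }
}
```

      Both versions compute the same margins. For example, with coefficients `Array(1.0, 2.0, 3.0, 4.0)`, two classes, active features `0 -> 2.0` and `1 -> 4.0`, and standard deviations `Array(1.0, 2.0)`, each returns `Array(8.0, 12.0)`. A wrapper this trivial may sometimes be removed by the JVM's escape analysis, so the sketch shows the indexing pattern rather than implying any particular speedup.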


          People

            Assignee: Seth Hendrickson (sethah)
            Reporter: Seth Hendrickson (sethah)
            Votes: 0
            Watchers: 3
