Spark
  1. Spark
  2. SPARK-17133 Improvements to linear methods in Spark
  3. SPARK-21152

Use level 3 BLAS operations in LogisticAggregator


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: ML

    Description

      In the logistic regression gradient update, we currently compute the gradient one row at a time. If we block rows together, we can do a blocked gradient update that leverages the BLAS GEMM operation.

      On high-dimensional dense datasets, I've observed ~10x speedups. The problem, though, is that blocking likely won't improve the sparse case, so we need to keep both implementations around, and the blocked algorithm will require caching a new dataset of type:

      BlockInstance(label: Vector, weight: Vector, features: Matrix)
      

      In the past we have avoided caching anything besides the original dataset passed to train, because doing so adds memory overhead if the user has already cached the original dataset for other reasons. Here, I'd like to discuss whether this patch would be worth the investment, given that it only improves a subset of the use cases.
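      To illustrate the idea, here is a minimal NumPy sketch (the actual Spark implementation is Scala; the names below are hypothetical). The row-at-a-time version does one dot product per row, while the blocked version processes a block of rows with a single matrix multiply, which BLAS can execute as GEMV/GEMM. True level-3 GEMM shows up in the multinomial case, where the coefficients form a matrix rather than a vector.

      ```python
      import numpy as np

      def grad_per_row(X, y, w):
          # Row-at-a-time binary logistic gradient: one BLAS-1 dot per row.
          g = np.zeros_like(w)
          for i in range(X.shape[0]):
              margin = X[i] @ w
              g += (1.0 / (1.0 + np.exp(-margin)) - y[i]) * X[i]
          return g / X.shape[0]

      def grad_blocked(X, y, w, block_size=64):
          # Blocked gradient: each block uses two matrix-vector products
          # (X_b @ w and X_b.T @ err), letting BLAS operate on many rows
          # at once; with a coefficient *matrix* these become GEMM calls.
          g = np.zeros_like(w)
          for start in range(0, X.shape[0], block_size):
              Xb = X[start:start + block_size]
              yb = y[start:start + block_size]
              margins = Xb @ w
              err = 1.0 / (1.0 + np.exp(-margins)) - yb
              g += Xb.T @ err
          return g / X.shape[0]
      ```

      Both versions compute the same gradient; the blocked one simply reorganizes the arithmetic so dense BLAS kernels do the work, which is where the observed dense-case speedup comes from.
      
      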

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Seth Hendrickson (sethah)
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved: