Description
Following https://github.com/apache/spark/pull/30810, I've continued looking for ways to accelerate the usage of BLAS in Spark. With this PR, I integrate work done in the dev.ludovic.netlib Maven package.
The dev.ludovic.netlib library wraps the original com.github.fommil.netlib library and focuses on accelerating the linear algebra routines used in Spark. When running the {{org.apache.spark.ml.linalg.BLASBenchmark}} benchmarking suite, I get the results at [1] on an Intel machine. Moreover, this library is thoroughly tested to return the exact same results as the reference implementation.
Under the hood, it reimplements the necessary algorithms in pure, autovectorization-friendly Java 8, and it takes advantage of the Vector API and the Foreign Linker API (both incubating in JDK 16) when they are available.
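To illustrate what "autovectorization-friendly" can mean while still matching the reference results exactly, here is a minimal sketch of y := A * x + y (the dgemv operation, no transpose) on a column-major matrix. It is not code from dev.ludovic.netlib. The class and method names (`GemvLoopOrderSketch`, `gemvRowOrder`, `gemvColumnOrder`) are hypothetical. The sketch assumes a JIT such as HotSpot C2 typically vectorizes a contiguous, element-wise inner loop. Both loop orders add the terms for each y[i] in the same j order, so their outputs are bit-for-bit identical.

```java
import java.util.Arrays;

// Hypothetical illustration, not the dev.ludovic.netlib implementation.
// Computes y := A * x + y, with A stored column-major in a flat array
// (m rows, n cols), element (i, j) at a[i + j * m], as in reference BLAS.
public class GemvLoopOrderSketch {

    // Row-oriented: the inner loop strides through memory by m and carries a
    // scalar dependency chain on `sum`, which hinders SIMD code generation.
    static void gemvRowOrder(int m, int n, double[] a, double[] x, double[] y) {
        for (int i = 0; i < m; i++) {
            double sum = y[i];
            for (int j = 0; j < n; j++) {
                sum += a[i + j * m] * x[j];
            }
            y[i] = sum;
        }
    }

    // Column-oriented (axpy-style): the inner loop walks contiguous memory and
    // each iteration updates a different y[i], so a JIT can vectorize it
    // without reordering any floating-point additions.
    static void gemvColumnOrder(int m, int n, double[] a, double[] x, double[] y) {
        for (int j = 0; j < n; j++) {
            double xj = x[j];
            int offset = j * m;
            for (int i = 0; i < m; i++) {
                y[i] += a[offset + i] * xj;
            }
        }
    }

    public static void main(String[] args) {
        int m = 37, n = 23;
        double[] a = new double[m * n];
        double[] x = new double[n];
        double[] y = new double[m];
        for (int k = 0; k < a.length; k++) a[k] = Math.sin(k);
        for (int j = 0; j < n; j++) x[j] = Math.cos(j);
        for (int i = 0; i < m; i++) y[i] = 0.5 * i;

        double[] y1 = y.clone();
        double[] y2 = y.clone();
        gemvRowOrder(m, n, a, x, y1);
        gemvColumnOrder(m, n, a, x, y2);

        // Same per-element addition order, so Arrays.equals (bitwise
        // comparison of doubles) reports the results as identical.
        System.out.println(Arrays.equals(y1, y2)); // prints "true"
    }
}
```

The design choice this sketch highlights is to make the inner loop element-wise over contiguous memory instead of a strided reduction. This leaves the JIT free to emit SIMD instructions without changing floating-point semantics. Reordering a reduction (for example, using several partial sums in a dot product) would typically be faster but would no longer guarantee identical results.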