Details
Type: Question
Status: Resolved
Priority: Minor
Resolution: Invalid
Affects Version/s: 2.3.1
Fix Version/s: None
Environment:
Spark 2.3.1 SNAPSHOT (as of June 25th)
Kubernetes version 1.7.5
Kubernetes cluster consisting of 4 nodes with 16 GB RAM and 8-core Intel processors.
Description
Native BLAS libraries usually speed up CPU-heavy operations, for example in MLlib, quite significantly.
The initial warning
WARN BLAS:61 - Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
is not easy to resolve, since, as reported [here|https://github.com/apache/spark/pull/19717/files/7d2b30373b2e4d8d5311e10c3f9a62a2d900d568], it seems to be caused by the underlying base image used by the Spark Dockerfile.
Rebuilding Spark with
-Pnetlib-lgpl
does not solve the problem either, but I managed to build BLAS and LAPACK into the Alpine image, with a lot of tricks involved.
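To confirm which backend netlib-java actually picked up inside the image, something like the following could be run in spark-shell. This is a minimal sketch, assuming the netlib-java core classes are on the classpath (they come in with MLlib) and that `sc` is the SparkContext provided by the shell; the partition count is an arbitrary choice to make it likely that every executor is sampled.

```scala
// Run in spark-shell; `sc` is the SparkContext the shell provides.
import com.github.fommil.netlib.{ARPACK, BLAS, LAPACK}

// getInstance() triggers netlib-java's loader; the class name shows which
// implementation was selected (a Native* class or the F2j* fallback).
println(s"driver BLAS:   ${BLAS.getInstance().getClass.getName}")
println(s"driver LAPACK: ${LAPACK.getInstance().getClass.getName}")
println(s"driver ARPACK: ${ARPACK.getInstance().getClass.getName}")

// The executors run in their own containers, so check there as well.
// The number of partitions is an assumption, chosen to spread tasks widely.
val numTasks = sc.defaultParallelism * 4
val executorBlas = sc.parallelize(1 to numTasks, numTasks)
  .map(_ => BLAS.getInstance().getClass.getName)
  .distinct()
  .collect()
println(s"executor BLAS: ${executorBlas.mkString(", ")}")
```

If the executors report `com.github.fommil.netlib.F2jBLAS`, the native libraries were not found in the executor image, even if the driver loaded them.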
Interestingly, PCA performance in my case dropped quite significantly with native BLAS support compared to the netlib-java fallback. I am aware of SPARK-21305, but that did not help in my case either.
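A comparison of this kind could be timed roughly as follows. This is only a sketch: the matrix dimensions and the number of components are hypothetical placeholders, not the data from my runs, and `sc` is assumed to be the spark-shell SparkContext. Running the same snippet once on an image with native BLAS and once on one with the fallback would give comparable numbers.

```scala
// Run in spark-shell; `sc` is the SparkContext the shell provides.
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.random.RandomRDDs

// Hypothetical sizes for illustration only.
val numRows = 10000L
val numCols = 500
val numComponents = 10

// Dense rows with standard normal entries; cache and count first so that
// data generation is not included in the measured time.
val rows = RandomRDDs.normalVectorRDD(sc, numRows, numCols).cache()
rows.count()

val mat = new RowMatrix(rows)
val start = System.nanoTime()
val pc = mat.computePrincipalComponents(numComponents)
val elapsedMs = (System.nanoTime() - start) / 1e6
println(f"PCA (${pc.numRows} x ${pc.numCols} components) took $elapsedMs%.1f ms")
```

A single run like this is noisy; repeating it several times and discarding the first (JIT warm-up) run would make the numbers more trustworthy.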
Furthermore, calling SVD on a matrix of size only 5000x5000 (density 1%) already throws an error when using native ARPACK, but runs perfectly fine with the fallback version.
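A setup along these lines could be used to reproduce the SVD case. It is a sketch under assumptions: the random sparse rows stand in for the real data, the number of singular values `k = 20` and the partition count are hypothetical, and `sc` is the spark-shell SparkContext. As far as I understand, with `k` small relative to the number of columns, `computeSVD` goes through ARPACK.

```scala
// Run in spark-shell; `sc` is the SparkContext the shell provides.
import scala.util.Random
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val n = 5000
val density = 0.01
val nnzPerRow = (n * density).toInt // about 50 non-zeros per row
val k = 20                          // hypothetical; not stated in this report

val rows = sc.parallelize(0 until n, 8).map { i =>
  val rng = new Random(i)
  // Random column indices; duplicates are dropped, so a row may end up with
  // slightly fewer than nnzPerRow entries. Indices must be sorted.
  val indices = Array.fill(nnzPerRow)(rng.nextInt(n)).distinct.sorted
  val values = Array.fill(indices.length)(rng.nextDouble())
  Vectors.sparse(n, indices, values)
}.cache()

val svd = new RowMatrix(rows).computeSVD(k, computeU = false)
println(s"top singular values: ${svd.s}")
```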
The question is whether there has already been some investigation in that direction.
If not, would it be interesting for the Spark community if I provided
- a more detailed report with respect to timings/configurations/test setup
- a Dockerfile to build Spark with BLAS/LAPACK/ARPACK, using the shipped Dockerfile as a basis
Issue Links
- relates to SPARK-26773 Consider alternative base images for Kubernetes (Resolved)