There are a few issues:
1. Batch API for topK similar users and topK similar products
2. Comparison of the product x product similarities generated with columnSimilarities against the topK similar products
I added batch APIs for topK product recommendations for each user and topK user recommendations for each product in SPARK-4231. A similar batch API would be very helpful for topK similar users and topK similar products.
I agree with using cosine similarity. You should be able to reuse the column similarity calculations. I think a better idea is to add rowMatrix.similarRows and reuse that code to generate both product similarities and user similarities.
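To make the row-similarity idea concrete, here is a minimal single-machine sketch in plain Python of what a batch "topK similar rows by cosine similarity" computation could return. It is not Spark code and not the proposed rowMatrix.similarRows implementation; the function name `top_k_similar_rows` and the sample vectors are made up for illustration, and the brute-force all-pairs loop stands in for whatever distributed strategy would actually be used.

```python
import heapq
import math


def top_k_similar_rows(rows, k):
    """For every row, return the k most cosine-similar other rows.

    rows: list of equal-length lists of floats (e.g. product or user vectors).
    Returns a list where entry i is [(j, similarity), ...] in descending
    order of similarity, excluding row i itself. Zero vectors get no
    neighbours because their cosine similarity is undefined.
    """
    norms = [math.sqrt(sum(x * x for x in row)) for row in rows]
    result = []
    for i, a in enumerate(rows):
        scored = []
        for j, b in enumerate(rows):
            if i == j or norms[i] == 0.0 or norms[j] == 0.0:
                continue
            dot = sum(x * y for x, y in zip(a, b))
            scored.append((j, dot / (norms[i] * norms[j])))
        # Keep only the k highest-scoring neighbours for row i.
        result.append(heapq.nlargest(k, scored, key=lambda pair: pair[1]))
    return result


# Hypothetical product vectors; each row is one product.
product_vectors = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
for row_id, neighbours in enumerate(top_k_similar_rows(product_vectors, k=1)):
    print(row_id, neighbours)
```

The same function works for user similarities by passing user vectors as the rows, which is the reuse argument for a row-oriented API.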
But my question is more about validation. We can compute product similarities on the raw features, or we can compute them on the product factors from the matrix factorization. Which one is better?
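One possible starting point for that comparison is to measure how much the two topK neighbour lists agree, sketched below in plain Python. The function name `mean_top_k_overlap` and the sample neighbour lists are hypothetical. Note the limit of this check: overlap only says how different the two approaches are, not which one is better, so deciding that would still need some held-out signal to score against.

```python
def mean_top_k_overlap(top_k_a, top_k_b):
    """Average Jaccard overlap between two per-item topK neighbour lists.

    top_k_a, top_k_b: dicts mapping item id -> list of neighbour ids.
    Only items present in both dicts are compared. Returns a value in
    [0, 1], where 1 means the neighbour sets are identical.
    """
    shared = set(top_k_a) & set(top_k_b)
    if not shared:
        return 0.0
    total = 0.0
    for item in shared:
        a, b = set(top_k_a[item]), set(top_k_b[item])
        union = a | b
        # Two empty neighbour sets are treated as fully agreeing.
        total += len(a & b) / len(union) if union else 1.0
    return total / len(shared)


# Hypothetical neighbour lists: one from raw-feature similarities,
# one from similarities computed on the product factors.
raw_neighbours = {"p1": ["p2", "p3"], "p2": ["p1", "p3"]}
factor_neighbours = {"p1": ["p2", "p4"], "p2": ["p1", "p3"]}
print(mean_top_k_overlap(raw_neighbours, factor_neighbours))
```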