# A BSP matrix multiplication implementation.

## Details

• Type: Improvement
• Status: Resolved
• Priority: Major
• Resolution: Later
• Affects Version/s: None
• Fix Version/s: None
• Component/s:
• Labels:
None

## Description

Using the BSP model, we could improve the performance of mat-mat multiplication. Let's evaluation it out on this issue.

## Activity

Hide
Edward J. Yoon added a comment - - edited

Here's my multi threaded version of mat-mat mult.

In this example,

• The global storage (int[][] out) was used. (Of course, you can also split the summation into n-supersteps.)
```class MatMult extends Thread {
static int m1[][];
static int m2[][];
static int out[][];
static int n = 2;
int row;

MatMult(int i) {
row = i;
this.start();
}

public void run() {
int i, j;
for (i = 0; i < n; i++) {
out[row][i] = 0;

for (j = 0; j < n; j++) {
out[row][i] = out[row][i] + m1[row][j] * m2[j][i];
}
}
}

public static void main(String args[]) throws Exception {
int i, j;
int n = 2;
m1 = new int[][] { { 1, 2 }, { 3, 4 } };
m2 = new int[][] { { 1, 2 }, { 3, 4 } };
out = new int[n][n];

MatMult mat[] = new MatMult[n];
for (i = 0; i < n; i++) {
mat[i] = new MatMult(i);
}

for (i = 0; i < n; i++) {
mat[i].join();
}

System.out.println("OUTPUT :");
for (i = 0; i < n; i++) {
for (j = 0; j < n; j++) {
System.out.print(out[i][j] + "\t");
}
System.out.println();
}
}
}
```
Show
Edward J. Yoon added a comment - - edited Here's my multi threaded version of mat-mat mult. In this example, Thread.join() means a barrier. The global storage (int[][] out) was used. (Of course, you can also split the summation into n-supersteps.) class MatMult extends Thread { static int m1[][]; static int m2[][]; static int out[][]; static int n = 2; int row; MatMult( int i) { row = i; this .start(); } public void run() { int i, j; for (i = 0; i < n; i++) { out[row][i] = 0; for (j = 0; j < n; j++) { out[row][i] = out[row][i] + m1[row][j] * m2[j][i]; } } } public static void main( String args[]) throws Exception { int i, j; int n = 2; m1 = new int [][] { { 1, 2 }, { 3, 4 } }; m2 = new int [][] { { 1, 2 }, { 3, 4 } }; out = new int [n][n]; MatMult mat[] = new MatMult[n]; for (i = 0; i < n; i++) { mat[i] = new MatMult(i); } for (i = 0; i < n; i++) { mat[i].join(); } System .out.println( "OUTPUT :" ); for (i = 0; i < n; i++) { for (j = 0; j < n; j++) { System .out.print(out[i][j] + "\t" ); } System .out.println(); } } }
Hide
Edward J. Yoon added a comment -

move to examples

Show
Edward J. Yoon added a comment - move to examples
Hide
Edward J. Yoon added a comment -

change fix version to 0.3

Show
Edward J. Yoon added a comment - change fix version to 0.3
Hide
Thomas Jungblut added a comment -

Seems to be good parallelizable.

Show
Thomas Jungblut added a comment - I recently read a paper about it which used Strassen's algorithm. [1] Seems to be good parallelizable. [1] http://en.wikipedia.org/wiki/Strassen_algorithm
Hide
Thomas Jungblut added a comment -

Played a bit with the algorithm, iterative and with the new fork/join framework in java7.
It is really cool, but it is just for quadratic matrices which length is a power of 2.
We should focus on a more general solution, shouldn't we?

I tagged the iterative solution with HAMA commentaries, how we can make a BSP out of it.

https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/math/matrix/StrassenMatrixMultiplication.java

If you are interested in the fork join implementation have a look at:

https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/math/matrix/ForkJoinStrassenMatrixMultiplication.java

Maybe if I have a bit more time later I can write a BSP version.

Show
Thomas Jungblut added a comment - Played a bit with the algorithm, iterative and with the new fork/join framework in java7. It is really cool, but it is just for quadratic matrices which length is a power of 2. We should focus on a more general solution, shouldn't we? I tagged the iterative solution with HAMA commentaries, how we can make a BSP out of it. https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/math/matrix/StrassenMatrixMultiplication.java If you are interested in the fork join implementation have a look at: https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/math/matrix/ForkJoinStrassenMatrixMultiplication.java Maybe if I have a bit more time later I can write a BSP version.
Hide
Tommaso Teofili added a comment -

I think it'd be nice also to implement the well known algorithms Cannon and DNS using BSP model and eventually add them to the hama-examples module.

Show
Tommaso Teofili added a comment - I think it'd be nice also to implement the well known algorithms Cannon and DNS using BSP model and eventually add them to the hama-examples module.
Hide
Thomas Jungblut added a comment -

After seeing this:

I guess we should get a first scratch implementation based on this MPI version:
http://sushpa.wordpress.com/2008/05/20/simple-matrix-multiplication-on-mpi/

It's quite easy, the first matrix A must be row-partitioned and the second matrix must be available in every peer.

Show
Thomas Jungblut added a comment - After seeing this: http://stackoverflow.com/questions/9708427/hadoop-matrix-multiplication I guess we should get a first scratch implementation based on this MPI version: http://sushpa.wordpress.com/2008/05/20/simple-matrix-multiplication-on-mpi/ It's quite easy, the first matrix A must be row-partitioned and the second matrix must be available in every peer.
Hide
Thomas Jungblut added a comment -

Very simplistic first working scratch of the normal matrix multiplication:

https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/math/bsp/MatrixMultiplicationBSP.java

It uses row and column partitioning and a bit of messaging.
However this can be much more improved.

I think it'd be nice also to implement the well known algorithms Cannon and DNS using BSP model and eventually add them to the hama-examples module.

Do you have some papers for them? Especially MPI versions can be very well translated to BSP I've seen

This is a good example for DNS: http://imada.sdu.dk/~daniel/DM818-2011/Assignments/mand3/index.html

Show
Thomas Jungblut added a comment - Very simplistic first working scratch of the normal matrix multiplication: https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/math/bsp/MatrixMultiplicationBSP.java It uses row and column partitioning and a bit of messaging. However this can be much more improved. I think it'd be nice also to implement the well known algorithms Cannon and DNS using BSP model and eventually add them to the hama-examples module. Do you have some papers for them? Especially MPI versions can be very well translated to BSP I've seen This is a good example for DNS: http://imada.sdu.dk/~daniel/DM818-2011/Assignments/mand3/index.html
Hide
Edward J. Yoon added a comment -

Looks great to me. I'm going to benchmark this.

Show
Edward J. Yoon added a comment - Looks great to me. I'm going to benchmark this.
Hide
Thomas Jungblut added a comment -

Hmm you probably need Java7 for that. But I guess you could remove my vector class with that of mahout-math. I have quite the same interface.

And I think it is not very scalable. So a DNS would be better.
Heres a paper about how it could look like:
http://cloudscale.com/images/bsp/matmultbsp.pdf

Show
Thomas Jungblut added a comment - Hmm you probably need Java7 for that. But I guess you could remove my vector class with that of mahout-math. I have quite the same interface. And I think it is not very scalable. So a DNS would be better. Heres a paper about how it could look like: http://cloudscale.com/images/bsp/matmultbsp.pdf
Hide
Thomas Jungblut added a comment -

Okay if you really want to benchmark it:

https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/math/bsp/MatrixMultiplicationBSP.java

I have just extracted my math lib to https://github.com/thomasjungblut/tjungblut-math

You can build it like usually with maven.

What you need though is the VectorWritable class:

https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/clustering/model/VectorWritable.java

This should be fine usable with Hama-0.4, Hadoop 20.2 and Java6.

Show
Thomas Jungblut added a comment - Okay if you really want to benchmark it: https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/math/bsp/MatrixMultiplicationBSP.java I have just extracted my math lib to https://github.com/thomasjungblut/tjungblut-math You can build it like usually with maven. What you need though is the VectorWritable class: https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/clustering/model/VectorWritable.java This should be fine usable with Hama-0.4, Hadoop 20.2 and Java6.
Hide
Thomas Jungblut added a comment -

Now that my math lib is in ml-package, let's just contribute the code.
We can make it scalable later on.

Show
Thomas Jungblut added a comment - Now that my math lib is in ml-package, let's just contribute the code. We can make it scalable later on.
Hide
Edward J. Yoon added a comment -

Schedule to 0.7

Show
Edward J. Yoon added a comment - Schedule to 0.7
Hide
Martin Illecker added a comment -

Hi!
Is this matrix multiplication implementation still missing?
I thought Thomas has implemented it already?
Is anyone assigned to this topic?

Show
Martin Illecker added a comment - Hi! Is this matrix multiplication implementation still missing? I thought Thomas has implemented it already? Is anyone assigned to this topic?
Hide
Edward J. Yoon added a comment -

Yes, it is still missing. Please feel free to assign yourself if you want to upload your implementation.

Show
Edward J. Yoon added a comment - Yes, it is still missing. Please feel free to assign yourself if you want to upload your implementation.
Hide
Martin Illecker added a comment -

Do we have any references or implementations for a basis?
Otherwise I will take Thomas's references [1][2] as an entry point.

Show
Martin Illecker added a comment - Do we have any references or implementations for a basis? Otherwise I will take Thomas's references [1] [2] as an entry point. [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.480 [2] http://link.springer.com/article/10.1007%2FPL00008264
Hide
Edward J. Yoon added a comment -

Otherwise I will take Thomas's references [1][2] as an entry point.
[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.480

+1

Show
Edward J. Yoon added a comment - Otherwise I will take Thomas's references [1] [2] as an entry point. [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.480 [2] http://link.springer.com/article/10.1007%2FPL00008264 +1

## People

• Assignee:
Unassigned
Reporter:
Edward J. Yoon