I've mentioned svmlin before:
The licence is GPL, which means it can be used as is for any purpose (derived work would have to stay GPL, though). I have some glue code around it to feed it from Lucene term vectors, but that is no more than a toy, and it's written in Jython.
As for making it work under M/R, I'd like to try the following. The runtime of a single training run can be more or less controlled by the number of entities and features on input. A framework around it under M/R could give a single training run a maximum amount of time, and fail it at timeout. A retry could then, for example, use more aggressive feature selection before running again, and/or use fewer entities for training.
In that way one gets a kind of 'best effort' set of classifiers from a Hadoop cluster. This would be useful when many classifiers are needed, for example in a larger hierarchy of classes. Some effort would be wasted on timeouts, but with a cluster that would be acceptable.
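A minimal sketch of that retry policy, with everything hypothetical: `train_fn` stands in for an svmlin run, and `TrainingTimeout` stands in for the framework killing a task that exceeds its budget. In a real M/R setup the timeout would be enforced by the framework, not by the trainer itself; here the shrinking is a crude "drop half" rather than real feature selection.

```python
class TrainingTimeout(Exception):
    """Raised (by the hypothetical framework) when a run exceeds its time budget."""
    pass

def best_effort_train(train_fn, entities, features, budget_s, max_retries=3):
    """Try to train within budget_s seconds; on timeout, retry with a
    smaller input (fewer features, then fewer entities) and try again.
    Returns the trained model, or None if every attempt timed out."""
    for attempt in range(max_retries + 1):
        try:
            return train_fn(entities, features, budget_s)
        except TrainingTimeout:
            # More aggressive "feature selection" and fewer training
            # entities for the retry -- here just a naive halving.
            features = features[: max(1, len(features) // 2)]
            entities = entities[: max(1, len(entities) // 2)]
    # Give up: this classifier is simply missing from the best-effort set.
    return None
```

For a whole hierarchy of classes, one would run `best_effort_train` once per node of the hierarchy as a separate task, collecting whatever classifiers finish within budget.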
One could also try to make a single SVM training run work under Hadoop, but I have no idea how to approach that. Svmlin is not much code, but for the moment I don't want to spend time on its intricacies.
Btw, svmlin can also use unlabeled data. I'd like to use that feature too, but that may be better discussed in another issue.