Details

    • Type: Wish
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Classification
    • Labels: None

      Description

      As SVM's have been mentioned a few times now, this would be a good place to concentrate discussions on the subject.

        Activity

        Zarko ASENOV added a comment -

        OK, I'm currently merging AdaptSVM with the current LIBSVM and other patches. The Shogun project looks appealing, but it lacks proper incremental/decremental SVM classification and regression. It seems a linear combination of multiple kernels is possible with Shogun, but I have no experience with that so far.

        Zarko ASENOV added a comment -

        Hi, I might end up using Adaptive SVM (retraining with multiple reference models), http://www.cs.cmu.edu/~juny/AdaptSVM/index.html , which is based on LibSVM. Adaptive SVM doesn't have the regression code, so it's this or OnlineSVR. I will look into Mahout.

        Ted Dunning added a comment -

        Yes. Looks like this is likely to happen as part of the SGD framework.

        Would you like to help with that?
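
        For illustration, a linear SVM trained within an SGD framework boils down to the hinge-loss update sketched below. This is a generic sketch in plain Java, not Mahout's actual SGD API; the class and member names are hypothetical.

        {code:java}
        // Generic sketch only: a linear SVM trained by stochastic gradient descent
        // on the hinge loss. It does not use Mahout's SGD classes; all names here
        // are hypothetical.
        public class LinearSvmSgd {

          private final double[] weights;
          private final double lambda;   // L2 regularization strength

          public LinearSvmSgd(int numFeatures, double lambda) {
            this.weights = new double[numFeatures];
            this.lambda = lambda;
          }

          /** One SGD step on a single example; label must be +1 or -1. */
          public void train(double[] features, int label, double learningRate) {
            double margin = label * dot(weights, features);
            for (int i = 0; i < weights.length; i++) {
              // L2 shrinkage, plus the hinge-loss subgradient when the margin is violated
              double gradient = lambda * weights[i];
              if (margin < 1.0) {
                gradient -= label * features[i];
              }
              weights[i] -= learningRate * gradient;
            }
          }

          /** Predicts +1 or -1 from the sign of the decision value. */
          public int classify(double[] features) {
            return dot(weights, features) >= 0.0 ? 1 : -1;
          }

          private static double dot(double[] a, double[] b) {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
              sum += a[i] * b[i];
            }
            return sum;
          }
        }
        {code}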

        Zarko ASENOV added a comment -

        Are there any plans for implementing incremental (and decremental) SVM? There's this project, http://onlinesvr.altervista.org/ , but it covers incremental regression only.

        Sean Owen added a comment -

        Is it safe to say this one isn't going to result in a change and can be mothballed?

        Isabel Drost-Fromm added a comment -

        > We could also, potentially, ask the authors of one of the other projects if they are interested in donating to the ASF as well.

        I think we should try that before reinventing the wheel. Could anyone contact the authors of either SVM library? I could write some mail over the weekend. If one of us knows them in person or is familiar with the code, that might make things easier.

        Grant Ingersoll added a comment -

        Yeah, we could go the download route, and that is fine. Still, I'd much prefer we build our own at some point. We could also, potentially, ask the authors of one of the other projects if they are interested in donating to the ASF as well.

        Paul Elschot added a comment -

        After Grant's remarks on mahout-dev, I'm quoting from my answers there to clarify a bit:

        With "using as is" I meant running the program without modifying it and without further distribution. The GPL allows that, as well as studying the code to get an idea of how it works.

        Svmlin might be added as an automatic download in a build script, much like some of the things automatically downloaded by the Lucene build script. I suppose libSVM is in a similar situation.

        Paul Elschot added a comment -

        I've mentioned svmlin before:
        http://people.cs.uchicago.edu/~vikass/svmlin.html
        The licence is GPL, which means that it can be used as is for any purpose. I have some glue code around it to feed it from Lucene term vectors, but that is no more than a toy, and it's written in Jython.
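
        A minimal sketch (in Java rather than Jython) of that kind of glue code is shown below. It assumes the Lucene 2.x/3.x term vector API and an SVMlight-style sparse feature:value examples file such as svmlin reads; the field name and the 1-based feature indexing are assumptions, and labels would still have to be written to a separate file.

        {code:java}
        // Sketch: dump Lucene term vectors into a sparse feature:value file.
        // Assumes the old TermFreqVector API; field name and output layout are
        // illustrative, not svmlin's documented contract.
        import java.io.FileWriter;
        import java.io.IOException;
        import java.io.PrintWriter;
        import java.util.HashMap;
        import java.util.Map;
        import java.util.TreeMap;

        import org.apache.lucene.index.IndexReader;
        import org.apache.lucene.index.TermFreqVector;

        public class TermVectorsToSvmlin {

          public static void dump(IndexReader reader, String field, String examplesFile)
              throws IOException {
            Map<String, Integer> featureIds = new HashMap<String, Integer>();
            PrintWriter out = new PrintWriter(new FileWriter(examplesFile));
            try {
              for (int doc = 0; doc < reader.maxDoc(); doc++) {
                TermFreqVector vector = reader.getTermFreqVector(doc, field);
                if (vector == null) {
                  out.println();   // keep line numbers aligned with a separate labels file
                  continue;
                }
                String[] terms = vector.getTerms();
                int[] freqs = vector.getTermFrequencies();
                // Sort by feature id, since sparse formats usually expect increasing indices.
                TreeMap<Integer, Integer> features = new TreeMap<Integer, Integer>();
                for (int i = 0; i < terms.length; i++) {
                  Integer id = featureIds.get(terms[i]);
                  if (id == null) {
                    id = featureIds.size() + 1;   // 1-based feature ids, assigned on first sight
                    featureIds.put(terms[i], id);
                  }
                  features.put(id, freqs[i]);     // raw term frequency as the feature value
                }
                StringBuilder line = new StringBuilder();
                for (Map.Entry<Integer, Integer> e : features.entrySet()) {
                  if (line.length() > 0) {
                    line.append(' ');
                  }
                  line.append(e.getKey()).append(':').append(e.getValue());
                }
                out.println(line.toString());
              }
            } finally {
              out.close();
            }
          }
        }
        {code}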

        As for making it work under M/R, I'd like to try the following. The runtime of a single training run can be more or less controlled by the number of entities and features on input. A framework around it under M/R could allow a single training run only a maximum time, and fail at timeout. A retry could then, for example, use more aggressive feature selection before running again, and/or use fewer entities for training.
        In that way one gets a kind of 'best effort' set of classifiers from a Hadoop cluster. This would be useful when many classifiers are needed, for example in a larger hierarchy of classes. Some effort would be wasted on timeouts, but with a cluster that would be acceptable.
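
        A rough sketch of that per-run time budget follows, using a plain ExecutorService rather than real Hadoop mapper code; trainSvm() and selectFeatures() are hypothetical placeholders for the actual training and feature-selection steps.

        {code:java}
        // Sketch of the 'best effort' idea: give one training run a fixed time budget
        // and, on timeout, retry with more aggressive feature selection. A real version
        // would live inside a Hadoop task; this is just the control flow.
        import java.util.List;
        import java.util.concurrent.Callable;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.Future;
        import java.util.concurrent.TimeUnit;
        import java.util.concurrent.TimeoutException;

        public class BestEffortTrainer {

          public static SvmModel train(List<Example> data, long maxMinutes, int maxAttempts)
              throws Exception {
            ExecutorService executor = Executors.newSingleThreadExecutor();
            try {
              List<Example> current = data;
              for (int attempt = 0; attempt < maxAttempts; attempt++) {
                final List<Example> input = current;
                Future<SvmModel> run = executor.submit(new Callable<SvmModel>() {
                  public SvmModel call() throws Exception {
                    return trainSvm(input);            // one full SVM training run
                  }
                });
                try {
                  return run.get(maxMinutes, TimeUnit.MINUTES);
                } catch (TimeoutException timedOut) {
                  run.cancel(true);                    // abandon the run that took too long
                  current = selectFeatures(current);   // prune features and/or entities, then retry
                }
              }
              return null;                             // give up: wasted effort, acceptable on a cluster
            } finally {
              executor.shutdownNow();
            }
          }

          // Hypothetical placeholders for the real training and feature selection steps.
          private static SvmModel trainSvm(List<Example> examples) {
            throw new UnsupportedOperationException("plug in the actual SVM trainer here");
          }

          private static List<Example> selectFeatures(List<Example> examples) {
            return examples;   // a real version would prune features and/or entities
          }

          // Minimal marker types so the sketch is self-contained.
          public static class SvmModel {}
          public static class Example {}
        }
        {code}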

        One could also try to make a single SVM training run work under Hadoop, but I have no idea how to approach that. Svmlin is not much code, but for the moment I don't want to spend time on its intricacies.

        Btw, svmlin can also use unlabeled data; I'd like to use that feature too, but that may be better discussed in another issue.


          People

          • Assignee:
            Unassigned
          • Reporter:
            Paul Elschot
          • Votes:
            0
          • Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:

              Development