Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.2
    • Fix Version/s: None
    • Component/s: Classification
    • Labels: None

Description

I wrote a proposal for a parallel SVM training algorithm. Any comments are welcome.

Attachments

    1. ParallelPegasos.pdf (113 kB, zhao zhendong)
    2. ParallelPegasos.doc (89 kB, zhao zhendong)

Activity

          zhao zhendong added a comment -

          The patch is a document, namely the proposal of a parallel algorithm for SVM training.

          zhao zhendong added a comment -

          The patch includes two files with the same content (.doc and .pdf).

          Ted Dunning added a comment -

          Actually, the patch does not include any text files.

          Can you attach the files directly rather than trying to create a patch? Patches are intended more for proposed code.

          Ted Dunning added a comment -

          The patch doesn't contain any code.

          Ted Dunning added a comment -

          Also, you put the files deep into the source code tree. They shouldn't be down there in the end, but will need to be put onto the wiki or into javadoc form.

          David Hall added a comment -

          As Ted hints, a proposal should really be placed on the wiki. http://cwiki.apache.org/MAHOUT/

          Looking forward to it.

          zhao zhendong added a comment -

          These are two distinct files with the same content. The files contain the proposal for parallel Pegasos, which is one of the best-known SVM solvers.

          Ted Dunning added a comment -

          Here are a few formatting suggestions:

          a) when cutting and pasting from somebody else's work, it is good to point this out. You should directly credit figure 3 and the algorithm pseudo-code which are cut-and-pasted directly from the original paper.

          b) text in your diagram got resized and is now only partially readable. This makes it a bit harder to follow exactly what you intend.

          More importantly, the parameter k in the original paper is a batch size. You propose to parallelize the computation of each batch, but otherwise leave the main structure of the computation in place. If we assume a small cluster with, say, 100 cores (12 machines or so), then if you set k to 1000, each core will get to do about a dozen vector operations. This is likely to be no more than a microsecond of computation per core per iteration. My guess is that this will result in very, very poor CPU utilization, since you will require one map-reduce invocation per iteration. Concretely put, you will have about a millisecond of useful computation every 10 seconds or so.

          Your approach would probably work much better if applied to a single multi-core machine, where the very high rendezvous rate would be more achievable. I don't expect that this proposed approach will work with map-reduce.

          On the other hand, Pegasos is a pretty scalable algorithm even on a single machine. If you were able to produce a high quality sequential implementation, that would be a substantial contribution to Mahout.
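
          To make the back-of-envelope arithmetic above concrete, here is a tiny sketch of the estimate. The per-operation cost and the per-job overhead are illustrative assumptions chosen to match the rough figures in this comment, not measurements:

```java
// Back-of-envelope sketch of the utilization argument for batched Pegasos
// on MapReduce. Every constant below is an illustrative assumption.
public class PegasosUtilizationEstimate {
    public static void main(String[] args) {
        int k = 1_000;                 // Pegasos batch size
        double secPerVectorOp = 1e-6;  // assumed ~1 microsecond per vector operation
        double jobOverheadSec = 10.0;  // assumed MapReduce launch cost per iteration

        // roughly k vector operations of useful work per iteration, spread
        // across the cluster; the job overhead is paid on every iteration
        double usefulSec = k * secPerVectorOp;
        double fractionUseful = usefulSec / (usefulSec + jobOverheadSec);

        System.out.printf("useful compute per iteration: %.3f ms%n", usefulSec * 1e3);
        System.out.printf("useful fraction of wall-clock: %.4f%%%n", fractionUseful * 100);
    }
}
```

          Under these assumptions the sketch reproduces the estimate above: about a millisecond of useful work per 10-second cycle. Raising k to 1,000,000 raises the useful compute to about a second per iteration, which is the regime discussed further below.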

          zhao zhendong added a comment -

          Thanks for your comments.

          Sure; actually, I pointed this out earlier with "the framework of this
          implementation is:".

          Thanks, I will revise it later.

          I understand this concern. Actually, if we set the parameter k to 1,000,000
          or higher, do you think it is reasonable to take advantage of the MapReduce
          framework? I mean, from a system implementation point of view.

          Are there any other thoughts on how to extend this algorithm to a
          parallel version?

          Yes, that's why we are discussing this issue. But I really need some comments
          and help, since I am still a newbie here.


          Ted Dunning added a comment -

          > I understand this concern. Actually, if we set the parameter k to 1,000,000
          > or higher, do you think it is reasonable to take advantage of the MapReduce
          > framework? I mean, from a system implementation point of view.

          If you increase the value of k to very large values, you will be able to get a bit more computation, but if you follow my small cluster example, I think that increasing k from 1000 to 1,000,000 will likely increase efficiency from 0.1% to less than 50%, and will drive the algorithm well beyond the region where kT is constant. You will still have quite a lot of I/O per cycle, which may prevent you from achieving even 10% efficiency.

          For larger clusters, the problem will be much worse.

          Go ahead and try it, though. Your real results count for more than my estimates.

          And as I said before, getting a good sequential implementation is of real value as well.

          Ted Dunning added a comment - edited

          Can you specify this sequential implementation?

          k = 1

          Otherwise as in the Pegasos article. No parallelism.
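
          As a concrete reading of this specification, here is a minimal sketch of sequential Pegasos with k = 1, following the update rule from the Pegasos paper. The dense double[] representation, the fixed seed, and the class name are illustrative simplifications, not an existing Mahout API:

```java
import java.util.Random;

// Minimal sequential Pegasos sketch (batch size k = 1). Labels y are
// assumed to be +1/-1; dense arrays are a simplifying assumption, and a
// real implementation would use sparse vectors.
public class SequentialPegasos {

    public static double[] train(double[][] x, int[] y, double lambda, int iterations) {
        int dim = x[0].length;
        double[] w = new double[dim];
        Random rnd = new Random(42);
        for (int t = 1; t <= iterations; t++) {
            int i = rnd.nextInt(x.length);    // pick one random example
            double eta = 1.0 / (lambda * t);  // step size 1/(lambda * t)
            boolean violated = y[i] * dot(w, x[i]) < 1.0;
            for (int j = 0; j < dim; j++) {
                w[j] *= 1.0 - eta * lambda;   // regularization shrinkage
                if (violated) {
                    w[j] += eta * y[i] * x[i][j];  // hinge-loss subgradient step
                }
            }
            // optional projection onto the ball of radius 1/sqrt(lambda)
            double norm = Math.sqrt(dot(w, w));
            double bound = 1.0 / Math.sqrt(lambda);
            if (norm > bound) {
                for (int j = 0; j < dim; j++) {
                    w[j] *= bound / norm;
                }
            }
        }
        return w;
    }

    private static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            s += a[j] * b[j];
        }
        return s;
    }
}
```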

          zhao zhendong added a comment -

          For binary classification (two-class), we currently use the sequential SVM solver.

          However, for multi-class classification, we extend it to a parallel version based on the MapReduce framework.

          Please check out this issue:
          https://issues.apache.org/jira/browse/MAHOUT-232
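
          The multi-class extension described here (which, per the later comments, decomposes into a set of binary classifiers) is the usual one-vs-rest scheme. A minimal sketch of the decomposition, reusing the sequential trainer sketched above; the names are illustrative, not the actual MAHOUT-232 API:

```java
// One-vs-rest decomposition sketch: each class gets its own independent
// binary problem, which is what makes the binary tasks natural units of
// parallel (e.g. per-mapper) work. Illustrative only, not the MAHOUT-232 API.
public class OneVsRestSketch {

    public static double[][] trainAll(double[][] x, int[] labels, int numClasses,
                                      double lambda, int iterations) {
        double[][] models = new double[numClasses][];
        for (int c = 0; c < numClasses; c++) {  // each pass is independent of the others
            int[] binary = new int[labels.length];
            for (int i = 0; i < labels.length; i++) {
                binary[i] = (labels[i] == c) ? 1 : -1;  // class c vs. the rest
            }
            models[c] = SequentialPegasos.train(x, binary, lambda, iterations);
        }
        return models;
    }
}
```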

          Ted Dunning added a comment -

          Is this going to be complete this week or next?

          If not, we should push it to 0.4

          (and given the current state, I would guess that there is no other option)

          zhao zhendong added a comment -

          So far, I haven't worked on this parallel binary classification, so it
          cannot be pushed into 0.3. As discussed with you, I think parallel
          multi-class classification can leverage parallelism more easily, because
          it can be decomposed into a set of binary classifiers. Does that make sense?

          I may try this issue later. But I do not know whether this method can
          achieve even a small improvement.

          Cheers,
          Zhendong


          Sean Owen added a comment -

          Moving to 0.4 per Zhao's comment

          Ted Dunning added a comment -

          Zhao,

          My thought is that having a good sequential SVM that learns very fast would be almost as scalable as a parallel implementation, especially if it is right next to a good SGD logistic regression implementation.

          My guess is that speedup via randomized variable subsets is likely to be the most effective strategy if we absolutely need to have speedup. It is also possible that just speeding up the parameter sweeps that are normal practice for any serious data mining would be just about as useful as making learning fast for a single parameter setting. That would require giving different maps different parameter settings and having each of them read the entire data set. Each mapper should probably run multiple settings at once so that the data is re-used relatively efficiently.
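
          A rough sketch of that parameter-sweep idea: the mapper's input records are parameter settings rather than data splits, so each mapper streams the full training set from HDFS itself. The data path and the trainAndEvaluate helper are hypothetical placeholders, not existing Mahout code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Parameter-sweep sketch: each input record is a lambda value, not a data
// split, so each mapper reads the entire training set from HDFS and trains
// that setting against it. Path and helper method are hypothetical.
public class ParameterSweepMapper
        extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        double lambda = Double.parseDouble(value.toString().trim());

        FileSystem fs = FileSystem.get(context.getConfiguration());
        Path data = new Path("/user/mahout/svm/train.dat");  // hypothetical path
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(data)))) {
            double error = trainAndEvaluate(in, lambda);
            context.write(new Text("lambda=" + lambda), new DoubleWritable(error));
        }
    }

    // placeholder: parse examples, run the sequential solver, and return
    // held-out error for this parameter setting
    private double trainAndEvaluate(BufferedReader data, double lambda) {
        return 0.0;
    }
}
```

          In practice, each input record would carry several settings so that a single pass over the data amortizes the I/O, which is the point about data re-use above.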

          Sean Owen added a comment -

          Has this gone stale? Not sure of the status.

          zhao zhendong added a comment -

          Actually, the previous proposal has been terminated. I don't think it can
          obtain any benefit from such a framework.

          But I am still thinking about a new one. So I guess this issue can be shut
          down. If someone (or I) can come up with another framework for parallel
          SVM, we can create a new thread.



People

    • Assignee: Unassigned
    • Reporter: zhao zhendong
    • Votes: 0
    • Watchers: 2
