Apache MADlib / MADLIB-605

SVM Regression: Accuracy should be improved for some data sets

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Abandoned
    • None
    • v1.9
    • None

    Description

      We ran comparable test cases in both MADlib and libsvm and compared the mean squared error (MSE).
      For the data sets below, MADlib's MSE is far worse than libsvm's.

      1. Dot kernel

      Data Sets   MADlib(Parallel = true)   MADlib(Parallel = false)   libsvm     MADlib/libsvm
      bodyfat     249962.1847               4397613.616                4.68E-05   5336898594
      mpg         239380954.3               1.89706E+11                22.5239    10627864.37
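
For reference, the metric and the ratio column can be sketched as follows. This is a minimal illustration, not the harness used for the test runs: the helper names `mean_squared_error` and `mse_ratio` are hypothetical, and it assumes the MADlib/libsvm column is the Parallel = true MSE divided by the libsvm MSE. The reported ratios are consistent with that reading, up to rounding of the libsvm values.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared residuals over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# MSE values copied from the dot-kernel table in this report.
reported = {
    "bodyfat": {"madlib_parallel": 249962.1847, "libsvm": 4.68e-05},
    "mpg": {"madlib_parallel": 239380954.3, "libsvm": 22.5239},
}


def mse_ratio(row):
    """Assumed definition of the MADlib/libsvm column (Parallel = true)."""
    return row["madlib_parallel"] / row["libsvm"]


ratios = {name: mse_ratio(row) for name, row in reported.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.6g}")
```

A ratio near 1 would mean the two libraries fit equally well. The ratios in this report are many orders of magnitude above 1.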
      
      Test cases:
      SELECT madlib.svm_regression
                              ( 'madlibtestdata.svm_bodyfat'::text     --input_table
                              , 'madlibtestresult.reg_model_table'::text    --model_table
                              , 'true'::boolean       --parallel
                              , 'madlib.svm_dot'::text    --kernel_func
                              , 'false'::boolean        --verbose
                              , '0.1'::float8            --eta
                              , '0.005'::float8             --nu
                              , '0.05'::float8        --slambda
                         ) AS q;
      
      SELECT madlib.svm_regression
                              ( 'madlibtestdata.svm_mpg'::text     --input_table
                              , 'madlibtestresult.reg_model_table'::text    --model_table
                              , 'true'::boolean       --parallel
                              , 'madlib.svm_dot'::text    --kernel_func
                              , 'false'::boolean        --verbose
                              , '0.1'::float8            --eta
                              , '0.005'::float8             --nu
                              , '0.05'::float8        --slambda
                         ) AS q;
      
      

      2. Polynomial kernel

      Data Sets   MADlib(Parallel = true)   MADlib(Parallel = false)   libsvm       MADlib/libsvm
      bodyfat     4.07E+26                  1.86E+27                   0.00143458   2.83446E+29
      cpusmall    2.38E+71                  4.41E+72                   1.42E+42     1.67986E+29
      housing     9.31E+29                  6.79E+31                   249267       3.73671E+24
      mpg         2.25E+37                  9.89E+39                   610.474      3.68346E+34
      
      Test case example:
      SELECT madlib.svm_regression
                              ( 'madlibtestdata.svm_bodyfat'::text     --input_table
                              , 'madlibtestresult.reg_model_table'::text    --model_table
                              , 'true'::boolean       --parallel
                              , 'madlibtestdata.svm_polynomial'::text    --kernel_func
                              , 'false'::boolean        --verbose
                              , '0.1'::float8            --eta
                              , '0.005'::float8             --nu
                              , '0.05'::float8        --slambda
                         ) AS q;
      
      
      

      3. Data sets

      Data Set    Train Size   Attributes   URL
      abalone     4,177        8            http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#abalone
      bodyfat     252          14           http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#bodyfat
      cpusmall    8,192        12           http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#cpusmalll
      housing     506          13           http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#housing
      
      

          People

            riyer Rahul Iyer
            yaojl Jiali Yao