Details
Bug
Status: Closed
Minor
Resolution: Fixed
Description
SVM does provide cross-validation (CV):
1) The CV results table can be obtained by setting validation_result in the params argument. The value can be any table name, including <output_table>_cv.
2) The _summary table reports the best cross-validated parameters, which correspond to the model in the output table. This gives the user the exact parameters needed to recreate the model. Whether that is the purpose of the summary table is open for debate.
3) The docs are definitely missing examples for CV.
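To make the CV parameters concrete, here is a minimal Python sketch of how a params string could be expanded into the candidate settings that get cross-validated. It assumes that a bracketed value such as init_stepsize=[0.01, 1] is a list of candidates and that n_folds and validation_result only control the CV run itself. The helper names parse_params and expand_grid are hypothetical and are not MADlib functions; this is not MADlib's actual implementation.

```python
import ast
import itertools


def _literal(value):
    """Convert '200' or '[0.01, 1]' to Python values; leave other text as a string."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value


def parse_params(params):
    """Split 'k=v, k=[a, b]' into a dict, ignoring commas inside brackets."""
    parsed, depth, token = {}, 0, ""
    for ch in params + ",":
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            if token.strip():
                key, _, value = token.partition("=")
                parsed[key.strip()] = value.strip()
            token = ""
        else:
            token += ch
    return parsed


def expand_grid(params, cv_keys=("n_folds", "validation_result")):
    """Return one dict per combination of candidate values (hypothetical helper)."""
    names, candidates = [], []
    for key, value in parse_params(params).items():
        if key in cv_keys:
            continue  # assumed to control the CV run, not the model
        parsed = _literal(value)
        names.append(key)
        candidates.append(parsed if isinstance(parsed, list) else [parsed])
    return [dict(zip(names, combo)) for combo in itertools.product(*candidates)]


grid = expand_grid("init_stepsize=[0.01, 1], max_iter=200, n_folds=3")
for candidate in grid:
    print(candidate)
```

Under these assumptions the string yields two candidates, one per init_stepsize value, each of which would be scored across the 3 folds.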
But there seems to be a bug:
DROP TABLE IF EXISTS houses;
CREATE TABLE houses (id INT, tax INT, bedroom INT, bath FLOAT, price INT, size INT, lot INT);
INSERT INTO houses VALUES
  (1, 590, 2, 1, 50000, 770, 22100),
  (2, 1050, 3, 2, 85000, 1410, 12000),
  (3, 20, 3, 1, 22500, 1060, 3500),
  (4, 870, 2, 2, 90000, 1300, 17500),
  (5, 1320, 3, 2, 133000, 1500, 30000),
  (6, 1350, 2, 1, 90500, 820, 25700),
  (7, 2790, 3, 2.5, 260000, 2130, 25000),
  (8, 680, 2, 1, 142500, 1170, 22000),
  (9, 1840, 3, 2, 160000, 1500, 19000),
  (10, 3680, 4, 2, 240000, 2790, 20000),
  (11, 1660, 3, 1, 87000, 1030, 17500),
  (12, 1620, 3, 2, 118600, 1250, 20000),
  (13, 3100, 3, 2, 140000, 1760, 38000),
  (14, 2070, 2, 3, 148000, 1550, 14000),
  (15, 650, 3, 1.5, 65000, 1450, 12000);
Run training with CV:
DROP TABLE IF EXISTS houses_svm_gaussian_regression,
                     houses_svm_gaussian_regression_summary,
                     houses_svm_gaussian_regression_random,
                     houses_svm_gaussian_regression_cv;
SELECT madlib.svm_regression(
    'houses',
    'houses_svm_gaussian_regression',
    'price',
    'ARRAY[1, tax, bath, size]',
    'gaussian',
    'n_components=10',
    '',
    'init_stepsize=[0.01, 1], max_iter=200, validation_result=houses_svm_gaussian_regression_cv, n_folds=3'
);
SELECT * FROM houses_svm_gaussian_regression_cv;
This results in an error:
InternalError: (psycopg2.InternalError) KeyError: 'params_dict' (plpython.c:4960)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "svm_regression", line 23, in <module>
    return svm.svm(**globals())
  PL/Python function "svm_regression", line 970, in svm
  PL/Python function "svm_regression", line 1033, in _cross_validate_svm
  PL/Python function "svm_regression", line 146, in output_tbl
PL/Python function "svm_regression"
[SQL: "SELECT madlib.svm_regression( 'houses',\n 'houses_svm_gaussian_regression',\n 'price',\n 'ARRAY[1, tax, bath, size]',\n 'gaussian',\n 'n_components=10',\n '',\n 'init_stepsize=[0.01, 1], max_iter=200, validation_result=houses_svm_gaussian_regression_cv, n_folds=3'\n );"]