Spark / SPARK-2341

loadLibSVMFile doesn't handle regression datasets

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: MLlib

    Description

      Many regression datasets are available in LibSVM format [1], but the loadLibSVMFile primitive currently doesn't handle them.

      More precisely, the LabelParser is either a MulticlassLabelParser or a BinaryLabelParser. As a result, the file is loaded, but in multiclass mode: each target value is interpreted as a class name.

      The fix would be to write a RegressionLabelParser that converts target values to Double, and to plug it into the loadLibSVMFile routine.

      [1] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html
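
The proposed fix can be illustrated with a minimal plain-Python sketch. This is not MLlib's Scala code: `parse_regression_label` and `parse_libsvm_line` are hypothetical names standing in for the proposed RegressionLabelParser and for the per-line parsing step of a LibSVM loader. The sketch assumes the usual `label index:value ...` record layout with 1-based feature indices, which it shifts to 0-based.

```python
from typing import Dict, Tuple


def parse_regression_label(label_string: str) -> float:
    """Hypothetical stand-in for the proposed RegressionLabelParser.

    The target is kept as a real number instead of being mapped to a class.
    """
    return float(label_string)


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one 'label index:value ...' record into (label, sparse features)."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty LibSVM record")

    # The first token is the target value; the rest are sparse features.
    label = parse_regression_label(tokens[0])

    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        # Assumption of this sketch: indices in the file are 1-based,
        # so they are shifted to 0-based here.
        features[int(index) - 1] = float(value)
    return label, features


# Example record with a real-valued target (made-up numbers).
label, features = parse_libsvm_line("24.5 1:0.00632 3:2.31 13:4.98")
```

With a parser like this, targets such as 24.5 and 24.6 stay distinct real values, which is what a regression learner expects.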

          People

            Assignee: srowen Sean R. Owen
            Reporter: eustache Eustache
            Votes: 0
            Watchers: 3
