  1. Spark
  2. SPARK-11284

ALS produces predictions as floats and should be double

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: ML
    • Environment: All

    Description

      Using pyspark.ml and DataFrames, the ALS recommender cannot be evaluated with the RegressionEvaluator because of a type mismatch between the model transformation and the evaluation APIs: the ALS model emits the prediction column as FloatType, while the evaluator requires DoubleType. One can work around this by casting the prediction column to double before passing it to the evaluator. However, this does not work with pipelines and cross-validation.
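The cast workaround mentioned in the description can be sketched roughly as follows. The helper name `with_double_prediction` is hypothetical and not part of Spark; it relies only on `DataFrame.withColumn` and `Column.cast`, and assumes the prediction column keeps its default name.

```python
def with_double_prediction(df, prediction_col="prediction"):
    """Return a copy of df whose prediction column is cast to double.

    Hypothetical helper (not a Spark API): it only wraps
    DataFrame.withColumn and Column.cast.
    """
    # Replacing the column under the same name keeps the schema layout,
    # but changes FloatType to DoubleType, which RegressionEvaluator accepts.
    return df.withColumn(prediction_col, df[prediction_col].cast("double"))
```

With the reporter's variables, this would be applied between the two failing steps, for example `evaluator.evaluate(with_double_prediction(model.transform(validation)))`.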

      Code and traceback below:

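              from pyspark.ml.recommendation import ALS
              from pyspark.ml.evaluation import RegressionEvaluator

              # training and validation are DataFrames with userID, movieID and rating columns
              als = ALS(rank=10, maxIter=30, regParam=0.1, userCol='userID', itemCol='movieID', ratingCol='rating')
              model = als.fit(training)
              predictions = model.transform(validation)
              evaluator = RegressionEvaluator(predictionCol='prediction', labelCol='rating')
              # Fails: the prediction column is FloatType, but the evaluator requires DoubleType
              validationRmse = evaluator.evaluate(predictions, {evaluator.metricName: 'rmse'})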
      

      Traceback:
      validationRmse = evaluator.evaluate(predictions, {evaluator.metricName: 'rmse'})
      File "/Users/dominikdahlem/software/spark-1.6.0-SNAPSHOT-bin-custom-spark/python/lib/pyspark.zip/pyspark/ml/evaluation.py", line 63, in evaluate
      File "/Users/dominikdahlem/software/spark-1.6.0-SNAPSHOT-bin-custom-spark/python/lib/pyspark.zip/pyspark/ml/evaluation.py", line 94, in _evaluate
      File "/Users/dominikdahlem/software/spark-1.6.0-SNAPSHOT-bin-custom-spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
      File "/Users/dominikdahlem/projects/repositories/spark/python/pyspark/sql/utils.py", line 42, in deco
      raise IllegalArgumentException(s.split(': ', 1)[1])
      pyspark.sql.utils.IllegalArgumentException: requirement failed: Column prediction must be of type DoubleType but was actually FloatType.
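Because the cast cannot be slotted in between the model and the evaluator when cross-validation drives both, one possible stopgap is a manual hold-out search that performs the cast itself. This is a plain loop over candidate values, not Spark's CrossValidator; the function name `select_reg_param` and the `build_estimator` callback are hypothetical names introduced for illustration.

```python
def select_reg_param(build_estimator, evaluator, training, validation,
                     reg_params, prediction_col="prediction"):
    """Fit one model per candidate regParam and return (best_param, best_rmse).

    Hypothetical helper: build_estimator(reg_param) is assumed to return an
    unfitted estimator such as ALS, and evaluator.evaluate is assumed to
    return a metric where smaller is better (e.g. RMSE).
    """
    best_param, best_rmse = None, float("inf")
    for reg_param in reg_params:
        model = build_estimator(reg_param).fit(training)
        predictions = model.transform(validation)
        # Cast the FloatType predictions to DoubleType before evaluating.
        predictions = predictions.withColumn(
            prediction_col, predictions[prediction_col].cast("double"))
        rmse = evaluator.evaluate(predictions)
        if rmse < best_rmse:
            best_param, best_rmse = reg_param, rmse
    return best_param, best_rmse
```

A call might look like `select_reg_param(lambda r: ALS(regParam=r, userCol='userID', itemCol='movieID', ratingCol='rating'), RegressionEvaluator(labelCol='rating'), training, validation, [0.01, 0.1, 1.0])`. Unlike k-fold cross-validation, this scores each candidate on a single validation split.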

          People

            Assignee: Unassigned
            Reporter: Dominik Dahlem (ddahlem)
            Votes: 0
            Watchers: 4

              Time Tracking

                Estimated: 1h
                Remaining: 1h
                Logged: Not Specified