Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-2293

Replace RDD.zip usage by map with predict inside.

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • None
    • 1.1.0
    • MLlib
    • None

    Description

      In our guide, we use

      val prediction = model.predict(test.map(_.features))
      val predictionAndLabel = prediction.zip(test.map(_.label))
      

      This is not efficient because test will be computed twice. We should change it to

      val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
      

      It is also nice to add a `predictWith` method to predictive models.

      def predictWith[V](RDD[(Vector, V)]): RDD[(Double, V)]
      

      But I'm not sure whether this is a good name. `predictWithValue`?

      Attachments

        Activity

          People

            Unassigned Unassigned
            mengxr Xiangrui Meng
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: