Description
Include common metrics for ranking algorithms (http://www-nlp.stanford.edu/IR-book/), including:
- Mean Average Precision
- Precision@n: top-n precision
- Discounted cumulative gain (DCG) and NDCG
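For concreteness, the three metrics can be sketched for a single query in plain Scala, using binary relevance as in the IR book linked above. The object and method names here are illustrative only, not part of the proposed API:

```scala
object RankingMetricsSketch {
  // precision@k: fraction of the top-k predictions that appear in the label set.
  def precisionAt(pred: Seq[String], labels: Set[String], k: Int): Double =
    pred.take(k).count(labels.contains).toDouble / k

  // Average precision: precision@i summed at each rank i that holds a relevant
  // item, divided by the total number of relevant labels.
  def averagePrecision(pred: Seq[String], labels: Set[String]): Double = {
    var hits = 0
    var sum = 0.0
    for ((p, i) <- pred.zipWithIndex if labels.contains(p)) {
      hits += 1
      sum += hits.toDouble / (i + 1)
    }
    if (labels.isEmpty) 0.0 else sum / labels.size
  }

  // NDCG@k with binary relevance: DCG over the predicted order, normalized by
  // the DCG of an ideal ordering that puts all relevant items first.
  def ndcgAt(pred: Seq[String], labels: Set[String], k: Int): Double = {
    def dcgTerm(i: Int): Double = 1.0 / (math.log(i + 2) / math.log(2))
    val dcg = pred.take(k).zipWithIndex
      .collect { case (p, i) if labels.contains(p) => dcgTerm(i) }
      .sum
    val idealDcg = (0 until math.min(k, labels.size)).map(dcgTerm).sum
    if (idealDcg == 0.0) 0.0 else dcg / idealDcg
  }
}
```

For example, with predictions `a, b, c` and relevant labels `{a, c}`, precision@2 is 1/2 and the average precision is (1/1 + 2/3) / 2 = 5/6.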
This issue proposes a new class, RankingMetrics, under org.apache.spark.mllib.evaluation, which accepts input (prediction and label pairs) as RDD[(Array[T], Array[T])]. The following methods will be implemented:
RankingMetrics.scala

```scala
class RankingMetrics[T](predictionAndLabels: RDD[(Array[T], Array[T])]) {

  /** Returns the precision@k for each query */
  lazy val precAtK: RDD[Array[Double]]

  /**
   * @param k the position at which to compute the truncated precision
   * @return the average precision at the first k ranking positions
   */
  def precision(k: Int): Double

  /** Returns the average precision for each query */
  lazy val avePrec: RDD[Double]

  /** Returns the mean average precision (MAP) over all queries */
  lazy val meanAvePrec: Double

  /** Returns the normalized discounted cumulative gain for each query */
  lazy val ndcgAtK: RDD[Array[Double]]

  /**
   * @param k the position at which to compute the truncated NDCG
   * @return the average NDCG at the first k ranking positions
   */
  def ndcg(k: Int): Double
}
```
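Assuming the API sketched above, usage would look roughly like the following. The data and variable names are hypothetical; each record pairs one query's ranked predictions with that query's set of relevant labels:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.evaluation.RankingMetrics

val sc = new SparkContext("local", "ranking-metrics-example")

// One record per query: (ranked predictions, ground-truth relevant items).
val predictionAndLabels = sc.parallelize(Seq(
  (Array(1, 2, 3, 4), Array(1, 3)), // query 1: items 1 and 3 are relevant
  (Array(5, 6, 7), Array(6))        // query 2: item 6 is relevant
))

val metrics = new RankingMetrics(predictionAndLabels)
val map = metrics.meanAvePrec // mean average precision over both queries
val p2  = metrics.precision(2) // average precision@2
val n2  = metrics.ndcg(2)      // average NDCG@2
```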
Issue Links
- is related to SPARK-18948 Add Mean Percentile Rank metric for ranking algorithms (Resolved)