Spark / SPARK-26351

Documented formula of precision at k does not match the actual code


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.3.3, 2.4.1, 3.0.0
    • Component/s: Documentation, MLlib
    • Labels: None

    Description

      The formula for precision at k, which measures the quality of recommendations, given in

      https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html#ranking-systems

      says that j goes from 0 to min(|D|, k), but the code uses min(pred.length, k) instead:

      https://github.com/apache/spark/blob/a63e7b2a212bab94d080b00cf1c5f397800a276a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L65

       

      val n = math.min(pred.length, k)
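To see how the two bounds diverge, here is a plain-Scala sketch of the per-user loop behind the linked line, with no Spark dependency. The object and method names (PrecisionAtK, precisionAtCode, precisionAtDocumented) are hypothetical. precisionAtCode mirrors the linked code (scan the first min(pred.length, k) predictions and divide the hit count by k), while precisionAtDocumented swaps in the documented bound min(|D|, k). The real method additionally averages these per-user values over an RDD.

```scala
// Plain-Scala sketch (no Spark) of the per-user precision@k loop in
// RankingMetrics.precisionAt. Object and method names are hypothetical.
object PrecisionAtK {

  // Mirrors the linked code: scan the first min(pred.length, k) predictions
  // and divide the number of hits by k (not by the number scanned).
  def precisionAtCode[T](pred: Array[T], lab: Array[T], k: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    val labSet = lab.toSet
    if (labSet.isEmpty) 0.0 // the linked code also returns 0.0 for empty ground truth
    else {
      val n = math.min(pred.length, k)
      val hits = pred.iterator.take(n).count(p => labSet.contains(p))
      hits.toDouble / k
    }
  }

  // Same loop with the bound as currently documented: min(|D|, k).
  // iterator.take stops early if pred is shorter than the bound.
  def precisionAtDocumented[T](pred: Array[T], lab: Array[T], k: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    val labSet = lab.toSet
    if (labSet.isEmpty) 0.0
    else {
      val n = math.min(labSet.size, k)
      val hits = pred.iterator.take(n).count(p => labSet.contains(p))
      hits.toDouble / k
    }
  }

  def main(args: Array[String]): Unit = {
    val relevant = Array(1, 2)     // D: two relevant documents
    val predicted = Array(1, 3, 2) // ranked predictions, best first
    println(precisionAtCode(predicted, relevant, 3))       // hits at positions 0 and 2: 2/3
    println(precisionAtDocumented(predicted, relevant, 3)) // only position 0 is a hit: 1/3
  }
}
```

With two relevant documents and three predictions at k = 3, the code counts hits over all three positions (2/3), while the documented bound stops after two (1/3). The two readings can therefore disagree whenever there are fewer relevant documents than predictions scanned.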

       

      The notation in the Spark documentation defines

      D_i as the set of ground-truth relevant documents for user i, and

      R_i as the set of recommended documents (i.e. predictions) given for user i.

      In the code, pred holds the predictions, so it plays the role of R_i. The documentation should therefore say that j goes from 0 to min(|R_i|, k).
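Written out in full, the metric with the upper limit the code actually uses would read as follows. This is a sketch assuming the documentation's other conventions: M users, zero-based positions j (so the last index is one less than the bound), and an indicator rel_{D_i}(r) that is 1 when r is in D_i and 0 otherwise. The linked code divides each user's hit count by k, not by the number of positions scanned, and then averages over users.

```latex
p(k) = \frac{1}{M} \sum_{i=0}^{M-1} \frac{1}{k}
       \sum_{j=0}^{\min(|R_i|,\,k)-1} \mathrm{rel}_{D_i}\big(R_i(j)\big),
\qquad
\mathrm{rel}_{D_i}(r) =
  \begin{cases}
    1 & \text{if } r \in D_i \\
    0 & \text{otherwise}
  \end{cases}
```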


People

    • shahid
    • Pablo J. Villacorta (olbapjose)
    • Sean Owen

Votes: 0
Watchers: 2
