SPARK-26351: Documented formula of precision at k does not match the actual code

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.3.3, 2.4.1, 3.0.0
    • Component/s: Documentation, MLlib
    • Labels:
      None

      Description

      The formula for precision at k, which measures the quality of recommendations, is documented at

      https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html#ranking-systems

      It says that j goes from 0 to min(|D|, k), but the code at

      https://github.com/apache/spark/blob/a63e7b2a212bab94d080b00cf1c5f397800a276a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L65

      computes the upper bound from the length of the prediction list instead:

       

      val n = math.min(pred.length, k)

       

      The Spark documentation defines:

      D_i as the set of ground truth relevant documents for user i

      R_i as the set of recommended documents (i.e. predictions) given for user i

      To match the code, the documentation should say that j goes from 0 to min(|R_i|, k).
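
      The difference between the two bounds can be illustrated with a minimal, Spark-free sketch for a single user. The object and method names below (PrecisionAtKSketch, precisionWithBound, asImplemented, asDocumented) are hypothetical and are not part of RankingMetrics; only the expression math.min(pred.length, k) comes from the linked code. The sketch assumes the hit count is divided by k, as in the documented formula.

```scala
object PrecisionAtKSketch {
  // Counts relevant items among the first `bound` predictions and divides by k.
  // Hypothetical helper for illustration; not the Spark implementation.
  def precisionWithBound(pred: Seq[Int], truth: Set[Int], k: Int, bound: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    // take(bound) silently stops at the end of pred if bound exceeds pred.length
    val hits = pred.take(bound).count(truth.contains)
    hits.toDouble / k
  }

  // Upper bound as in the quoted code: min(|R_i|, k)
  def asImplemented(pred: Seq[Int], truth: Set[Int], k: Int): Double =
    precisionWithBound(pred, truth, k, math.min(pred.length, k))

  // Upper bound as in the documented formula: min(|D_i|, k)
  def asDocumented(pred: Seq[Int], truth: Set[Int], k: Int): Double =
    precisionWithBound(pred, truth, k, math.min(truth.size, k))

  def main(args: Array[String]): Unit = {
    val pred = Seq(9, 8, 1, 2) // R_i: four recommendations, relevant ones ranked last
    val truth = Set(1, 2)      // D_i: two relevant documents
    println(asImplemented(pred, truth, 4)) // 0.5 (all four predictions inspected)
    println(asDocumented(pred, truth, 4))  // 0.0 (only the first two inspected)
  }
}
```

      With |D_i| = 2 and k = 4, the documented bound stops after the first two predictions and misses both relevant items, while the bound used in the code inspects all four. When |D_i| is larger than |R_i|, the documented bound would index past the end of the prediction list; the sketch hides that case because take truncates.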

              People

              • Assignee:
                shahid shahid
                Reporter:
                olbapjose Pablo J. Villacorta
                Shepherd:
                Sean Owen