[SPARK-44585] Fix warning condition in MLLib RankingMetrics ndcgAk - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.4.1
Fix Version/s: 3.4.2, 3.5.0, 4.0.0
Component/s: MLlib
Labels: None

Description

The nDCG evaluation with relevance scores in MLlib (added in 3.4.0, see https://issues.apache.org/jira/browse/SPARK-39446 and pull request) logs the following warning when the input data is inconsistent: "# of ground truth set and # of relevance value set should be equal, check input data"

The condition that triggers this warning is currently wrong: the warning is raised when both of the following are true:

rel is empty
lab.size and rel.size are not equal.

With the current logic, RankingMetrics will:

raise a spurious warning when a user is using it in "binary" mode (i.e. no relevance values in the input)
fail to raise a warning that may be needed when the user is using it in "non-binary" mode (i.e. with relevance values in the input)

Instead, the warning should be raised when both of the following are true:

rel is not empty
lab.size and rel.size are not equal.
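
The difference between the two conditions can be illustrated with a small sketch. This is a plain Python model of the check described above, not the actual Spark code: the function names `faulty_should_warn` and `fixed_should_warn` are hypothetical, and `lab` and `rel` stand in for the ground truth and relevance value collections named in this report.

```python
def faulty_should_warn(lab, rel):
    # Condition as described in this report for the current behavior:
    # warn when rel is empty AND the sizes differ.
    return len(rel) == 0 and len(lab) != len(rel)


def fixed_should_warn(lab, rel):
    # Proposed condition: warn when rel is NOT empty AND the sizes differ.
    return len(rel) != 0 and len(lab) != len(rel)


# Illustrative inputs (made up for this sketch).
cases = {
    "binary mode, no relevance values": (["a", "b", "c"], []),
    "non-binary mode, sizes match": (["a", "b", "c"], [3.0, 2.0, 1.0]),
    "non-binary mode, sizes differ": (["a", "b", "c"], [3.0, 2.0]),
}

for name, (lab, rel) in cases.items():
    print(f"{name}: faulty={faulty_should_warn(lab, rel)}, "
          f"fixed={fixed_should_warn(lab, rel)}")
```

Under these assumptions, the faulty condition warns only in the binary case, where an empty `rel` is expected, and stays silent when relevance values are present but their count does not match. The proposed condition reverses both outcomes.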

People

Assignee: Guilhem Vuillier

Reporter: Guilhem Vuillier

Votes: 0

Watchers: 1

Dates

Created: 28/Jul/23 13:23

Updated: 28/Jul/23 22:30

Resolved: 28/Jul/23 22:30