Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-6654

KNearestNeighborClassifier not taking in consideration Class ranking

    Details

    • Lucene Fields:
      Patch Available
    • Flags:
      Patch

      Description

      Currently the KNN Classifier assign the score for a ClassificationResult, based only on the frequency of the class in the top K results.

      This is conceptually a simplification.
      Actually the ranking must take a part.

      If not this can happen :

      Top 4
      1) Class1
      2) Class1
      3) Class2
      4) Class2

      As a result of this Top 4 , both the classes will have the same score.
      But the expected result is that Class1 has a better score, as the MLT score the documents accordingly.

      1. LUCENE-6654.patch
        12 kB
        Alessandro Benedetti

        Activity

        Hide
        alessandro.benedetti Alessandro Benedetti added a comment -
        • boost factor based on class single results scoring added
        • 2 unit tests introduced to verify the proper behaviour

        To try the effect, you can run the tests with original code, that will fail ignoring the class ranking

        Show
        alessandro.benedetti Alessandro Benedetti added a comment - boost factor based on class single results scoring added 2 unit tests introduced to verify the proper behaviour To try the effect, you can run the tests with original code, that will fail ignoring the class ranking
        Hide
        alessandro.benedetti Alessandro Benedetti added a comment -

        Any feedback on this ?

        Cheers

        Show
        alessandro.benedetti Alessandro Benedetti added a comment - Any feedback on this ? Cheers
        Hide
        teofili Tommaso Teofili added a comment -

        Alessandro, thanks for your patch, I'll take a look and let you know shortly.

        Show
        teofili Tommaso Teofili added a comment - Alessandro, thanks for your patch, I'll take a look and let you know shortly.
        Hide
        teofili Tommaso Teofili added a comment -

        I've reviewed the patch and it looks good, thanks Alessandro.
        Just a minor thing about some fields / methods being marked as protected while they could be private, this is usually much better for information hiding and keeping the API as compact as possible.

        Show
        teofili Tommaso Teofili added a comment - I've reviewed the patch and it looks good, thanks Alessandro. Just a minor thing about some fields / methods being marked as protected while they could be private , this is usually much better for information hiding and keeping the API as compact as possible.
        Hide
        alessandro.benedetti Alessandro Benedetti added a comment -

        Hi Tommaso!
        The reason for that is related the fact that that class is involved also with another issue ( the document classifier one)
        But I agree with you, in isolation for this issue they should not be protected.
        When you take a look to the Document Classifier one, they overlap a little bit, this class should be merged ( basically i put this in the most straighforward way).
        When you take a look to the other you will see

        Show
        alessandro.benedetti Alessandro Benedetti added a comment - Hi Tommaso! The reason for that is related the fact that that class is involved also with another issue ( the document classifier one) But I agree with you, in isolation for this issue they should not be protected. When you take a look to the Document Classifier one, they overlap a little bit, this class should be merged ( basically i put this in the most straighforward way). When you take a look to the other you will see
        Hide
        alessandro.benedetti Alessandro Benedetti added a comment -

        To make it simple, using this patch before should not cause any conflict, if we keep not protected here( which is better for patch isolation) later a small merge will happen,
        Which of.course is not a problem!

        Show
        alessandro.benedetti Alessandro Benedetti added a comment - To make it simple, using this patch before should not cause any conflict, if we keep not protected here( which is better for patch isolation) later a small merge will happen, Which of.course is not a problem!
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1693891 from Tommaso Teofili in branch 'dev/trunk'
        [ https://svn.apache.org/r1693891 ]

        LUCENE-6654 - KNN taking into consideration class ranking

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1693891 from Tommaso Teofili in branch 'dev/trunk' [ https://svn.apache.org/r1693891 ] LUCENE-6654 - KNN taking into consideration class ranking
        Hide
        teofili Tommaso Teofili added a comment -

        thanks Alessandro for clarifying, I've committed your patch, will take a look at LUCENE-6631 in a short while and check out if we can avoid such protected methods / variables.

        Show
        teofili Tommaso Teofili added a comment - thanks Alessandro for clarifying, I've committed your patch, will take a look at LUCENE-6631 in a short while and check out if we can avoid such protected methods / variables.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1693906 from Tommaso Teofili in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1693906 ]

        LUCENE-6654 - backported knn taking into consideration class ranking to branch 5x

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1693906 from Tommaso Teofili in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1693906 ] LUCENE-6654 - backported knn taking into consideration class ranking to branch 5x
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            teofili Tommaso Teofili
            Reporter:
            alessandro.benedetti Alessandro Benedetti
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development