Lucene - Core / LUCENE-3390

Incorrect sort by Numeric values for documents missing the sorting field

    Details

    • Lucene Fields:
      Patch Available

      Description

      When sorting results by a numeric field, documents that have no value for the sort field appear to be sorted as if their value were 0 (zero). (Tested with Double, Float, Int and Long numeric fields, in both ascending and descending order.)
      This behavior is unexpected, because zero is "comparable" to the real values, so such documents can end up in the middle of the results. A better solution would be either to let the user define the value used for a missing field, or to always place those documents last.
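
      The two alternatives proposed above can be sketched in plain Java. This is a hypothetical model, not Lucene's API: a null entry stands for a document with no value, and the class MissingValueOptions and its methods are invented for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Hypothetical sketch (not Lucene code) of the two proposed behaviors.
 * values[doc] == null means "this document has no value for the sort field".
 */
public class MissingValueOptions {

    /** Option 1: substitute a caller-chosen default for missing values. */
    public static Integer[] sortWithDefault(Double[] values, double missingValue, boolean descending) {
        Comparator<Integer> byValue = Comparator.comparingDouble(
            (Integer doc) -> values[doc] != null ? values[doc] : missingValue);
        return sortDocs(values.length, descending ? byValue.reversed() : byValue);
    }

    /** Option 2: missing values always sort last, whatever the sort direction. */
    public static Integer[] sortMissingLast(Double[] values, boolean descending) {
        Comparator<Double> order = descending
            ? Comparator.<Double>reverseOrder()
            : Comparator.<Double>naturalOrder();
        // nullsLast wraps the direction-specific order, so reversing never moves nulls to the top
        Comparator<Integer> byValue = Comparator.comparing(
            (Integer doc) -> values[doc], Comparator.nullsLast(order));
        return sortDocs(values.length, byValue);
    }

    private static Integer[] sortDocs(int numDocs, Comparator<Integer> cmp) {
        Integer[] docs = new Integer[numDocs];
        for (int i = 0; i < numDocs; i++) docs[i] = i;
        Arrays.sort(docs, cmp); // object sort is stable, so ties keep doc ID order
        return docs;
    }

    public static void main(String[] args) {
        Double[] values = {3.5d, -10d, null}; // doc 2 has no value
        System.out.println(Arrays.toString(sortWithDefault(values, Double.NEGATIVE_INFINITY, true))); // [0, 1, 2]
        System.out.println(Arrays.toString(sortMissingLast(values, true)));  // [0, 1, 2]
        System.out.println(Arrays.toString(sortMissingLast(values, false))); // [1, 0, 2]
    }
}
```

      A single fixed default such as Double.NEGATIVE_INFINITY only moves missing documents to the end for one sort direction (in ascending order it would put them first), which is why the second option applies the missing-last rule independently of the reverse flag.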

      Example scenario:
      Add 3 documents: the 1st with value 3.5d, the 2nd with -10d, and the 3rd with no value.
      Searching with MatchAllDocsQuery, sorted by that field in descending order, yields doc IDs 0, 2, 1, i.e. the document without a value is ranked between 3.5 and -10, exactly where 0 would be.
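
      The scenario above can be modeled without Lucene. This is a hypothetical sketch, assuming (as the report suggests) that per-document sort values are read from a dense array in which a document that never had a value keeps the array default of 0.0; the class ZeroDefaultSort and its method are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Hypothetical, Lucene-free model of the reported behavior: sort values live in a
 * dense double[] indexed by doc ID, so a document that never had a value keeps 0.0.
 */
public class ZeroDefaultSort {

    /** Returns doc IDs ordered by their value in the array. */
    public static Integer[] sortDocs(double[] values, boolean descending) {
        Integer[] docs = new Integer[values.length];
        for (int i = 0; i < docs.length; i++) docs[i] = i;
        Comparator<Integer> byValue = Comparator.comparingDouble((Integer doc) -> values[doc]);
        Arrays.sort(docs, descending ? byValue.reversed() : byValue);
        return docs;
    }

    public static void main(String[] args) {
        double[] values = new double[3]; // every slot starts at 0.0
        values[0] = 3.5d;  // 1st doc
        values[1] = -10d;  // 2nd doc
        // 3rd doc (doc 2) is never assigned, so it sorts as 0.0
        System.out.println(Arrays.toString(sortDocs(values, true)));  // [0, 2, 1]
        System.out.println(Arrays.toString(sortDocs(values, false))); // [1, 2, 0]
    }
}
```

      In both directions the valueless document lands between the two real values, matching the descending result 0, 2, 1 reported here.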

      Asking for the top 2 documents therefore returns the document without any value as the 2nd result, which seems to be a bug.

      Attachments

      1. LUCENE-3390.patch
        23 kB
        Doron Cohen
      2. LUCENE-3390-BitsInterface.patch
        29 kB
        Uwe Schindler
      3. LUCENE-3390-BitsInterface.patch
        26 kB
        Doron Cohen
      4. LUCENE-3390-BitsInterface.patch
        24 kB
        Uwe Schindler
      5. LUCENE-3390-fix-like-trunk.patch
        22 kB
        Uwe Schindler
      6. LUCENE-3390-fix-like-trunk.patch
        19 kB
        Uwe Schindler
      7. LUCENE-3390-fix-like-trunk.patch
        19 kB
        Uwe Schindler
      8. LUCENE-3390-fix-like-trunk.patch
        13 kB
        Uwe Schindler
      9. LUCENE-3390-inverted.patch
        12 kB
        Uwe Schindler
      10. SortByDouble.java
        2 kB
        Gilad Barkai


          Activity

          Gavin made changes -
          Link This issue is depended upon by LUCENE-3443 [ LUCENE-3443 ]
          Gavin made changes -
          Link This issue blocks LUCENE-3443 [ LUCENE-3443 ]
          Uwe Schindler made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Uwe Schindler made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Uwe Schindler made changes -
          Attachment LUCENE-3390-inverted.patch [ 12495395 ]
          Uwe Schindler made changes -
          Attachment LUCENE-3390-inverted.patch [ 12495396 ]
          Uwe Schindler made changes -
          Attachment LUCENE-3390-inverted.patch [ 12495395 ]
          Uwe Schindler made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Uwe Schindler made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Fix Version/s 3.5 [ 12317877 ]
          Resolution Fixed [ 1 ]
          Uwe Schindler made changes -
          Link This issue blocks LUCENE-3443 [ LUCENE-3443 ]
          Uwe Schindler made changes -
          Assignee Doron Cohen [ doronc ] Uwe Schindler [ thetaphi ]
          Uwe Schindler made changes -
          Attachment LUCENE-3390-BitsInterface.patch [ 12495376 ]
          Doron Cohen made changes -
          Attachment LUCENE-3390-BitsInterface.patch [ 12495372 ]
          Uwe Schindler made changes -
          Attachment LUCENE-3390-BitsInterface.patch [ 12495347 ]
          Uwe Schindler made changes -
          Attachment LUCENE-3390-fix-like-trunk.patch [ 12495332 ]
          Uwe Schindler made changes -
          Attachment LUCENE-3390-fix-like-trunk.patch [ 12494984 ]
          Uwe Schindler made changes -
          Attachment LUCENE-3390-fix-like-trunk.patch [ 12494982 ]
          Uwe Schindler made changes -
          Attachment LUCENE-3390-fix-like-trunk.patch [ 12494979 ]
          Uwe Schindler made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Doron Cohen made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Lucene Fields [New] [Patch Available]
          Fix Version/s 3.4 [ 12316675 ]
          Resolution Fixed [ 1 ]
          Doron Cohen made changes -
          Link This issue is related to LUCENE-2671 [ LUCENE-2671 ]
          Doron Cohen made changes -
          Assignee Doron Cohen [ doronc ]
          Doron Cohen made changes -
          Attachment LUCENE-3390.patch [ 12492628 ]
          Hoss Man made changes -
          Link This issue relates to LUCENE-2649 [ LUCENE-2649 ]
          Hoss Man made changes -
          Link This issue relates to LUCENE-831 [ LUCENE-831 ]
          Hoss Man made changes -
          Link This issue relates to LUCENE-2665 [ LUCENE-2665 ]
          Hoss Man made changes -
          Link This issue relates to SOLR-2134 [ SOLR-2134 ]
          Gilad Barkai made changes -
          Summary Incorrect sort by Numeric (double) values for documents missing the sorting field Incorrect sort by Numeric values for documents missing the sorting field
          Description When sorting results by a numeric double field, documents that have no value for the sort field appear to be sorted as if their value were 0 (zero).
          This behavior is unexpected, because zero is "comparable" to the real values, so such documents can end up in the middle of the results. A better solution would be either to let the user define the value used for a missing field, or to always place those documents last.

          Example scenario:
          Add 3 documents: the 1st with value 3.5d, the 2nd with -10d, and the 3rd with no value.
          Searching with MatchAllDocsQuery, sorted by that field in descending order, yields doc IDs 0, 2, 1.

          While the document with the missing value does match the query, I would expect it to come last, as it cannot be compared with the other documents' values. For example, asking for the top 2 documents returns the document without any value, which seems to be a bug.
          Labels double numeric sort double float int long numeric sort
          Gilad Barkai made changes -
          Description When sorting results by a numeric double field, documents that have no value for the sort field appear to be sorted as if their value were 0 (zero).
          This behavior is unexpected, because zero is "comparable" to the real values, so such documents can end up in the middle of the results. A better solution would be either to let the user define the value used for a missing field, or to always place those documents last.

          Example scenario:
          Add 3 documents: the 1st with value 3.5d, the 2nd with -10d, and the 3rd with no value.
          Searching with MatchAllDocsQuery, sorted by that field in descending order, yields doc IDs 0, 2, 1.

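          Example code (Lucene 3.3):
          import org.apache.lucene.analysis.KeywordAnalyzer;
          import org.apache.lucene.document.Document;
          import org.apache.lucene.document.Field.Store;
          import org.apache.lucene.document.NumericField;
          import org.apache.lucene.index.IndexWriter;
          import org.apache.lucene.index.IndexWriterConfig;
          import org.apache.lucene.search.IndexSearcher;
          import org.apache.lucene.search.MatchAllDocsQuery;
          import org.apache.lucene.search.ScoreDoc;
          import org.apache.lucene.search.Sort;
          import org.apache.lucene.search.SortField;
          import org.apache.lucene.search.TopDocs;
          import org.apache.lucene.store.RAMDirectory;
          import org.apache.lucene.util.Version;

          public class SortByDouble {
            public static void main(String[] args) throws Exception {
              RAMDirectory d = new RAMDirectory();
              IndexWriter w = new IndexWriter(d, new IndexWriterConfig(Version.LUCENE_33, new KeywordAnalyzer()));

              // 1st doc, value 3.5d
              Document doc = new Document();
              doc.add(new NumericField("f", Store.YES, true).setDoubleValue(3.5d));
              w.addDocument(doc);

              // 2nd doc, value -10d
              doc = new Document();
              doc.add(new NumericField("f", Store.YES, true).setDoubleValue(-10d));
              w.addDocument(doc);

              // 3rd doc, no value at all for "f"
              w.addDocument(new Document());
              w.close();

              IndexSearcher s = new IndexSearcher(d);
              // Sort by "f" as a double, descending (reverse = true)
              Sort sort = new Sort(new SortField("f", SortField.DOUBLE, true));
              TopDocs td = s.search(new MatchAllDocsQuery(), 10, sort);
              for (ScoreDoc sd : td.scoreDocs) {
                // Prints "<docid>: <stored value>"; the 3rd doc has no stored "f", so get() returns null
                System.out.println(sd.doc + ": " + s.doc(sd.doc).get("f"));
              }
              s.close();
              d.close();
            }
          }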
           
          Gilad Barkai made changes -
          Attachment SortByDouble.java [ 12491193 ]
          Gilad Barkai created issue -

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Gilad Barkai
            • Votes:
              0
              Watchers:
              2
