Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-374

You cannot sort on fields that don't exist

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      Operating System: other
      Platform: Other

    • Bugzilla Id:
      34477

      Description

      While it's possible to search for fields that don't exist (you'll get 0 hits),
      you'll get an exception if you try to sort by a field that has no values. The
      exception is this:

      if (termEnum.term() == null) {
      throw new RuntimeException ("no terms in field " + field);
      }

      I'll attach a change suggested by Yonik Seeley that removes this exception.

      Also, the if-condition above is incomplete anyway, so currently the exception
      is not always thrown (as termEnum .term() might well be != null but point to a
      term in a different field already)

      1. ASF.LICENSE.NOT.GRANTED--sort.diff
        0.8 kB
        Daniel Naber
      2. sort.diff
        10 kB
        Yonik Seeley
      3. sort.diff
        3 kB
        Yonik Seeley

        Issue Links

          Activity

          Hide
          daniel.naber@t-online.de Daniel Naber added a comment -

          Created an attachment (id=14735)
          patch to remove exception

          Show
          daniel.naber@t-online.de Daniel Naber added a comment - Created an attachment (id=14735) patch to remove exception
          Hide
          daniel.naber@t-online.de Daniel Naber added a comment -

          Are there objections against this patch? If not, I'm going to commit it.

          Show
          daniel.naber@t-online.de Daniel Naber added a comment - Are there objections against this patch? If not, I'm going to commit it.
          Hide
          jakarta@ehatchersolutions.com Erik Hatcher added a comment -

          I'd like to have a test case added to Lucene that demonstrates this bug along with this patch that fixes
          it, but other than that I'm +1 on committing - looks reasonable to me.

          Show
          jakarta@ehatchersolutions.com Erik Hatcher added a comment - I'd like to have a test case added to Lucene that demonstrates this bug along with this patch that fixes it, but other than that I'm +1 on committing - looks reasonable to me.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          That fix was just a template. The changes need to be made in multiple places.
          There are currently 8 RuntimeExceptions thrown in FieldCacheImpl.java
          After a very quick look, it looks like only one is valid.

          The only one I would leave in is: "there are more terms than documents in field"

          Most of them need the same fix. getAuto() looks like it needs a different fix
          (a default fieldtype if there aren't any terms).

          For tests, looks like one for each possible fieldtype + auto is needed.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - That fix was just a template. The changes need to be made in multiple places. There are currently 8 RuntimeExceptions thrown in FieldCacheImpl.java After a very quick look, it looks like only one is valid. The only one I would leave in is: "there are more terms than documents in field" Most of them need the same fix. getAuto() looks like it needs a different fix (a default fieldtype if there aren't any terms). For tests, looks like one for each possible fieldtype + auto is needed.
          Hide
          daniel.naber@t-online.de Daniel Naber added a comment -

          I just see that there's another problem: the exception also occurs when you
          have a field, but it isn't indexed. I think that's a common case and these
          cases should not be silently ignored.

          Show
          daniel.naber@t-online.de Daniel Naber added a comment - I just see that there's another problem: the exception also occurs when you have a field, but it isn't indexed. I think that's a common case and these cases should not be silently ignored.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          > I just see that there's another problem: the exception also occurs when
          > you have a field, but it isn't indexed. I think that's a common case
          > and these cases should not be silently ignored.

          While this condition would be possible to check for, it would lead to some of
          the same inconsistencies and problems as trying to ban sorting on a field that
          doesn't exist.

          Lucene doesn't restrict every field with the same name to be of the same type.
          Field "foo" may be indexed for some documents and not indexed for others. So we
          are back in the same situation of not knowing if the query is a bug, or if it's
          just that no documents with "foo" indexed currently exist in the index, but may
          in the future or may have in the past.

          Since Lucene is a search engine rather than a search server, we should strive to
          keep it as flexible as possible to enable that many more applications to be
          built on top of it.

          Once this bug is fixed, I think FieldCacheImpl.java should have only one
          RuntimeException remaining.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - > I just see that there's another problem: the exception also occurs when > you have a field, but it isn't indexed. I think that's a common case > and these cases should not be silently ignored. While this condition would be possible to check for, it would lead to some of the same inconsistencies and problems as trying to ban sorting on a field that doesn't exist. Lucene doesn't restrict every field with the same name to be of the same type. Field "foo" may be indexed for some documents and not indexed for others. So we are back in the same situation of not knowing if the query is a bug, or if it's just that no documents with "foo" indexed currently exist in the index, but may in the future or may have in the past. Since Lucene is a search engine rather than a search server, we should strive to keep it as flexible as possible to enable that many more applications to be built on top of it. Once this bug is fixed, I think FieldCacheImpl.java should have only one RuntimeException remaining.
          Hide
          daniel.naber@t-online.de Daniel Naber added a comment -

          I now hesitate to commit this patch in its current form, as many people will
          be confused when sorting doesn't give the right results (because a field isn't
          indexed) but doesn't return an error or warning either. I think we need a
          small logging framework inside Lucene to notify the developer about such
          potential problems.

          Show
          daniel.naber@t-online.de Daniel Naber added a comment - I now hesitate to commit this patch in its current form, as many people will be confused when sorting doesn't give the right results (because a field isn't indexed) but doesn't return an error or warning either. I think we need a small logging framework inside Lucene to notify the developer about such potential problems.
          Hide
          cutting@apache.org cutting@apache.org added a comment -

          If someone searches for a field that has no values indexed then they will not
          find anything, and no warning is issued. Is it that different if they sort by a
          field that has no values indexed?

          Show
          cutting@apache.org cutting@apache.org added a comment - If someone searches for a field that has no values indexed then they will not find anything, and no warning is issued. Is it that different if they sort by a field that has no values indexed?
          Hide
          daniel.naber@t-online.de Daniel Naber added a comment -

          If we just remove the Exceptions, a non-indexed field will be ignored for
          sorting (i.e. it will be treated as if it was empty). So you'd get matches,
          but they probably won't be sorted the way you expect. This is difficult to
          debug without warnings, especially if there is more than one sort key and only
          some of them are ignored. So yes, I think it's worse than getting 0 matches,
          which is at least a clear indication that something is wrong.

          Show
          daniel.naber@t-online.de Daniel Naber added a comment - If we just remove the Exceptions, a non-indexed field will be ignored for sorting (i.e. it will be treated as if it was empty). So you'd get matches, but they probably won't be sorted the way you expect. This is difficult to debug without warnings, especially if there is more than one sort key and only some of them are ignored. So yes, I think it's worse than getting 0 matches, which is at least a clear indication that something is wrong.
          Hide
          cutting@apache.org cutting@apache.org added a comment -

          Would it be more sensible to throw an exception only if both the sort field is
          unindexed and the search returns zero documents? In other words, the exception
          could be thrown when the sorter is used, not when it is constructed.

          Show
          cutting@apache.org cutting@apache.org added a comment - Would it be more sensible to throw an exception only if both the sort field is unindexed and the search returns zero documents? In other words, the exception could be thrown when the sorter is used, not when it is constructed.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          > This is difficult to debug without warnings

          I think these types of checks should be done in a separate place (or at least be
          optional somehow). There are many other potential problems than sorting on a
          field that doesn't exist... but all developers shouldn't have to pay the price
          in overhead or lack of flexibility.

          How about a checkQuery() or a debugQuery() that could check for things like:

          • searching on a non-indexed field
          • sorting on a non-indexed field
          • documents that have multiple terms for the sort field
          • documents matching a query that have no term for the sort field
          Show
          yseeley@gmail.com Yonik Seeley added a comment - > This is difficult to debug without warnings I think these types of checks should be done in a separate place (or at least be optional somehow). There are many other potential problems than sorting on a field that doesn't exist... but all developers shouldn't have to pay the price in overhead or lack of flexibility. How about a checkQuery() or a debugQuery() that could check for things like: searching on a non-indexed field sorting on a non-indexed field documents that have multiple terms for the sort field documents matching a query that have no term for the sort field
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Updated my original patch from
          http://www.mail-archive.com/java-user@lucene.apache.org/msg00611.html
          which actually wasn't meant to be a complete patch, but just a template for one.

          This patch removes the "no terms in field" exception for all the types except Auto, which people probably shouldn't use anyway.
          I think it's important for the reasons detailed in that thread.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Updated my original patch from http://www.mail-archive.com/java-user@lucene.apache.org/msg00611.html which actually wasn't meant to be a complete patch, but just a template for one. This patch removes the "no terms in field" exception for all the types except Auto, which people probably shouldn't use anyway. I think it's important for the reasons detailed in that thread.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Attaching latest version.

          I looked into TestSort to add some tests for this, and was surprised to see some tests that should have covered it: testEmptyIndex()
          It turns out FieldCacheImpl special-cased an empty index with "if (retArray.length > 0) {"

          After removing only those special-case checks from FieldCacheImpl, the tests fail (demonstrating the problem).
          After applying my full patch, all tests pass again.

          This latest diff is a little hard to read because indentation changed due to the removal of the "if" test, but that's only difference between this and the last diff.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Attaching latest version. I looked into TestSort to add some tests for this, and was surprised to see some tests that should have covered it: testEmptyIndex() It turns out FieldCacheImpl special-cased an empty index with "if (retArray.length > 0) {" After removing only those special-case checks from FieldCacheImpl, the tests fail (demonstrating the problem). After applying my full patch, all tests pass again. This latest diff is a little hard to read because indentation changed due to the removal of the "if" test, but that's only difference between this and the last diff.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          This patch is also needed to ensure that searches on a multisearcher will be the same regardless of how documents are distributed among the subsearchers.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - This patch is also needed to ensure that searches on a multisearcher will be the same regardless of how documents are distributed among the subsearchers.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          I cloned this bug to LUCENE-459 to capture the desire for warnings in certain circumstances.
          I don't think those concerns should hold up this patch, which fixes clearly inconsistent behavior. If anyone still has concerns with this patch, speak up before it get's committed.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - I cloned this bug to LUCENE-459 to capture the desire for warnings in certain circumstances. I don't think those concerns should hold up this patch, which fixes clearly inconsistent behavior. If anyone still has concerns with this patch, speak up before it get's committed.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Patch applied.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Patch applied.

            People

            • Assignee:
              yseeley@gmail.com Yonik Seeley
              Reporter:
              daniel.naber@t-online.de Daniel Naber
            • Votes:
              4 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development