Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-1683

RegexQuery matches terms the input regex doesn't actually match

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.2
    • Fix Version/s: 2.9
    • Component/s: modules/other
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I was writing some unit tests for our own wrapper around the Lucene regex classes, and got tripped up by something interesting.

      The regex "cat." will match "cats" but also anything with "cat" and 1+ following letters (e.g. "cathy", "catcher", ...) It is as if there is an implicit .* always added to the end of the regex.

      Here's a unit test for the behaviour I would expect myself:

      @Test
      public void testNecessity() throws Exception {
      File dir = new File(new File(System.getProperty("java.io.tmpdir")), "index");
      IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
      try

      { Document doc = new Document(); doc.add(new Field("field", "cat cats cathy", Field.Store.YES, Field.Index.TOKENIZED)); writer.addDocument(doc); }

      finally

      { writer.close(); }

      IndexReader reader = IndexReader.open(dir);
      try

      { TermEnum terms = new RegexQuery(new Term("field", "cat.")).getEnum(reader); assertEquals("Wrong term", "cats", terms.term()); assertFalse("Should have only been one term", terms.next()); }

      finally

      { reader.close(); }

      }

      This test fails on the term check with terms.term() equal to "cathy".

      Our workaround is to mangle the query like this:

      String fixed = String.format("(?:%s)$", original);

        Attachments

          Activity

            People

            • Assignee:
              mikemccand Michael McCandless
              Reporter:
              trejkaz Trejkaz
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: