  1. Lucene - Core
  2. LUCENE-7576

RegExp automaton causes NPE on Terms.intersect

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 6.2.1
    • Fix Version/s: 7.0, 6.4
    • Component/s: core/codecs, core/index
    • Labels:
      None
    • Environment:

      java version "1.8.0_77" macOS 10.12.1

    • Lucene Fields:
      New

      Description

      Calling org.apache.lucene.index.Terms.intersect(automaton, null) causes an NPE:

      // Assumes args[0] is the index path; the original snippet only used args[1].
      String index_path = args[0];  // path to index
      String term = args[1];        // name passed to Fields.terms(), i.e. an indexed field

      Directory directory = FSDirectory.open(Paths.get(index_path));
      IndexReader reader = DirectoryReader.open(directory);
      Fields fields = MultiFields.getFields(reader);
      Terms terms = fields.terms(term);
      // The regexp has no operators, so it compiles to a single-string automaton.
      CompiledAutomaton automaton = new CompiledAutomaton(
          new RegExp("do_not_match_anything").toAutomaton());

      TermsEnum te = terms.intersect(automaton, null);

      throws:

      Exception in thread "main" java.lang.NullPointerException
      at org.apache.lucene.codecs.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:127)
      at org.apache.lucene.codecs.blocktree.FieldReader.intersect(FieldReader.java:185)
      at org.apache.lucene.index.MultiTerms.intersect(MultiTerms.java:85)
      ...

      1. LUCENE-7576.patch
        3 kB
        Michael McCandless

        Activity

        mikemccand Michael McCandless added a comment -

        I'll look...

        romseygeek Alan Woodward added a comment -

        Terms.intersect() apparently doesn't work with single-string automata; we need to use CompiledAutomaton.getTermsEnum() instead. It would be nice to have a better error message in FieldReader though. Or maybe check for the automaton type, and delegate through if need be?
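
        The "check the type and delegate" idea in this comment can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not Lucene code: DelegatingIntersectSketch, AutomatonType, Compiled, intersect and termsFor are hypothetical names, a list of strings stands in for the terms dictionary, and java.util.regex stands in for the automaton. The enum values are assumed to mirror CompiledAutomaton.AUTOMATON_TYPE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Stand-alone model, NOT Lucene code: every name here is hypothetical.
class DelegatingIntersectSketch {

    // Assumed to mirror the cases of CompiledAutomaton.AUTOMATON_TYPE.
    enum AutomatonType { NONE, ALL, SINGLE, NORMAL }

    static final class Compiled {
        final AutomatonType type;
        final String singleTerm; // non-null only for SINGLE
        final Pattern pattern;   // non-null only for NORMAL

        Compiled(AutomatonType type, String singleTerm, Pattern pattern) {
            this.type = type;
            this.singleTerm = singleTerm;
            this.pattern = pattern;
        }
    }

    // Low-level path: only valid for NORMAL, because it dereferences pattern.
    static List<String> intersect(List<String> terms, Compiled c) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (c.pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    // Type-aware entry point: handles the special cases itself and
    // delegates to intersect() only for NORMAL.
    static List<String> termsFor(List<String> terms, Compiled c) {
        switch (c.type) {
            case NONE:
                return Collections.<String>emptyList();
            case ALL:
                return new ArrayList<>(terms);
            case SINGLE:
                return terms.contains(c.singleTerm)
                        ? Collections.singletonList(c.singleTerm)
                        : Collections.<String>emptyList();
            default:
                return intersect(terms, c);
        }
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("apple", "apricot", "banana");
        Compiled single = new Compiled(AutomatonType.SINGLE, "do_not_match_anything", null);
        Compiled normal = new Compiled(AutomatonType.NORMAL, null, Pattern.compile("ap.*"));
        System.out.println(termsFor(terms, single)); // prints []
        System.out.println(termsFor(terms, normal)); // prints [apple, apricot]
    }
}
```

        In this model the caller never needs to know which kind of automaton it holds; only the low-level intersect() is sensitive to the type, which is the division of labour the comment proposes.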

        mikemccand Michael McCandless added a comment -

        Patch w/ test (thank you!) and fix. This is unfortunately a confusing expert API; other terms dicts were checking that the provided compiled automaton is NORMAL and throwing a clearer exception if not, so I carried that same check over to the default terms dict. I also added a note to the javadocs for Terms.intersect.
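
        The kind of up-front check described here can be sketched in a stand-alone model. This is not the committed patch: IntersectGuardSketch, AutomatonType, Compiled, acceptsUnguarded, acceptsGuarded and failureOf are hypothetical names, java.util.regex stands in for the automaton, and the exception type and message are illustrative assumptions. The sketch only contrasts failing late with a bare NullPointerException against failing early with a descriptive exception.

```java
import java.util.regex.Pattern;

// Stand-alone model, NOT the committed patch: every name here is hypothetical.
class IntersectGuardSketch {

    // Assumed to mirror the cases of CompiledAutomaton.AUTOMATON_TYPE.
    enum AutomatonType { NONE, ALL, SINGLE, NORMAL }

    static final class Compiled {
        final AutomatonType type;
        final Pattern pattern; // null unless type is NORMAL

        Compiled(AutomatonType type, Pattern pattern) {
            this.type = type;
            this.pattern = pattern;
        }
    }

    // Unguarded: a non-NORMAL argument fails on the null pattern
    // with a NullPointerException that says nothing about the cause.
    static boolean acceptsUnguarded(Compiled c, String term) {
        return c.pattern.matcher(term).matches();
    }

    // Guarded: validate the type first and explain what the caller should do.
    static boolean acceptsGuarded(Compiled c, String term) {
        if (c.type != AutomatonType.NORMAL) {
            throw new IllegalArgumentException(
                    "expected a NORMAL automaton but got " + c.type
                            + "; use the type-aware entry point instead");
        }
        return c.pattern.matcher(term).matches();
    }

    // Returns the simple class name of whatever the call throws, or "none".
    static String failureOf(Runnable call) {
        try {
            call.run();
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Compiled single = new Compiled(AutomatonType.SINGLE, null);
        Compiled normal = new Compiled(AutomatonType.NORMAL, Pattern.compile("ap.*"));
        System.out.println(failureOf(() -> acceptsUnguarded(single, "x"))); // prints NullPointerException
        System.out.println(failureOf(() -> acceptsGuarded(single, "x")));   // prints IllegalArgumentException
        System.out.println(acceptsGuarded(normal, "apple"));                // prints true
    }
}
```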

        jira-bot ASF subversion and git services added a comment -

        Commit fcccd317ddb44a742a0b3265fcf32923649f38cd in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fcccd31 ]

        LUCENE-7576: detect when special case automaton is passed to Terms.intersect

        jira-bot ASF subversion and git services added a comment -

        Commit b6072f3ae539a5fc45a2bb9f99441dfeef4e440a in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b6072f3 ]

        LUCENE-7576: detect when special case automaton is passed to Terms.intersect

        mikemccand Michael McCandless added a comment -

        Thank you Tom Mortimer.

        jira-bot ASF subversion and git services added a comment -

        Commit a195a9868a7f7b57c56b3b8b6b8c9ada36109144 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a195a98 ]

        LUCENE-7576: fix other codecs to detect when special case automaton is passed to Terms.intersect

        jira-bot ASF subversion and git services added a comment -

        Commit 8cbcbc9d956754de1fab2c626705aa6d6ab9f910 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8cbcbc9 ]

        LUCENE-7576: fix other codecs to detect when special case automaton is passed to Terms.intersect

        jira-bot ASF subversion and git services added a comment -

        Commit fcccd317ddb44a742a0b3265fcf32923649f38cd in lucene-solr's branch refs/heads/apiv2 from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fcccd31 ]

        LUCENE-7576: detect when special case automaton is passed to Terms.intersect

        jira-bot ASF subversion and git services added a comment -

        Commit 8cbcbc9d956754de1fab2c626705aa6d6ab9f910 in lucene-solr's branch refs/heads/apiv2 from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8cbcbc9 ]

        LUCENE-7576: fix other codecs to detect when special case automaton is passed to Terms.intersect

        jira-bot ASF subversion and git services added a comment -

        Commit ebb5c7e6768c03c83be4aa3abdab22e16cb67c2c in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ebb5c7e ]

        LUCENE-7576: AutomatonTermsEnum ctor should also insist on a NORMAL CompiledAutomaton in

        jira-bot ASF subversion and git services added a comment -

        Commit 8e974ecdcfc85243442fadf353cab4cb52a6cab2 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8e974ec ]

        LUCENE-7576: AutomatonTermsEnum ctor should also insist on a NORMAL CompiledAutomaton in


          People

          • Assignee:
            mikemccand Michael McCandless
            Reporter:
            TomMortimer Tom Mortimer
          • Votes:
            0
            Watchers:
            4
