  1. Lucene - Core
  2. LUCENE-8902

Index-time join ToParentBlockJoinQuery query produces incorrect result with child wildcards


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 8.1.1
    • Fix Version/s: None
    • Component/s: modules/join
    • Labels: None
    • Lucene Fields: New

    Description

      When I do an index-time join query on certain parent docs with a wildcard query for child docs, I sometimes get the wrong answer. Example:

       

      Parent Doc       Children
      id=id00000       none
      id=id00001       1. program=P1
      id=id00002       1. program=P1
                       2. program=P2
      id=id00003       none
      id=id00004       1. program=P1
      id=id00005       1. program=P1
                       2. program=P2

      So essentially I have 6 parent docs: doc 0 has no children, doc 1 has 1 child, doc 2 has 2 children, and the same pattern repeats for docs 3 to 5.

      1. The following query gives the correct results:

              BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00002", "id00003", "id00004", "id00005")));
              Query q = new ToParentBlockJoinQuery(new TermInSetQuery("program", toSet("P1", "P2")), parentSet, ScoreMode.None);

      Returns the correct result (4 docs: ["id00001", "id00002", "id00004", "id00005"]).

       

      2. This also gives correct result (same as above):

              BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00002", "id00003", "id00004", "id00005")));
              Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);

       

      3. Also correct (same as above):

              BitSetProducer parentSet = new QueryBitSetProducer(new WildcardQuery(new Term("id", "*")));
              Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);

      So far so good.

       

      4. This one gives incorrect result:

              BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00003")));
              Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);

      Returns 2 docs ["id00001", "id00003"]. It should only return "id00001" and not "id00003" here. Very strange behavior.

       

      5. Just asking for "id00003" also incorrectly returns it:

              BitSetProducer parentSet = new QueryBitSetProducer(new TermQuery(new Term("id", "id00003")));
              Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);

       

      6. But as soon as I add "id00002" to the parent query, it works again:

              BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00003", "id00002")));
              Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);

      Gives the correct result ["id00002"]
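
      The pattern in cases 4 to 6 is consistent with the parent bit set being used to find block boundaries rather than to filter parents: each matching child is credited to the next parent bit at or after it, so leaving a parent out of the bit set hands its children to a later parent. Below is a minimal sketch of that idea in plain Java. It is a simplified model, not Lucene's code. The class BlockJoinModel, its methods, and the flat doc layout (children indexed immediately before their parent, in table order) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Simplified model of an index-time block join. NOT Lucene's implementation.
final class BlockJoinModel {
    // Assumed doc order: each block is its children followed by the parent.
    static final String[] DOCS = {
        "id00000",
        "program=P1", "id00001",
        "program=P1", "program=P2", "id00002",
        "id00003",
        "program=P1", "id00004",
        "program=P1", "program=P2", "id00005"
    };

    static final List<String> ALL_PARENTS = List.of(
        "id00000", "id00001", "id00002", "id00003", "id00004", "id00005");

    // Stand-in for the bit set a parent filter would produce.
    static BitSet parentBits(List<String> ids) {
        BitSet bits = new BitSet(DOCS.length);
        for (int doc = 0; doc < DOCS.length; doc++) {
            if (ids.contains(DOCS[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Models program:* as the child query: every child doc is a hit and is
    // credited to the next set parent bit at or after its position.
    static List<String> join(BitSet parents) {
        List<String> hits = new ArrayList<>();
        for (int doc = 0; doc < DOCS.length; doc++) {
            if (!DOCS[doc].startsWith("program=")) {
                continue; // not a child doc
            }
            int parent = parents.nextSetBit(doc);
            if (parent >= 0 && !hits.contains(DOCS[parent])) {
                hits.add(DOCS[parent]);
            }
        }
        return hits;
    }

    // Join against ALL parents first, then narrow to the wanted ids.
    static List<String> joinRestricted(List<String> wanted) {
        List<String> hits = join(parentBits(ALL_PARENTS));
        hits.retainAll(wanted);
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(join(parentBits(List.of("id00000", "id00001", "id00003")))); // case 4
        System.out.println(join(parentBits(List.of("id00003"))));                       // case 5
        System.out.println(join(parentBits(List.of("id00003", "id00002"))));            // case 6
        System.out.println(joinRestricted(List.of("id00000", "id00001", "id00003")));   // narrowed
    }
}
```

      Under these assumptions the model reproduces the results reported above for cases 4, 5 and 6, while joinRestricted returns only "id00001" for the parents of case 4. In Lucene terms, this would presumably mean building the QueryBitSetProducer from a query that matches every parent doc and applying the id restriction as a separate clause, for example a BooleanQuery with the join as MUST and the id query as FILTER. I have not verified that against the attached test.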


      I am attaching the unit test that demonstrates this: https://pastebin.com/aJ1LDLCS

      I don't know if I am doing something wrong, or if there is an issue.

      Thank you for looking into it.


          People

            Assignee: Unassigned
            Reporter: Solodin Andrei

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 10m