  1. Lucene - Core
  2. LUCENE-8902

Index-time join ToParentBlockJoinQuery query produces incorrect result with child wildcards

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 8.1.1
    • Fix Version/s: None
    • Component/s: modules/join
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      When I run an index-time join query on certain parent docs with a wildcard query for the child docs, I sometimes get the wrong answer. Example:

       

      Parent Doc       Children
      id=id00000       none
      id=id00001       1. program=P1
      id=id00002       1. program=P1
                       2. program=P2
      id=id00003       none
      id=id00004       1. program=P1
      id=id00005       1. program=P1
                       2. program=P2

      So essentially I have 6 parent docs: doc 0 has no children, doc 1 has 1 child, doc 2 has 2 children, and the same pattern repeats for docs 3, 4, and 5.

      1. The following query gives the correct results:

              BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00002", "id00003", "id00004", "id00005")));
              Query q = new ToParentBlockJoinQuery(new TermInSetQuery("program", toSet("P1", "P2")), parentSet, ScoreMode.None);

      Returns the correct result (4 docs: ["id00001", "id00002", "id00004", "id00005"]).

       

      2. This also gives the correct result (same as above):

              BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00002", "id00003", "id00004", "id00005")));
              Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);

       

      3. Also correct (same as above):

              BitSetProducer parentSet = new QueryBitSetProducer(new WildcardQuery(new Term("id", "*")));
              Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);

      So far so good.

       

      4. This one gives an incorrect result:

              BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00003")));
              Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);

      Returns 2 docs ["id00001", "id00003"]. It should only return "id00001" and not "id00003" here. Very strange behavior.

       

      5. Just asking for "id00003" also incorrectly returns it:

              BitSetProducer parentSet = new QueryBitSetProducer(new TermQuery(new Term("id", "id00003")));
              Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);

       

      6. But as soon as I add "id00002" to the parent query, it works again:

              BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet( "id00003", "id00002")));
              Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);

      Gives the correct result ["id00002"]


      I am attaching the unit test that demonstrates this: https://pastebin.com/aJ1LDLCS

      I don't know if I am doing something wrong, or if there is an issue.

      Thank you for looking into it.
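
      One possible reading of results 4 to 6, consistent with the "Not A Problem" resolution, is that the join attributes each matching child to the next document marked in the parent bit set, so a parent set that omits some parents hands their children to a later parent. The sketch below simulates that attribution with a plain java.util.BitSet and no Lucene classes. The document order (one segment, blocks in id order, children indexed immediately before their parent) and the names BlockJoinSketch, joinToParents, and joinThenFilter are assumptions made for illustration; they are not taken from the attached test.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BlockJoinSketch {
    // Assumed index order (not confirmed by the report): a single segment, blocks added
    // in id order, each parent's children stored immediately before it.
    // A null entry marks a child doc; every child has a "program" value, so program:* matches it.
    static final String[] PARENT_ID = {
        "id00000",
        null, "id00001",
        null, null, "id00002",
        "id00003",
        null, "id00004",
        null, null, "id00005"
    };

    static final Set<String> ALL_PARENTS =
        Set.of("id00000", "id00001", "id00002", "id00003", "id00004", "id00005");

    /** Parents hit when parentFilterIds alone defines the parent bit set. */
    static List<String> joinToParents(Set<String> parentFilterIds) {
        BitSet parentBits = new BitSet(PARENT_ID.length);
        for (int doc = 0; doc < PARENT_ID.length; doc++) {
            if (PARENT_ID[doc] != null && parentFilterIds.contains(PARENT_ID[doc])) {
                parentBits.set(doc);
            }
        }
        Set<String> hits = new LinkedHashSet<>();
        for (int doc = 0; doc < PARENT_ID.length; doc++) {
            if (PARENT_ID[doc] != null) {
                continue; // only child docs match the child query
            }
            // The child is attributed to the next doc marked as a parent, whoever that is.
            int parentDoc = parentBits.nextSetBit(doc + 1);
            if (parentDoc >= 0) {
                hits.add(PARENT_ID[parentDoc]);
            }
        }
        return new ArrayList<>(hits);
    }

    /** Join against every parent first, then keep only the wanted ids. */
    static List<String> joinThenFilter(Set<String> wantedIds) {
        List<String> hits = joinToParents(ALL_PARENTS);
        hits.retainAll(wantedIds);
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(joinToParents(Set.of("id00000", "id00001", "id00003"))); // [id00001, id00003]
        System.out.println(joinToParents(Set.of("id00003")));                       // [id00003]
        System.out.println(joinToParents(Set.of("id00003", "id00002")));            // [id00002]
        System.out.println(joinThenFilter(Set.of("id00000", "id00001", "id00003"))); // [id00001]
        System.out.println(joinThenFilter(Set.of("id00003")));                       // []
    }
}
```

      Under these assumptions the simulation reproduces the outputs reported for queries 4, 5, and 6, while joinThenFilter returns what the report expected. In Lucene terms this would suggest building parentSet from a query that matches every parent (as query 3 does) and applying the id restriction as a separate clause on the join result rather than inside the parent filter.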

            People

            • Assignee:
              Unassigned
              Reporter:
              Solodin Andrei
            • Votes:
              0
              Watchers:
              2

                Time Tracking

                Estimated:
                Not Specified
                Remaining:
                0h
                Logged:
                10m