Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-10583

Add 'join' as a new type of domain change in JSON Facets

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.6, 7.0
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      Add support for a new (query) join option when specifying a domain for a JSON Facet.

      Suggested syntax...

      ...
      domain : { join : { from : field_foo,
               	      to : field_bar
      		  }
      	 }
      
      1. SOLR-10583.patch
        52 kB
        Hoss Man
      2. SOLR-10583.patch
        66 kB
        Hoss Man
      3. SOLR-10583.patch
        63 kB
        Hoss Man

        Issue Links

          Activity

          Hide
          hossman Hoss Man added a comment -

          A while back someone asked me about the "domain change" idea in the JSON Facet code. They had seen some blogs from yonik about use blockParent and blockChildren domain changes to facet on parent/child fields, and were wondering if the same sort of thing could work for a query join.

          I wasn't too familiar with any of this code, but poking around it seemed fairly straight forward. (Except that my randomized test case ran into an interesting bug when the same join query was used in two diff facets: I hacked around this by forcing a non-caching wrapper around the queries, but still haven't gotten to the root cause)

          The attached patch includes:

          • complete code implementing this new domain change option
          • Modifications to TestJsonFacets to test the new join option w/similar data as the existing blockChildren and blockParent tests
          • a fairly robust TestCloudJSONFacetJoinDomain that does randomized facet queries w/these join (sometimes + filter) domain changes and then verifies the result counts against test queries w/equivilent fq params
            • currently only uses one shard...
            • ...and always requests a facet limit greater then the cardinality of the fields so that refinement is not an issue even if/when the test starts using multiple shards
          • a very similar but greatly simplified TestCloudJSONFacet that does similar checks but only using filter domain changes
            • i hacked this off of TestCloudJSONFacetJoinDomain when i first started getting weird test failures to try see if i could reproduce any weird failures with just the existing domain change code

          Still TODO (all pretty well captured in nocommits)...

          1. get to the bottom of the cache issue
            • need to do this first before improving the test to ensure nothing else "breaks" our garunteed failure in testBespoke
            • see nocommits in the code for ideas about possible root causes
          2. generate better errors if the join is malformed (ie: not a map, doesn't have both from and to specified
            • or maybe: support the syntax join : "field_name" as syntax sugar for...
              join : { from : "field_name", to : "field_name" }
              
          3. add code+tests to ensure joining on numeric fields works
          4. fix the multipleValuesPerDocument logic and enhance the test so some fields are always single valued
          5. cleanup & refactor some of the code that uses JoinUtil
            • is it as efficient as it can be?
            • is there stuff we can share with ScoreJoinQParserPlugin?
          6. refactor the TestCloudJSONFacet & TestCloudJSONFacetJoinDomain duplication and make them use multi-shards
            • FWIW: join queries might seem like an odd thing to worry about testing with multiple shards – but the usecases I have in mind can all leverage doc routing to ensure that all docs with identical values in the join field are co-located
          7. investigate the recent work/tests yonik's been doing on supporting refinement (single level?), and add join to those tests as well.
          Show
          hossman Hoss Man added a comment - A while back someone asked me about the "domain change" idea in the JSON Facet code. They had seen some blogs from yonik about use blockParent and blockChildren domain changes to facet on parent/child fields, and were wondering if the same sort of thing could work for a query join . I wasn't too familiar with any of this code, but poking around it seemed fairly straight forward. (Except that my randomized test case ran into an interesting bug when the same join query was used in two diff facets: I hacked around this by forcing a non-caching wrapper around the queries, but still haven't gotten to the root cause) The attached patch includes: complete code implementing this new domain change option Modifications to TestJsonFacets to test the new join option w/similar data as the existing blockChildren and blockParent tests a fairly robust TestCloudJSONFacetJoinDomain that does randomized facet queries w/these join (sometimes + filter ) domain changes and then verifies the result counts against test queries w/equivilent fq params currently only uses one shard... ...and always requests a facet limit greater then the cardinality of the fields so that refinement is not an issue even if/when the test starts using multiple shards a very similar but greatly simplified TestCloudJSONFacet that does similar checks but only using filter domain changes i hacked this off of TestCloudJSONFacetJoinDomain when i first started getting weird test failures to try see if i could reproduce any weird failures with just the existing domain change code Still TODO (all pretty well captured in nocommits)... get to the bottom of the cache issue need to do this first before improving the test to ensure nothing else "breaks" our garunteed failure in testBespoke see nocommits in the code for ideas about possible root causes generate better errors if the join is malformed (ie: not a map, doesn't have both from and to specified or maybe: support the syntax join : "field_name" as syntax sugar for... join : { from : "field_name" , to : "field_name" } add code+tests to ensure joining on numeric fields works fix the multipleValuesPerDocument logic and enhance the test so some fields are always single valued cleanup & refactor some of the code that uses JoinUtil is it as efficient as it can be? is there stuff we can share with ScoreJoinQParserPlugin? refactor the TestCloudJSONFacet & TestCloudJSONFacetJoinDomain duplication and make them use multi-shards FWIW: join queries might seem like an odd thing to worry about testing with multiple shards – but the usecases I have in mind can all leverage doc routing to ensure that all docs with identical values in the join field are co-located investigate the recent work/tests yonik's been doing on supporting refinement (single level?), and add join to those tests as well.
          Hide
          hossman Hoss Man added a comment -

          get to the bottom of the cache issue

          I tracked this down to LUCENE-7810.

          As i noted in that issue...

          1. At present, due to some code I don't really understand in how Solr only leverages JoinUtils in rewritten queries, it appears that this bug does not impact current Solr usecases. ...
          2. i discovered this bug purely by fluke because in my originally POC code for SOLR-10583 I used JoinUtils.createJoinQuery(...) directly instead of refactoring Solr's JoinQParserPlugin code so i could re-use it – doing that refactoring is my nextstep for that issue ...

          ...so once i refactor this patch to use the existing code in JoinQParser, this bug shouldn't impact this patch (fingers crossed)

          Show
          hossman Hoss Man added a comment - get to the bottom of the cache issue I tracked this down to LUCENE-7810 . As i noted in that issue... At present, due to some code I don't really understand in how Solr only leverages JoinUtils in rewritten queries, it appears that this bug does not impact current Solr usecases. ... i discovered this bug purely by fluke because in my originally POC code for SOLR-10583 I used JoinUtils.createJoinQuery(...) directly instead of refactoring Solr's JoinQParserPlugin code so i could re-use it – doing that refactoring is my nextstep for that issue ... ...so once i refactor this patch to use the existing code in JoinQParser, this bug shouldn't impact this patch (fingers crossed)
          Hide
          hossman Hoss Man added a comment -

          updated patch...

          • reuse code in JoinQParserPlugin
            • this resolved the caching problem
          • added tests for numeric fields
          • cleaned up parsing code
            • added tests for bad input

          I think the code is pretty solid – I beasted a few hundred test runs last night w/o any problems.

          Still todo (on my agenda for today):

          • look into the existing refinement tests and include 'join' there
          • refactor the duplicate code in the 2 tests i'm added
            • most likely i'm just going to delete TestCloudJSONFacet and make TestCloudJSONFacetJoinDomain occasionally use a domain w/'filter' but no 'join' since that's the only code path covered by TestCloudJSONFacet but not TestCloudJSONFacetJoinDomain
          • beast w/nightly to ensure the test constants aren't too high.
          Show
          hossman Hoss Man added a comment - updated patch... reuse code in JoinQParserPlugin this resolved the caching problem added tests for numeric fields cleaned up parsing code added tests for bad input I think the code is pretty solid – I beasted a few hundred test runs last night w/o any problems. Still todo (on my agenda for today): look into the existing refinement tests and include 'join' there refactor the duplicate code in the 2 tests i'm added most likely i'm just going to delete TestCloudJSONFacet and make TestCloudJSONFacetJoinDomain occasionally use a domain w/'filter' but no 'join' since that's the only code path covered by TestCloudJSONFacet but not TestCloudJSONFacetJoinDomain beast w/nightly to ensure the test constants aren't too high.
          Hide
          hossman Hoss Man added a comment -

          Updated patch ... I think this is pretty much good to go, but I'll keep beasting it overnight.

          Anyone have any comments/concerns?

          Show
          hossman Hoss Man added a comment - Updated patch ... I think this is pretty much good to go, but I'll keep beasting it overnight. Anyone have any comments/concerns?
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 15e1c5d34f69fa2662b5299dce6fc808854f8ba3 in lucene-solr's branch refs/heads/master from Chris Hostetter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=15e1c5d ]

          SOLR-10583: JSON Faceting now supports a query time 'join' domain change option

          Show
          jira-bot ASF subversion and git services added a comment - Commit 15e1c5d34f69fa2662b5299dce6fc808854f8ba3 in lucene-solr's branch refs/heads/master from Chris Hostetter [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=15e1c5d ] SOLR-10583 : JSON Faceting now supports a query time 'join' domain change option
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 083051899794e509b1fbd7cf5fc475094c3b452a in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0830518 ]

          SOLR-10583: JSON Faceting now supports a query time 'join' domain change option

          (cherry picked from commit 15e1c5d34f69fa2662b5299dce6fc808854f8ba3)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 083051899794e509b1fbd7cf5fc475094c3b452a in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0830518 ] SOLR-10583 : JSON Faceting now supports a query time 'join' domain change option (cherry picked from commit 15e1c5d34f69fa2662b5299dce6fc808854f8ba3)
          Hide
          steve_rowe Steve Rowe added a comment -

          TestCloudJSONFacetJoinDomain, introduced on this issue, has a reproducible failure: SOLR-11016.

          Show
          steve_rowe Steve Rowe added a comment - TestCloudJSONFacetJoinDomain , introduced on this issue, has a reproducible failure: SOLR-11016 .

            People

            • Assignee:
              hossman Hoss Man
              Reporter:
              hossman Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development