Solr
  1. Solr
  2. SOLR-5285

Solr response format should support child Docs

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Solr has added support for taking childDocs as input ( only XML till now ). It's currently used for BlockJoinQuery.

      I feel that if a user indexes a document with child docs, even if he isn't using the BJQ features and is just searching which results in a hit on the parentDoc, it's childDocs should be returned in the response format.

      Chris Hostetter (Unused) on IRC suggested that the DocTransformers would be the place to add childDocs to the response.

      Now given a docId one needs to find out all the childDoc id's. A couple of approaches which I could think of are
      1. Maintain the relation between a parentDoc and it's childDocs during indexing time in maybe a separate index?
      2. Somehow emulate what happens in ToParentBlockJoinQuery.nextDoc() - Given a parentDoc it finds out all the childDocs but this requires a childScorer.

      Am I missing something obvious on how to find the relation between a parentDoc and it's childDocs because none of the above solutions for this look right.

      1. javabin_backcompat_child_docs.bin
        0.1 kB
        Varun Thacker
      2. SOLR-5285.patch
        50 kB
        Varun Thacker
      3. SOLR-5285.patch
        49 kB
        Varun Thacker
      4. SOLR-5285.patch
        51 kB
        Hoss Man
      5. SOLR-5285.patch
        52 kB
        Varun Thacker
      6. SOLR-5285.patch
        50 kB
        Hoss Man
      7. SOLR-5285.patch
        48 kB
        Varun Thacker
      8. SOLR-5285.patch
        36 kB
        Varun Thacker
      9. SOLR-5285.patch
        36 kB
        Varun Thacker
      10. SOLR-5285.patch
        27 kB
        Hoss Man
      11. SOLR-5285.patch
        26 kB
        Varun Thacker
      12. SOLR-5285.patch
        25 kB
        Varun Thacker
      13. SOLR-5285.patch
        25 kB
        Varun Thacker
      14. SOLR-5285.patch
        14 kB
        Varun Thacker
      15. SOLR-5285.patch
        8 kB
        Varun Thacker

        Issue Links

          Activity

          Hide
          Mikhail Khludnev added a comment -

          giving you search with !{parent qparser, which children you want to return, all children belong to a parent or only those which passed by the child level query?

          note: approach no 1. seems never fitting, because the relation is already present in the index; approach no.2 seems like over-optimization, if we are talking about enriching only 'rows' parent.

          Show
          Mikhail Khludnev added a comment - giving you search with !{parent qparser, which children you want to return, all children belong to a parent or only those which passed by the child level query? note: approach no 1. seems never fitting, because the relation is already present in the index; approach no.2 seems like over-optimization, if we are talking about enriching only 'rows' parent.
          Hide
          Varun Thacker added a comment -

          I don't think I have been to able to explain the problem accurately in the Jira description. Let me try again..

          1. Index docs with childDocs. So let's take the example on https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers

          2. Now I query http://localhost:8983/solr/collection1/select?q=*:*&fq=content_type:%22parentDocument%22

          3. This is the response

          <response>
              <lst name="responseHeader">
                  <int name="status">0</int>
                  <int name="QTime">2</int>
                  <lst name="params">
                      <str name="q">*:*</str>
                      <str name="fq">content_type:"parentDocument"</str>
                  </lst>
              </lst>
          <result name="response" numFound="2" start="0">
              <doc>
                  <str name="id">1</str>
                  <arr name="title">
                  <str>Solr adds block join support</str>
                  </arr>
                  <arr name="content_type">
                      <str>parentDocument</str>
                  </arr>
                  <long name="_version_">1447311301175934976</long>
              </doc>
              <doc>
                  <str name="id">3</str>
                  <arr name="title">
                      <str>Lucene and Solr 4.5 is out</str>
                  </arr>
                  <arr name="content_type">
                      <str>parentDocument</str>
                  </arr>
                  <long name="_version_">1447311327665061888</long>
              </doc>
          </result>
          </response>
          

          4. Ideally I would want this as my response:

          <response>
              <lst name="responseHeader">
                  <int name="status">0</int>
                  <int name="QTime">2</int>
                  <lst name="params">
                      <str name="q">*:*</str>
                  </lst>
              </lst>
          <result name="response" numFound="4" start="0">
          <doc>
              <str name="id">1</str>
              <arr name="title">
                  <str>Solr adds block join support</str>
              </arr>
              <arr name="content_type">
                  <str>parentDocument</str>
              </arr>
              <long name="_version_">1447311301175934976</long>
              <doc>
                  <str name="id">2</str>
                  <str name="comments">SolrCloud supports it too!</str>
              </doc>
          </doc>
          
          <doc>
              <str name="id">3</str>
              <arr name="title">
                  <str>Lucene and Solr 4.5 is out</str>
              </arr>
              <arr name="content_type">
                  <str>parentDocument</str>
              </arr>
              <long name="_version_">1447311327665061888</long>
              <doc>
                  <str name="id">4</str>
                  <str name="comments">Lots of new features</str>
              </doc>
          </doc>
          </result>
          </response>
          

          Also Mikhail Khludnev ,

          approach no 1. seems never fitting, because the relation is already present in the index

          how is the relation already present in the index? From what I understand each child doc has a root field which points to the parent doc. But we do not have the opposite relation. To implement this feature we would need the relation b/w each parent and all it's child doc right? Am I missing anything?

          Show
          Varun Thacker added a comment - I don't think I have been to able to explain the problem accurately in the Jira description. Let me try again.. 1. Index docs with childDocs. So let's take the example on https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers 2. Now I query http://localhost:8983/solr/collection1/select?q=*:*&fq=content_type:%22parentDocument%22 3. This is the response <response> <lst name= "responseHeader" > < int name= "status" >0</ int > < int name= "QTime" >2</ int > <lst name= "params" > <str name= "q" >*:*</str> <str name= "fq" >content_type: "parentDocument" </str> </lst> </lst> <result name= "response" numFound= "2" start= "0" > <doc> <str name= "id" >1</str> <arr name= "title" > <str>Solr adds block join support</str> </arr> <arr name= "content_type" > <str>parentDocument</str> </arr> < long name= "_version_" >1447311301175934976</ long > </doc> <doc> <str name= "id" >3</str> <arr name= "title" > <str>Lucene and Solr 4.5 is out</str> </arr> <arr name= "content_type" > <str>parentDocument</str> </arr> < long name= "_version_" >1447311327665061888</ long > </doc> </result> </response> 4. Ideally I would want this as my response: <response> <lst name= "responseHeader" > < int name= "status" >0</ int > < int name= "QTime" >2</ int > <lst name= "params" > <str name= "q" >*:*</str> </lst> </lst> <result name= "response" numFound= "4" start= "0" > <doc> <str name= "id" >1</str> <arr name= "title" > <str>Solr adds block join support</str> </arr> <arr name= "content_type" > <str>parentDocument</str> </arr> < long name= "_version_" >1447311301175934976</ long > <doc> <str name= "id" >2</str> <str name= "comments" >SolrCloud supports it too!</str> </doc> </doc> <doc> <str name= "id" >3</str> <arr name= "title" > <str>Lucene and Solr 4.5 is out</str> </arr> <arr name= "content_type" > <str>parentDocument</str> </arr> < long name= "_version_" >1447311327665061888</ long > <doc> <str name= "id" >4</str> <str name= "comments" >Lots of new features</str> </doc> </doc> </result> </response> Also Mikhail Khludnev , approach no 1. seems never fitting, because the relation is already present in the index how is the relation already present in the index? From what I understand each child doc has a root field which points to the parent doc. But we do not have the opposite relation. To implement this feature we would need the relation b/w each parent and all it's child doc right? Am I missing anything?
          Hide
          Mikhail Khludnev added a comment -

          Hello Varun,

          You can experiment with the following code:

          @Override
          public void transform(SolrDocument doc, int docid) throws IOException{
            parentid = doc.getField("id");
            children = searcher.getDocList(
                QParser.getParser("{!child of=content_type:parentDocument}id:"+parentid).getQuery(),...);
            doc.put("children", children);
          }
          
          
          Show
          Mikhail Khludnev added a comment - Hello Varun, You can experiment with the following code: @Override public void transform(SolrDocument doc, int docid) throws IOException{ parentid = doc.getField( "id" ); children = searcher.getDocList( QParser.getParser( "{!child of=content_type:parentDocument}id:" +parentid).getQuery(),...); doc.put( "children" , children); }
          Hide
          Varun Thacker added a comment -

          Syntax: [children parentFilter="content_type:parentDocument"]

          This patch adds support for nested docs for wt=xml. I will add for other writers soon.

          There is no way to specify return fields for a childDoc. Should there be an option?

          I added ChildDocTransformerFactory#toSolrDocument which is the same as TextResponseWriter#toSolrDocument. It's not the best thing to do.

          Mikhail Khludnev thanks for your suggestion

          Show
          Varun Thacker added a comment - Syntax: [children parentFilter="content_type:parentDocument"] This patch adds support for nested docs for wt=xml. I will add for other writers soon. There is no way to specify return fields for a childDoc. Should there be an option? I added ChildDocTransformerFactory#toSolrDocument which is the same as TextResponseWriter#toSolrDocument. It's not the best thing to do. Mikhail Khludnev thanks for your suggestion
          Hide
          Varun Thacker added a comment -
          • Nested chidDoc support in both XML and JSON response formats
          • It's implemented as a DocTransformer and the syntax for calling using it is:
            [child parentFilter="fieldName:fieldValue"]
            
          • Added response format tests for XML And JSON
          Show
          Varun Thacker added a comment - Nested chidDoc support in both XML and JSON response formats It's implemented as a DocTransformer and the syntax for calling using it is: [child parentFilter="fieldName:fieldValue"] Added response format tests for XML And JSON
          Hide
          Hoss Man added a comment -

          Hey Varun, thanks for working on this – great idea.

          I haven't reviewed your implementation details in depth, but here's my comments/concerns/questions based on an initial skim of your patch...

          • "There is no way to specify return fields for a childDoc. Should there be an option?"
            • I think so, probably an "fl" option on the [child] transformer, but if that's non-trivial to add right now, we should at least clearly document/test that all of the child fields come back – ie: make sure it's not inadvertently affected by the top level 'fl' param)
          • "I added ChildDocTransformerFactory#toSolrDocument which is the same as TextResponseWriter#toSolrDocument. It's not the best thing to do."
            • no .. we should just refactor this into a static utility method somewhere.
          • it's not clear to me from the API what to expect will happen if i have more then one level of parent-child relationships in my index – will children & grandchildren be returned? whatever is expected needs to be documented/tested
          • This should ideally work (and be tested) when using SolrJ and JavaBin – but at a minimum we must ensure it doesn't cause requests to explode with weird errors – particularly since i notice your patch modifies o.a.s.common.SolrDocument but i don't see any updates to the javabin codec.
            • we should also ensure/test that the codec is updated in such a way that people who don't use this feature won't get weird errors when using old_solrj+new_solr or new_solrj+old_solr (ie: don't include an empty list of children in the binary data if the transformer isn't used at all)
          • the way you create the index in TestChildDocTransformer is pretty confusing and seems overly convoluted – it would be a lot more straight forward to use real, nested, SolrInputDocuments and something like SolrTestCaseJ4.addAndGetVersion
          • in ChildDocTransformer.transform, the use of String.format & "{!child..}" seems like unneccessary overhead – we should just use new ToChildBlockJoinQuery(...)
            • no need to build up a string just to parse it, and as things stand now i'm pretty sure you'll get weird errors if parentFilter contains quotes
          • It looks like there is an NPE waiting to happen in SolrDocument.clear()
          Show
          Hoss Man added a comment - Hey Varun, thanks for working on this – great idea. I haven't reviewed your implementation details in depth, but here's my comments/concerns/questions based on an initial skim of your patch... "There is no way to specify return fields for a childDoc. Should there be an option?" I think so, probably an "fl" option on the [child] transformer, but if that's non-trivial to add right now, we should at least clearly document/test that all of the child fields come back – ie: make sure it's not inadvertently affected by the top level 'fl' param) "I added ChildDocTransformerFactory#toSolrDocument which is the same as TextResponseWriter#toSolrDocument. It's not the best thing to do." no .. we should just refactor this into a static utility method somewhere. it's not clear to me from the API what to expect will happen if i have more then one level of parent-child relationships in my index – will children & grandchildren be returned? whatever is expected needs to be documented/tested This should ideally work (and be tested) when using SolrJ and JavaBin – but at a minimum we must ensure it doesn't cause requests to explode with weird errors – particularly since i notice your patch modifies o.a.s.common.SolrDocument but i don't see any updates to the javabin codec. we should also ensure/test that the codec is updated in such a way that people who don't use this feature won't get weird errors when using old_solrj+new_solr or new_solrj+old_solr (ie: don't include an empty list of children in the binary data if the transformer isn't used at all) the way you create the index in TestChildDocTransformer is pretty confusing and seems overly convoluted – it would be a lot more straight forward to use real, nested, SolrInputDocuments and something like SolrTestCaseJ4.addAndGetVersion in ChildDocTransformer.transform, the use of String.format & "{!child..}" seems like unneccessary overhead – we should just use new ToChildBlockJoinQuery(...) no need to build up a string just to parse it, and as things stand now i'm pretty sure you'll get weird errors if parentFilter contains quotes It looks like there is an NPE waiting to happen in SolrDocument.clear()
          Hide
          Varun Thacker added a comment -

          1. After having a quick glance at the code I don't think it's trivial to add return fields for child docs. I could create a follow up Jira for it. But I added more tests to verify the the top level fl param does not affect the fields coming back in the child document.

          2. Moved TextResponseWriter.toSolrDocument to ResponseWriterUtil.toSolrDocument

          3. I tested with adding grandchildren document, and the behaviour is that the grandchildren docs gets returned along with the child docs by ToChildBlockJoinQuery. I'll go through ToChildBlockJoinQuery to see if thats the expected behaviour.

          4. Updated JavaBinCodec to handle child documents. Added a test TestChildDocTransformer.testJavaBinCodec. Also TestJavaBinCodec passes so it should have any backward or forward compatibility issues.

          5. Simplified TestChildDocTransformer.createIndex

          6. Fixed ChildDocTransformer.transform to directly create ToChildBlockJoinQuery

          7. Added null check in SolrDocument.clear()

          Show
          Varun Thacker added a comment - 1. After having a quick glance at the code I don't think it's trivial to add return fields for child docs. I could create a follow up Jira for it. But I added more tests to verify the the top level fl param does not affect the fields coming back in the child document. 2. Moved TextResponseWriter.toSolrDocument to ResponseWriterUtil.toSolrDocument 3. I tested with adding grandchildren document, and the behaviour is that the grandchildren docs gets returned along with the child docs by ToChildBlockJoinQuery. I'll go through ToChildBlockJoinQuery to see if thats the expected behaviour. 4. Updated JavaBinCodec to handle child documents. Added a test TestChildDocTransformer.testJavaBinCodec. Also TestJavaBinCodec passes so it should have any backward or forward compatibility issues. 5. Simplified TestChildDocTransformer.createIndex 6. Fixed ChildDocTransformer.transform to directly create ToChildBlockJoinQuery 7. Added null check in SolrDocument.clear()
          Hide
          Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          Show
          Uwe Schindler added a comment - Move issue to Solr 4.9.
          Hide
          Varun Thacker added a comment -

          Updated patch with trunk.

          it's not clear to me from the API what to expect will happen if i have more then one level of parent-child relationships in my index – will children & grandchildren be returned? whatever is expected needs to be documented/tested

          Tested with grandchildren. In Lucene all grandchildren and all siblings are treated as simply children to the parent document. A parent document and all it's child documents are indexed in a block. Hence we should document for only support one level of nesting.

          Show
          Varun Thacker added a comment - Updated patch with trunk. it's not clear to me from the API what to expect will happen if i have more then one level of parent-child relationships in my index – will children & grandchildren be returned? whatever is expected needs to be documented/tested Tested with grandchildren. In Lucene all grandchildren and all siblings are treated as simply children to the parent document. A parent document and all it's child documents are indexed in a block. Hence we should document for only support one level of nesting.
          Hide
          Varun Thacker added a comment -

          Correct patch. Please ignore the previous patch.

          Show
          Varun Thacker added a comment - Correct patch. Please ignore the previous patch.
          Hide
          Hoss Man added a comment -

          Hey Varun, I made some small changes to your patch as i was reviewing it...

          • removed unneeded new import of SolrCmdDistributor in TextResponseWriter
          • TextResponseWriter.toSolrDocument is public, so we can't just remove it competley – i added it back as a stub wrapper arround your new ResponseWriterUtil.toSolrDocument
          • add some javadocs to ResponseWriterUtil.toSolrDocument
          • I think I missed this in your last patch: we can't change the signature of the public static String SolrTestCaseJ4.json(SolrInputDocument) method – i fixed this to be a refactoring that delegates to a new public static void SolrTestCaseJ4.json(SolrInputDocument,CharArr) method (which is recursive like in your patch)
          • Moved your new TestChildDocTransformer.testJavaBinCodec method into TestJavaBinCodec.testResponseChildDocuments since it doesn't actually test anything about the transformer. just the codec. I also beefed it up with some additional children, and assertions about hasChildDocuments() and if/when getChildDocuments() should be null.

          Some responses to your previous comments...

          Fixed ChildDocTransformer.transform to directly create ToChildBlockJoinQuery

          Added null check in SolrDocument.clear()

          +1, +1 ... those look great

          ... I don't think it's trivial to add return fields for child docs. I could create a follow up Jira for it. But I added more tests to verify the the top level fl param does not affect the fields coming back in the child document.

          I think the way to do it would be to have ChildDocTransformer maintain it's own private ReturnField instace (based on an 'fl' local param) which it would use in the transform(...) method to remove fields from the child {{SolrDocument}}s if they weren't wanted.

          But you're right – that can be done as a later feature enhancement. (I added a TODO for now)

          TestJavaBinCodec passes so it should have any backward or forward compatibility issues.

          (I realize you were the main author of SOLR-5265 and you know a lot of this, but i'm going to go into depth for clarity for anyone else reading this issue)

          Most of the existing tests using the codec don't give any assurances that the patch hasn't broken codec compatibility – because the tests run with the current marshall and unmarshal methods, both of which the patch changes. The only tests thats help assure the patch doesn't break backcompatibility are TestJavaBinCodec.testBackCompat and TestJavaBinCodec.testForwardCompat because they compare the results of the current codec with a binary file produced with an older version of the codec stored in svn. They ensure that the current codec (in a client) can still read the binary data from an older codec (on a server), and that the current codec (on a server) still produces an identical response to what an older codec would (so we know that a client running an older version should be able to parse it). But these tests only verify this for documents w/o any children (because that's all that existed prior to this patch). So they don't give us any future protection against breaking backcompat with the new additions of child docs in the response.

          We need to add new tests, using a new *.bin file in SVN containing nested SolrDocuments, to help verify forward compatibility (and future backcompatibility).

          (as a general rule we shouldn't modify "javabin_backcompat.bin", or any other *.bin files we add over time, because that would make it very easy to some backcompat breakages to slip in with whatever commit modifies them)

          Simplified TestChildDocTransformer.createIndex

          Thanks – with the simplified and cleaner index creation, it's a lot easier to see what's going on. One key issue that jumps out at me is that there's no verification that things work properly if you use the transformer on a result set where some docs have children and others do not. we should definitely account for that.

          I think ideally what we should have (in addition to your existing static test) is a new randomized test that builds up a Map (keyed on id) of {{SolrInputDocument}}s with a random number of children (etc... up to a random depth) then after adding all of those documents executes some random queries using this transformer and asserts that each {{SolrDocument}}s in the response has the expected children same number of children as the original docs.

          (The best place for this to live would be SolrExampleTests because then it would help vet both the XML and binary response formats, using the actual example configs)

          Tested with grandchildren. In Lucene all grandchildren and all siblings are treated as simply children to the parent document. A parent document and all it's child documents are indexed in a block. Hence we should document for only support one level of nesting.

          Well, looking at your test, a more specific way to put it is that the new "child" transformer actually returns all descendents of the return documents in a flat list. Which is fine if we document it that way – but it has me thinking: we should really add a "childFilter" option to the transformer to constrain the results. This would not only help with the grand child situation, but would also make it easy for people to constrain the types of children they want to get back. (and getDocList can already take in a Query filter)


          Some new misc questions/comments about the current patch...

          • Why is the tag name in the JSON format "childDocs" but in the XML format it's "childDoc" (no plural) ? ... seems like those should be consistent.
          • the "10" hardcoded in the getDocList call is garunteed to burn someone ... it can definitely default to 10, but we need to have a local param for it in the tnrasformer
          Show
          Hoss Man added a comment - Hey Varun, I made some small changes to your patch as i was reviewing it... removed unneeded new import of SolrCmdDistributor in TextResponseWriter TextResponseWriter.toSolrDocument is public, so we can't just remove it competley – i added it back as a stub wrapper arround your new ResponseWriterUtil.toSolrDocument add some javadocs to ResponseWriterUtil.toSolrDocument I think I missed this in your last patch: we can't change the signature of the public static String SolrTestCaseJ4.json(SolrInputDocument) method – i fixed this to be a refactoring that delegates to a new public static void SolrTestCaseJ4.json(SolrInputDocument,CharArr) method (which is recursive like in your patch) Moved your new TestChildDocTransformer.testJavaBinCodec method into TestJavaBinCodec.testResponseChildDocuments since it doesn't actually test anything about the transformer. just the codec. I also beefed it up with some additional children, and assertions about hasChildDocuments() and if/when getChildDocuments() should be null. Some responses to your previous comments... Fixed ChildDocTransformer.transform to directly create ToChildBlockJoinQuery Added null check in SolrDocument.clear() +1, +1 ... those look great ... I don't think it's trivial to add return fields for child docs. I could create a follow up Jira for it. But I added more tests to verify the the top level fl param does not affect the fields coming back in the child document. I think the way to do it would be to have ChildDocTransformer maintain it's own private ReturnField instace (based on an 'fl' local param) which it would use in the transform(...) method to remove fields from the child {{SolrDocument}}s if they weren't wanted. But you're right – that can be done as a later feature enhancement. (I added a TODO for now) TestJavaBinCodec passes so it should have any backward or forward compatibility issues. (I realize you were the main author of SOLR-5265 and you know a lot of this, but i'm going to go into depth for clarity for anyone else reading this issue) Most of the existing tests using the codec don't give any assurances that the patch hasn't broken codec compatibility – because the tests run with the current marshall and unmarshal methods, both of which the patch changes. The only tests thats help assure the patch doesn't break backcompatibility are TestJavaBinCodec.testBackCompat and TestJavaBinCodec.testForwardCompat because they compare the results of the current codec with a binary file produced with an older version of the codec stored in svn. They ensure that the current codec (in a client) can still read the binary data from an older codec (on a server), and that the current codec (on a server) still produces an identical response to what an older codec would (so we know that a client running an older version should be able to parse it). But these tests only verify this for documents w/o any children (because that's all that existed prior to this patch). So they don't give us any future protection against breaking backcompat with the new additions of child docs in the response. We need to add new tests, using a new *.bin file in SVN containing nested SolrDocuments, to help verify forward compatibility (and future backcompatibility). (as a general rule we shouldn't modify "javabin_backcompat.bin", or any other *.bin files we add over time, because that would make it very easy to some backcompat breakages to slip in with whatever commit modifies them) Simplified TestChildDocTransformer.createIndex Thanks – with the simplified and cleaner index creation, it's a lot easier to see what's going on. One key issue that jumps out at me is that there's no verification that things work properly if you use the transformer on a result set where some docs have children and others do not. we should definitely account for that. I think ideally what we should have (in addition to your existing static test) is a new randomized test that builds up a Map (keyed on id) of {{SolrInputDocument}}s with a random number of children (etc... up to a random depth) then after adding all of those documents executes some random queries using this transformer and asserts that each {{SolrDocument}}s in the response has the expected children same number of children as the original docs. (The best place for this to live would be SolrExampleTests because then it would help vet both the XML and binary response formats, using the actual example configs) Tested with grandchildren. In Lucene all grandchildren and all siblings are treated as simply children to the parent document. A parent document and all it's child documents are indexed in a block. Hence we should document for only support one level of nesting. Well, looking at your test, a more specific way to put it is that the new "child" transformer actually returns all descendents of the return documents in a flat list. Which is fine if we document it that way – but it has me thinking: we should really add a "childFilter" option to the transformer to constrain the results. This would not only help with the grand child situation, but would also make it easy for people to constrain the types of children they want to get back. (and getDocList can already take in a Query filter) Some new misc questions/comments about the current patch... Why is the tag name in the JSON format "childDocs" but in the XML format it's "childDoc" (no plural) ? ... seems like those should be consistent. the "10" hardcoded in the getDocList call is garunteed to burn someone ... it can definitely default to 10, but we need to have a local param for it in the tnrasformer
          Hide
          Varun Thacker added a comment -

          I think we should fix the back compat tests before resolving this.

          Show
          Varun Thacker added a comment - I think we should fix the back compat tests before resolving this.
          Hide
          Varun Thacker added a comment -

          1. Added JavaDocs to ChildDocTransformerFactory
          2. Created a new binary file for backcompatibility and forwardcompatibility.

          Why is the tag name in the JSON format "childDocs" but in the XML format it's "childDoc" (no plural) ? ... seems like those should be consistent.

          I guess because in JSON the input is a JSON array hence "childDocs", while in XML we use multiple "childDoc" tags to represent nested documents.

          the "10" hardcoded in the getDocList call is garunteed to burn someone ... it can definitely default to 10, but we need to have a local param for it in the tnrasformer

          Added a non mandatory parameter called "numChildDocs" which makes it configurable. Although I'm not sure if the name is correct.

          Well, looking at your test, a more specific way to put it is that the new "child" transformer actually returns all descendents of the return documents in a flat list. Which is fine if we document it that way – but it has me thinking: we should really add a "childFilter" option to the transformer to constrain the results. This would not only help with the grand child situation, but would also make it easy for people to constrain the types of children they want to get back. (and getDocList can already take in a Query filter)

          Added a non mandatory parameter called "childFilter" which could be used to filter out which child documents to be nested in the parent documents to be returned.

          TODO - I will work on adding randomized testing

          Show
          Varun Thacker added a comment - 1. Added JavaDocs to ChildDocTransformerFactory 2. Created a new binary file for backcompatibility and forwardcompatibility. Why is the tag name in the JSON format "childDocs" but in the XML format it's "childDoc" (no plural) ? ... seems like those should be consistent. I guess because in JSON the input is a JSON array hence "childDocs", while in XML we use multiple "childDoc" tags to represent nested documents. the "10" hardcoded in the getDocList call is garunteed to burn someone ... it can definitely default to 10, but we need to have a local param for it in the tnrasformer Added a non mandatory parameter called "numChildDocs" which makes it configurable. Although I'm not sure if the name is correct. Well, looking at your test, a more specific way to put it is that the new "child" transformer actually returns all descendents of the return documents in a flat list. Which is fine if we document it that way – but it has me thinking: we should really add a "childFilter" option to the transformer to constrain the results. This would not only help with the grand child situation, but would also make it easy for people to constrain the types of children they want to get back. (and getDocList can already take in a Query filter) Added a non mandatory parameter called "childFilter" which could be used to filter out which child documents to be nested in the parent documents to be returned. TODO - I will work on adding randomized testing
          Hide
          Varun Thacker added a comment -

          Updated patch with the binary file is attached separately.

          Still need to add the randomized tests.

          Show
          Varun Thacker added a comment - Updated patch with the binary file is attached separately. Still need to add the randomized tests.
          Hide
          Raveendra Yerraguntl added a comment -

          Thanks Varun and all.

          Just in time. Is it possible to get into 4.8.* versions . It reduces lot of UI work.

          If not when is the 4.9 scheduled for a stable release?

          Show
          Raveendra Yerraguntl added a comment - Thanks Varun and all. Just in time. Is it possible to get into 4.8.* versions . It reduces lot of UI work. If not when is the 4.9 scheduled for a stable release?
          Hide
          Varun Thacker added a comment -

          Updated patch.

          • Adds a random test in SolrExampleTests
          • Adds missing support for child documents in XMLRepsonseParser

          javabin_backcompat_child_docs.bin remains the same and is attached separately.

          Show
          Varun Thacker added a comment - Updated patch. Adds a random test in SolrExampleTests Adds missing support for child documents in XMLRepsonseParser javabin_backcompat_child_docs.bin remains the same and is attached separately.
          Hide
          Varun Thacker added a comment -

          Raveendra Yerraguntl It won't be part of the 4.8.* releases.

          There is no particular timeline for the 4.9 release but it could be out within a month.

          Show
          Varun Thacker added a comment - Raveendra Yerraguntl It won't be part of the 4.8.* releases. There is no particular timeline for the 4.9 release but it could be out within a month.
          Hide
          Arcadius Ahouansou added a comment -

          Thanks Varun Thacker and all for the great work.
          Hoss Man Any chance this will get into 4.9?

          Thanks.

          Show
          Arcadius Ahouansou added a comment - Thanks Varun Thacker and all for the great work. Hoss Man Any chance this will get into 4.9? Thanks.
          Hide
          Hoss Man added a comment -

          Hey Varun,

          I didn't get very far digging into your patch, because i started by looking at your new randomized test in SolrExampleTests and encountered some problems...

          1) the first time i tried running your new randomized test, i got an NPE – it didn't reproduce reliable though, because your test called "new Random()" instead of leveraging the test-framework ("ant precommit" will warn you about stuff like this)

          2) Side note: there's no need to randomize which response parser is used when you add test methods to "SolrExampleTests" – every method there gets picked up automatically by the subclasses which ensure they are all run with every writer/parser.

          3) When started looking into fixing the use of random() in your test, I realized that the assertions in the test weren't very strong. What i was refering to in my earlier comment was having a test that attempted to use the transformer on a result set that included docs with children, and docs w/o children; and asserting that every child returned really was a decendent of the specified doc by comparing with what we know for a fact we indexed – your test wasn't really doing any of that.

          In the attached patch, i've overhauled SolrExampleTests.testChildDoctransformer() along the lines of what i was describing, but this has exposed a ClassCastException in the transformer. I haven't had a chance to dig into what's happening, but for some odd reason it only seems to manifest itself when the XML Response Writer is used...

          hossman@frisbee:~/lucene/dev/solr/solrj$ ant test -Dtests.method=testChildDoctransformer -Dtests.seed=720251997BEC4F70 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Pacific/Fiji -Dtests.file.encoding=UTF-8
          
          ...
          
             [junit4]   2> 11768 T20 C1 oasc.SolrException.log ERROR null:java.lang.ClassCastException: org.apache.lucene.document.Field cannot be cast to java.lang.String
             [junit4]   2> 		at org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformerFactory.java:142)
             [junit4]   2> 		at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:254)
             [junit4]   2> 		at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:172)
             [junit4]   2> 		at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:111)
             [junit4]   2> 		at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:40)
             [junit4]   2> 		at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:760)
             [junit4]   2> 		at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:428)
             [junit4]   2> 		at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
             [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
             [junit4]   2> 		at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:136)
             [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
             [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
             [junit4]   2> 		at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
             [junit4]   2> 		at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
             [junit4]   2> 		at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301)
             [junit4]   2> 		at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077)
             [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
             [junit4]   2> 		at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
             [junit4]   2> 		at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
             [junit4]   2> 		at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
             [junit4]   2> 		at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
             [junit4]   2> 		at org.eclipse.jetty.server.Server.handle(Server.java:368)
             [junit4]   2> 		at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
             [junit4]   2> 		at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
             [junit4]   2> 		at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
             [junit4]   2> 		at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
             [junit4]   2> 		at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
             [junit4]   2> 		at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
             [junit4]   2> 		at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
             [junit4]   2> 		at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
             [junit4]   2> 		at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
             [junit4]   2> 		at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
             [junit4]   2> 		at java.lang.Thread.run(Thread.java:744)
             [junit4]   2> 	
             [junit4]   2> 11774 T12 oas.SolrTestCaseJ4.tearDown ###Ending testChildDoctransformer
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=SolrExampleStreamingTest -Dtests.method=testChildDoctransformer -Dtests.seed=720251997BEC4F70 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Pacific/Fiji -Dtests.file.encoding=UTF-8
             [junit4] ERROR   5.76s J2 | SolrExampleStreamingTest.testChildDoctransformer <<<
          
          ...
          
             [junit4]   2> 16262 T39 C1 oasc.SolrException.log ERROR null:java.lang.ClassCastException: org.apache.lucene.document.Field cannot be cast to java.lang.String
             [junit4]   2> 		at org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformerFactory.java:142)
             [junit4]   2> 		at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:254)
             [junit4]   2> 		at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:172)
             [junit4]   2> 		at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:111)
             [junit4]   2> 		at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:40)
             [junit4]   2> 		at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:760)
             [junit4]   2> 		at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:428)
             [junit4]   2> 		at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
             [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
             [junit4]   2> 		at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:136)
             [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
             [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
             [junit4]   2> 		at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
             [junit4]   2> 		at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
             [junit4]   2> 		at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301)
             [junit4]   2> 		at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077)
             [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
             [junit4]   2> 		at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
             [junit4]   2> 		at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
             [junit4]   2> 		at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
             [junit4]   2> 		at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
             [junit4]   2> 		at org.eclipse.jetty.server.Server.handle(Server.java:368)
             [junit4]   2> 		at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
             [junit4]   2> 		at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
             [junit4]   2> 		at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
             [junit4]   2> 		at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
             [junit4]   2> 		at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
             [junit4]   2> 		at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
             [junit4]   2> 		at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
             [junit4]   2> 		at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
             [junit4]   2> 		at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
             [junit4]   2> 		at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
             [junit4]   2> 		at java.lang.Thread.run(Thread.java:744)
             [junit4]   2> 	
             [junit4]   2> 16264 T30 oas.SolrTestCaseJ4.tearDown ###Ending testChildDoctransformer
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=SolrExampleXMLTest -Dtests.method=testChildDoctransformer -Dtests.seed=720251997BEC4F70 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Pacific/Fiji -Dtests.file.encoding=UTF-8
             [junit4] ERROR   0.96s J1 | SolrExampleXMLTest.testChildDoctransformer <<<
          
          ...
          
             [junit4] Tests with failures:
             [junit4]   - org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest.testChildDoctransformer
             [junit4]   - org.apache.solr.client.solrj.SolrExampleXMLTest.testChildDoctransformer
          
          Show
          Hoss Man added a comment - Hey Varun, I didn't get very far digging into your patch, because i started by looking at your new randomized test in SolrExampleTests and encountered some problems... 1) the first time i tried running your new randomized test, i got an NPE – it didn't reproduce reliable though, because your test called "new Random()" instead of leveraging the test-framework ("ant precommit" will warn you about stuff like this) 2) Side note: there's no need to randomize which response parser is used when you add test methods to "SolrExampleTests" – every method there gets picked up automatically by the subclasses which ensure they are all run with every writer/parser. 3) When started looking into fixing the use of random() in your test, I realized that the assertions in the test weren't very strong. What i was refering to in my earlier comment was having a test that attempted to use the transformer on a result set that included docs with children, and docs w/o children; and asserting that every child returned really was a decendent of the specified doc by comparing with what we know for a fact we indexed – your test wasn't really doing any of that. In the attached patch, i've overhauled SolrExampleTests.testChildDoctransformer() along the lines of what i was describing, but this has exposed a ClassCastException in the transformer. I haven't had a chance to dig into what's happening, but for some odd reason it only seems to manifest itself when the XML Response Writer is used... hossman@frisbee:~/lucene/dev/solr/solrj$ ant test -Dtests.method=testChildDoctransformer -Dtests.seed=720251997BEC4F70 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Pacific/Fiji -Dtests.file.encoding=UTF-8 ... [junit4] 2> 11768 T20 C1 oasc.SolrException.log ERROR null:java.lang.ClassCastException: org.apache.lucene.document.Field cannot be cast to java.lang.String [junit4] 2> at org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformerFactory.java:142) [junit4] 2> at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:254) [junit4] 2> at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:172) [junit4] 2> at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:111) [junit4] 2> at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:40) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:760) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:428) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) [junit4] 2> at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:136) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) [junit4] 2> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229) [junit4] 2> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) [junit4] 2> at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301) [junit4] 2> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) [junit4] 2> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) [junit4] 2> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) [junit4] 2> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) [junit4] 2> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) [junit4] 2> at org.eclipse.jetty.server.Server.handle(Server.java:368) [junit4] 2> at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) [junit4] 2> at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) [junit4] 2> at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) [junit4] 2> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) [junit4] 2> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) [junit4] 2> at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) [junit4] 2> at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628) [junit4] 2> at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) [junit4] 2> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) [junit4] 2> at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) [junit4] 2> at java.lang.Thread.run(Thread.java:744) [junit4] 2> [junit4] 2> 11774 T12 oas.SolrTestCaseJ4.tearDown ###Ending testChildDoctransformer [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=SolrExampleStreamingTest -Dtests.method=testChildDoctransformer -Dtests.seed=720251997BEC4F70 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Pacific/Fiji -Dtests.file.encoding=UTF-8 [junit4] ERROR 5.76s J2 | SolrExampleStreamingTest.testChildDoctransformer <<< ... [junit4] 2> 16262 T39 C1 oasc.SolrException.log ERROR null:java.lang.ClassCastException: org.apache.lucene.document.Field cannot be cast to java.lang.String [junit4] 2> at org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformerFactory.java:142) [junit4] 2> at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:254) [junit4] 2> at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:172) [junit4] 2> at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:111) [junit4] 2> at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:40) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:760) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:428) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) [junit4] 2> at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:136) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) [junit4] 2> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229) [junit4] 2> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) [junit4] 2> at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301) [junit4] 2> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) [junit4] 2> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) [junit4] 2> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) [junit4] 2> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) [junit4] 2> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) [junit4] 2> at org.eclipse.jetty.server.Server.handle(Server.java:368) [junit4] 2> at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) [junit4] 2> at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) [junit4] 2> at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) [junit4] 2> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) [junit4] 2> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) [junit4] 2> at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) [junit4] 2> at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628) [junit4] 2> at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) [junit4] 2> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) [junit4] 2> at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) [junit4] 2> at java.lang.Thread.run(Thread.java:744) [junit4] 2> [junit4] 2> 16264 T30 oas.SolrTestCaseJ4.tearDown ###Ending testChildDoctransformer [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=SolrExampleXMLTest -Dtests.method=testChildDoctransformer -Dtests.seed=720251997BEC4F70 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Pacific/Fiji -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.96s J1 | SolrExampleXMLTest.testChildDoctransformer <<< ... [junit4] Tests with failures: [junit4] - org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest.testChildDoctransformer [junit4] - org.apache.solr.client.solrj.SolrExampleXMLTest.testChildDoctransformer
          Hide
          Varun Thacker added a comment -

          Fixed the class cast exception.

          This passes for me now -

           
          ant test -Dtests.method=testChildDoctransformer -Dtests.seed=720251997BEC4F70 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Pacific/Fiji -Dtests.file.encoding=UTF-8
          

          Also ran it over 20 times and it is passing.

          Show
          Varun Thacker added a comment - Fixed the class cast exception. This passes for me now - ant test -Dtests.method=testChildDoctransformer -Dtests.seed=720251997BEC4F70 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Pacific/Fiji -Dtests.file.encoding=UTF-8 Also ran it over 20 times and it is passing.
          Hide
          Hoss Man added a comment -

          Hey Varun,

          I'd started looking ~ChildDocTransformerFactory.java:142 before i saw your new patch – comparing the old code with the new code it still seems like this is more brittle than it needs to be (particularly in cases where the uniqueKey field type isn't a string – ie: a TrieIntField)

          I've attached an update that eliminates (most) of that brittle casting code to rely on the FieldType methods instead ... i still want to review the rest of the patch in more depth, but i wanted to go ahead and attach this update ASAP so you could take a look (and because i'm not sure how much more patch reviewing time i'll get in before i leave town tomorrow)

          Show
          Hoss Man added a comment - Hey Varun, I'd started looking ~ChildDocTransformerFactory.java:142 before i saw your new patch – comparing the old code with the new code it still seems like this is more brittle than it needs to be (particularly in cases where the uniqueKey field type isn't a string – ie: a TrieIntField) I've attached an update that eliminates (most) of that brittle casting code to rely on the FieldType methods instead ... i still want to review the rest of the patch in more depth, but i wanted to go ahead and attach this update ASAP so you could take a look (and because i'm not sure how much more patch reviewing time i'll get in before i leave town tomorrow)
          Hide
          Hoss Man added a comment -

          Why is the tag name in the JSON format "childDocs" but in the XML format it's "childDoc" (no plural) ? ... seems like those should be consistent.

          I guess because in JSON the input is a JSON array hence "childDocs", while in XML we use multiple "childDoc" tags to represent nested documents.

          That makes sense – but now has me thinking back to the proposed usage in your earliest comment on this issue: why create a new <childDoc> element in the XML at all? why not just re-use <doc> (nested inside the existing <doc>) ... that seems like the most straight forward solution, and from what i can tell, that would probably simplify the changes to XMLResponseParser.java as well wouldn't it?

          speaking of which – i don't understand the need for changing the method sig for XMLResponseParser.readDocument ... why can't the method construct the SolrDocument objects itself?

          Added a non mandatory parameter called "numChildDocs" which makes it configurable. Although I'm not sure if the name is correct.

          hmmm, yeah ... for consistency with the top level query we could use something like "rows" but the risk for confusion there seems like it outweighs the consistency factor.

          how about "limit" ?

          Added a non mandatory parameter called "childFilter" ...

          look good ... in general ChildDocTransformerFactory looks pretty good to me now – although I just noticed a typo in the SolrException msg if parentFilter is null ... it refers to "which" – but that doesn't apply here.

          2. Created a new binary file for backcompatibility and forwardcompatibility.

          I might be missing something, buti don't think testBackCompatForSolrDocumentWithChildDocs is actually asserting anything related to the child docs – because it uses assertSolrDocumentEquals, but that method hasn't been updated to know about child docs, has it?


          To sum up:

          • In general, i think the current patch looks great
          • remaining concerns about implementation:
            • testBackCompatForSolrDocumentWithChildDocs doesn't seem valid to me w/o changes to assertSolrDocumentEquals
            • err msg typo in ChildDocTransformerFactory needs fixed
            • method sig change in XMLResponseParser.readDocument seems unneccesasary
          • remaining questions about the API:
            • better name for numChildDocs ? ... how about limit ?
            • why use <childDoc> in XML instead of <doc> ?
          Show
          Hoss Man added a comment - Why is the tag name in the JSON format "childDocs" but in the XML format it's "childDoc" (no plural) ? ... seems like those should be consistent. I guess because in JSON the input is a JSON array hence "childDocs", while in XML we use multiple "childDoc" tags to represent nested documents. That makes sense – but now has me thinking back to the proposed usage in your earliest comment on this issue: why create a new <childDoc> element in the XML at all? why not just re-use <doc> (nested inside the existing <doc> ) ... that seems like the most straight forward solution, and from what i can tell, that would probably simplify the changes to XMLResponseParser.java as well wouldn't it? speaking of which – i don't understand the need for changing the method sig for XMLResponseParser.readDocument ... why can't the method construct the SolrDocument objects itself? Added a non mandatory parameter called "numChildDocs" which makes it configurable. Although I'm not sure if the name is correct. hmmm, yeah ... for consistency with the top level query we could use something like "rows" but the risk for confusion there seems like it outweighs the consistency factor. how about "limit" ? Added a non mandatory parameter called "childFilter" ... look good ... in general ChildDocTransformerFactory looks pretty good to me now – although I just noticed a typo in the SolrException msg if parentFilter is null ... it refers to "which" – but that doesn't apply here. 2. Created a new binary file for backcompatibility and forwardcompatibility. I might be missing something, buti don't think testBackCompatForSolrDocumentWithChildDocs is actually asserting anything related to the child docs – because it uses assertSolrDocumentEquals , but that method hasn't been updated to know about child docs, has it? To sum up: In general, i think the current patch looks great remaining concerns about implementation: testBackCompatForSolrDocumentWithChildDocs doesn't seem valid to me w/o changes to assertSolrDocumentEquals err msg typo in ChildDocTransformerFactory needs fixed method sig change in XMLResponseParser.readDocument seems unneccesasary remaining questions about the API: better name for numChildDocs ? ... how about limit ? why use <childDoc> in XML instead of <doc> ?
          Hide
          Varun Thacker added a comment - - edited

          testBackCompatForSolrDocumentWithChildDocs doesn't seem valid to me w/o changes to assertSolrDocumentEquals

          Fixed assertSolrDocumentEquals. I had overlooked it. Thanks

          err msg typo in ChildDocTransformerFactory needs fixed

          Fixed

          method sig change in XMLResponseParser.readDocument seems unneccesasary

          Fixed. I was experimenting with some other way but with the current implementation the method signature doesn't need to be changed.

          better name for numChildDocs ? ... how about limit ?

          +1 to limit. I never liked numChildDocs. Changed it to limit.

          why use <childDoc> in XML instead of <doc> ?

          Agreed. We should use <doc>. Even when a user inputs the nested documents in XML no special <childDoc> is used. Fixed XMLWriter and XMLResponseParser.

          Also I changed the JSON ouput array key from

          childDocs

          to

          _childDocuments_ 

          This keeps it consistent with the input JSON with child documents.

          Show
          Varun Thacker added a comment - - edited testBackCompatForSolrDocumentWithChildDocs doesn't seem valid to me w/o changes to assertSolrDocumentEquals Fixed assertSolrDocumentEquals. I had overlooked it. Thanks err msg typo in ChildDocTransformerFactory needs fixed Fixed method sig change in XMLResponseParser.readDocument seems unneccesasary Fixed. I was experimenting with some other way but with the current implementation the method signature doesn't need to be changed. better name for numChildDocs ? ... how about limit ? +1 to limit. I never liked numChildDocs. Changed it to limit. why use <childDoc> in XML instead of <doc> ? Agreed. We should use <doc>. Even when a user inputs the nested documents in XML no special <childDoc> is used. Fixed XMLWriter and XMLResponseParser. Also I changed the JSON ouput array key from childDocs to _childDocuments_ This keeps it consistent with the input JSON with child documents.
          Hide
          Arcadius Ahouansou added a comment -

          Hi Varun Thacker
          I am experimenting with this patch and I have a quick question:
          I have applied the patch https://issues.apache.org/jira/secure/attachment/12646085/SOLR-5285.patch to trunk.

          I have indexed the json file

          [
             {
                "id":"0",
                "content_type":"level-0",
                "_childDocuments_":[
                   {
                      "id":"1",
                      "content_type":"level-1",
                      "_childDocuments_":[
                         {
                            "id":"2",
                            "content_type":"level-2",
                            "_childDocuments_":[
                               {
                                  "id":"3",
                                  "content_type":"level-3"
                               }
                            ]
                         }
                      ]
                   }
                ]
             }
          ]
          

          when I query through
          http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true&fl=*,[child%20parentFilter=%22content_type:level-0%22]&fq=content_type:level-0

          I get back

          {
             "responseHeader":{
                "status":0,
                "QTime":1,
                "params":{
                   "fl":"*,[child parentFilter=\"content_type:level-0\"]",
                   "indent":"true",
                   "q":"*:*",
                   "wt":"json",
                   "fq":"content_type:level-0"
                }
             },
             "response":{
                "numFound":1,
                "start":0,
                "docs":[
                   {
                      "id":"0",
                      "content_type":[
                         "level-0"
                      ],
                      "_version_":1468783944507850752,
                      "childDocs":[
                         {
                            "id":"3",
                            "content_type":[
                               "level-3"
                            ]
                         },
                         {
                            "id":"2",
                            "content_type":[
                               "level-2"
                            ]
                         },
                         {
                            "id":"1",
                            "content_type":[
                               "level-1"
                            ]
                         }
                      ]
                   }
                ]
             }
          }
          

          Is this the expected output or a bug or am I doing something wrong?
          I would have thought that the hierarchy would be preserved.

          Thanks.

          Show
          Arcadius Ahouansou added a comment - Hi Varun Thacker I am experimenting with this patch and I have a quick question: I have applied the patch https://issues.apache.org/jira/secure/attachment/12646085/SOLR-5285.patch to trunk. I have indexed the json file [ { "id" : "0" , "content_type" : "level-0" , "_childDocuments_" :[ { "id" : "1" , "content_type" : "level-1" , "_childDocuments_" :[ { "id" : "2" , "content_type" : "level-2" , "_childDocuments_" :[ { "id" : "3" , "content_type" : "level-3" } ] } ] } ] } ] when I query through http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true&fl=*,[child%20parentFilter=%22content_type:level-0%22]&fq=content_type:level-0 I get back { "responseHeader" :{ "status" :0, "QTime" :1, "params" :{ "fl" : "*,[child parentFilter=\" content_type:level-0\ "]" , "indent" : " true " , "q" : "*:*" , "wt" : "json" , "fq" : "content_type:level-0" } }, "response" :{ "numFound" :1, "start" :0, "docs" :[ { "id" : "0" , "content_type" :[ "level-0" ], "_version_" :1468783944507850752, "childDocs" :[ { "id" : "3" , "content_type" :[ "level-3" ] }, { "id" : "2" , "content_type" :[ "level-2" ] }, { "id" : "1" , "content_type" :[ "level-1" ] } ] } ] } } Is this the expected output or a bug or am I doing something wrong? I would have thought that the hierarchy would be preserved. Thanks.
          Hide
          Varun Thacker added a comment -

          Hi Arcadius Ahouansou,

          Thanks for trying out the patch.

          Yes this is expected behaviour. Although one can index an entire hierarchy the final result will be a flat list. This has been discussed in the comments on the issue.

          The reason for this is in lucene stored the parent document and all its child document in one 'block' and uses this property of storing to do fast joins. Here is a good blog post which explains the mechanism better - http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

          Were you just testing or is there a particular use case which needs this hierarchy to be maintained.

          Show
          Varun Thacker added a comment - Hi Arcadius Ahouansou , Thanks for trying out the patch. Yes this is expected behaviour. Although one can index an entire hierarchy the final result will be a flat list. This has been discussed in the comments on the issue. The reason for this is in lucene stored the parent document and all its child document in one 'block' and uses this property of storing to do fast joins. Here is a good blog post which explains the mechanism better - http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html Were you just testing or is there a particular use case which needs this hierarchy to be maintained.
          Hide
          Varun Thacker added a comment -

          Correct patch with all the changes.

          Show
          Varun Thacker added a comment - Correct patch with all the changes.
          Hide
          Arcadius Ahouansou added a comment - - edited

          Hi Varun Thacker
          Yes, in my use case, we need to preserve the hierarchy.

          I believe it is a good idea to preserve the structure of the hierarchy in the response, or at least, provide the option to preserve it.

          It makes more sense IMHO.

          Is it an expensive operation?

          Show
          Arcadius Ahouansou added a comment - - edited Hi Varun Thacker Yes, in my use case, we need to preserve the hierarchy. I believe it is a good idea to preserve the structure of the hierarchy in the response, or at least, provide the option to preserve it. It makes more sense IMHO. Is it an expensive operation?
          Hide
          Varun Thacker added a comment -

          Hi Arcadius Ahouansou,

          In Lucene the documents are stored in one 'block' which consists of the parent document and the child documents. It's just 1 level and thus the hierarchy will not get preserved.

          You could create a Lucene Jira for it and patches are always welcome

          Show
          Varun Thacker added a comment - Hi Arcadius Ahouansou , In Lucene the documents are stored in one 'block' which consists of the parent document and the child documents. It's just 1 level and thus the hierarchy will not get preserved. You could create a Lucene Jira for it and patches are always welcome
          Hide
          Arcadius Ahouansou added a comment -

          Hi Varun Thacker,
          I tried

          facet=true&facet.field=content_type


          The facet count for children was always 0.
          Is this a feature?
          Thanks.

          Show
          Arcadius Ahouansou added a comment - Hi Varun Thacker , I tried facet= true &facet.field=content_type The facet count for children was always 0. Is this a feature? Thanks.
          Hide
          Mikhail Khludnev added a comment -

          Arcadius Ahouansou it's SOLR-5743 . Not much progress so far. I'm expecting some movement during this year.

          Show
          Mikhail Khludnev added a comment - Arcadius Ahouansou it's SOLR-5743 . Not much progress so far. I'm expecting some movement during this year.
          Hide
          Hoss Man added a comment -

          Correct patch with all the changes.

          This looks solid to me ... running tests now

          Show
          Hoss Man added a comment - Correct patch with all the changes. This looks solid to me ... running tests now
          Hide
          ASF subversion and git services added a comment -

          Commit 1601028 from hossman@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1601028 ]

          SOLR-5285: Added a new [child ...] DocTransformer for optionally including Block-Join decendent documents inline in the results of a search

          Show
          ASF subversion and git services added a comment - Commit 1601028 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1601028 ] SOLR-5285 : Added a new [child ...] DocTransformer for optionally including Block-Join decendent documents inline in the results of a search
          Hide
          Hoss Man added a comment -

          I've backported to 4x and am running the tests now – but while backporting i noticed something i overlooked when reviewing the patch: the ChildDocTransformerFactory usage of FieldType.toExternal & FieldType.getFieldQuery from my May 21th patch some how got lost along the way, so this is still brittle in how it deals with the primaryKey field.

          once i've finished backporting the current trunk state to 4x, i'll try to fix that before resolving this issue.

          Show
          Hoss Man added a comment - I've backported to 4x and am running the tests now – but while backporting i noticed something i overlooked when reviewing the patch: the ChildDocTransformerFactory usage of FieldType.toExternal & FieldType.getFieldQuery from my May 21th patch some how got lost along the way, so this is still brittle in how it deals with the primaryKey field. once i've finished backporting the current trunk state to 4x, i'll try to fix that before resolving this issue.
          Hide
          ASF subversion and git services added a comment -

          Commit 1601037 from hossman@apache.org in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1601037 ]

          SOLR-5285: Added a new [child ...] DocTransformer for optionally including Block-Join decendent documents inline in the results of a search (merge r1601028)

          Show
          ASF subversion and git services added a comment - Commit 1601037 from hossman@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1601037 ] SOLR-5285 : Added a new [child ...] DocTransformer for optionally including Block-Join decendent documents inline in the results of a search (merge r1601028)
          Hide
          ASF subversion and git services added a comment -

          Commit 1601044 from hossman@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1601044 ]

          SOLR-5285: use FieldType methods to be less brittle

          Show
          ASF subversion and git services added a comment - Commit 1601044 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1601044 ] SOLR-5285 : use FieldType methods to be less brittle
          Hide
          ASF subversion and git services added a comment -

          Commit 1601052 from hossman@apache.org in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1601052 ]

          SOLR-5285: use FieldType methods to be less brittle (merge r1601044)

          Show
          ASF subversion and git services added a comment - Commit 1601052 from hossman@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1601052 ] SOLR-5285 : use FieldType methods to be less brittle (merge r1601044)
          Hide
          Varun Thacker added a comment -

          Documentation we should add to the ref guide under - https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters

          [child]

          DocTransformer for optionally including Block-Join decendent documents inline in the results of a search. This transformer returns all descendants of each parent document in a flat list nested inside the parent document. This is useful when you have indexed nested child documents and want to retrieve the child documents for the relavant parent documents for any type of search query. Supported response formats are - xml, json, and javabin

          Usage example -

          [child parentFilter="fieldName:fieldValue"]
          
          [child parentFilter="fieldName:fieldValue" childFilter="fieldName:fieldValue"]
          
          [child parentFilter="fieldName:fieldValue" childFilter="fieldName:fieldValue" limit=20]
          

          Mandatory -

          • The "parentFilter" parameter is mandatory. This condition should identify only child documents.

          Optional -

          • The "childFilter" parameter is used to filter out which child documents should be returned.
          • The "limit" parameter which provides an option to specify the number of child documents to be returned per parent document. By default it's set to 10.
          Show
          Varun Thacker added a comment - Documentation we should add to the ref guide under - https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters [child] DocTransformer for optionally including Block-Join decendent documents inline in the results of a search. This transformer returns all descendants of each parent document in a flat list nested inside the parent document. This is useful when you have indexed nested child documents and want to retrieve the child documents for the relavant parent documents for any type of search query. Supported response formats are - xml, json, and javabin Usage example - [child parentFilter= "fieldName:fieldValue" ] [child parentFilter= "fieldName:fieldValue" childFilter= "fieldName:fieldValue" ] [child parentFilter= "fieldName:fieldValue" childFilter= "fieldName:fieldValue" limit=20] Mandatory - The "parentFilter" parameter is mandatory. This condition should identify only child documents. Optional - The "childFilter" parameter is used to filter out which child documents should be returned. The "limit" parameter which provides an option to specify the number of child documents to be returned per parent document. By default it's set to 10.
          Hide
          Hoss Man added a comment -

          I documented on the newly created page for doc transformers...

          https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents

          Show
          Hoss Man added a comment - I documented on the newly created page for doc transformers... https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents

            People

            • Assignee:
              Hoss Man
              Reporter:
              Varun Thacker
            • Votes:
              11 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development