Solr / SOLR-6559

Create an /update/xml/docs endpoint to do custom XML indexing

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Just as we have a JSON endpoint, create an XML endpoint too. Use the XPathRecordReader in DIH to do the same. The syntax would require slight tweaking to match the params of /update/json/docs.

      Attachments
      1. SOLR-6559.patch (75 kB, Anurag Sharma)
      2. SOLR-6559.patch (46 kB, Anurag Sharma)
      3. SOLR-6559.patch (69 kB, Anurag Sharma)
      4. SOLR-6559.patch (5 kB, Anurag Sharma)


          Activity

          Anurag Sharma added a comment:

          The attached unit test demonstrates the flattening capabilities of XPathRecordReader.
          For the /update/xml/docs endpoint, should we keep the XPath syntax, and should we also support the /update/json/docs format for indexing?

          Noble Paul added a comment:

          It is OK to stick to the XPath syntax, because we don't want to re-educate users.

          What is your preference?

          Anurag Sharma added a comment:

          I also prefer sticking to the XPath format.

          Attaching a patch covering the basic functionality. I'll follow up with more patches covering the other use cases supported by the JSON endpoint.

          Noble Paul added a comment:

          How far have you progressed? Let me know when it is ready for review.

          Anurag Sharma added a comment:

          Attaching the patch file.

          Struggling to make srcField work.

          The srcField functionality is not in this patch; its unit test is 'testXMLDocFormatWithSplitWithSrcField'. The problem is getting the raw XML from XMLStreamReader, since it doesn't buffer the data it has already consumed. Any quick tips would be appreciated.
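          One common workaround for the buffering problem, sketched here under assumptions and not taken from any attached patch: instead of recovering the original bytes, re-serialize the StAX events of each split element into a StringWriter while streaming. `RawXmlCapture` and `capture` are hypothetical names. The result is equivalent XML rather than a byte-for-byte copy of the input (for example, self-closing tags may come back expanded).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: re-serializes each record element while streaming,
// because the StAX readers do not keep the input they have consumed.
public class RawXmlCapture {

    /** Returns the XML text of every element whose local name is splitName. */
    public static List<String> capture(String xml, String splitName) {
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        List<String> records = new ArrayList<>();
        try {
            XMLEventReader reader = inFactory.createXMLEventReader(new StringReader(xml));
            StringWriter buffer = null;
            XMLEventWriter writer = null;
            int depth = 0; // element depth inside the record being captured

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                // Start a new capture when a record element opens.
                if (writer == null && event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(splitName)) {
                    buffer = new StringWriter();
                    writer = outFactory.createXMLEventWriter(buffer);
                    depth = 0;
                }
                if (writer != null) {
                    writer.add(event); // copy the event into this record's buffer
                    if (event.isStartElement()) {
                        depth++;
                    } else if (event.isEndElement() && --depth == 0) {
                        // The record element just closed: keep its serialized text.
                        writer.flush();
                        writer.close();
                        records.add(buffer.toString());
                        writer = null;
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return records;
    }
}
```

          The captured string for each record could then be stored in a source field alongside the mapped fields.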

          The entry point for the XML doc functionality is /update/xml/docs. It is registered implicitly, so no request handler needs to be configured. The following parameters are implemented, with unit tests:
          • split=<Solr-style XPath splitter>
          • f=<field from schema.xml>:<xpath>

          The functionality is largely the same as /update/json/docs.
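          For illustration, the split and f parameters described above can be combined into a request URL as below. This is a minimal client-side sketch: the class `XmlDocsRequest`, the base URL `http://localhost:8983/solr/collection1`, and the `/docs/doc` paths are assumptions, and the f value follows the `<field>:<xpath>` form given in the comment.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical client-side helper; not part of Solr or the attached patches.
public class XmlDocsRequest {

    /**
     * Builds a /update/xml/docs URL. fieldMappings maps a schema field name to
     * the XPath it is read from; each entry becomes one f=<field>:<xpath> param.
     */
    public static String buildUrl(String baseUrl, String split, Map<String, String> fieldMappings) {
        StringJoiner query = new StringJoiner("&");
        query.add("split=" + encode(split));
        for (Map.Entry<String, String> mapping : fieldMappings.entrySet()) {
            query.add("f=" + encode(mapping.getKey() + ":" + mapping.getValue()));
        }
        return baseUrl + "/update/xml/docs?" + query;
    }

    // Percent-encodes '/', ':' and other reserved characters in parameter values.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

          For example, `buildUrl("http://localhost:8983/solr/collection1", "/docs/doc", Map.of("id", "/docs/doc/id"))` yields `http://localhost:8983/solr/collection1/update/xml/docs?split=%2Fdocs%2Fdoc&f=id%3A%2Fdocs%2Fdoc%2Fid`.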

          Noble Paul added a comment:

          srcField is NOT required

          Anurag Sharma added a comment (edited):

          If srcField is not required, can you review the patch for merging? Also, I'd like to know why srcField is not required: is there another API to store the raw data?

          Noble Paul added a comment (edited):

          The patch does not apply cleanly, so I could not review it properly.

          Does it support wildcards yet? I don't see any tests for them.

          The default case should work fine. The default will be split=$ROOT&f=/**
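          Assuming f=/** behaves like its /update/json/docs counterpart, so that every leaf element becomes a field named after that element, and reading split=$ROOT as "the whole document is one record", the default can be sketched with the JDK's DOM parser. `FlattenAll` is a hypothetical name, both readings are assumptions not confirmed in this thread, and attributes are ignored for brevity.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical illustration of an assumed split=$ROOT&f=/** behaviour.
public class FlattenAll {

    /** Treats the whole document as one record and maps leaf elements to fields. */
    public static Map<String, List<String>> flatten(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, List<String>> fields = new LinkedHashMap<>();
            collect(doc.getDocumentElement(), fields);
            return fields;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Depth-first walk: a leaf element contributes its text under its own name;
    // repeated names accumulate into a multi-valued field.
    private static void collect(Element element, Map<String, List<String>> fields) {
        boolean hasChildElements = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                collect((Element) child, fields);
            }
        }
        if (!hasChildElements) {
            fields.computeIfAbsent(element.getTagName(), k -> new ArrayList<>())
                  .add(element.getTextContent().trim());
        }
    }
}
```

          Under these assumptions, `flatten("<doc><id>1</id><tags><tag>a</tag><tag>b</tag></tags></doc>")` returns `{id=[1], tag=[a, b]}`.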

          Anurag Sharma added a comment:

          Attaching a patch that applies to the latest trunk.
          XPathRecordReader doesn't support wildcards. Either we have to implement wildcard functionality or use another XPath parser.
          I also added a unit test (testSupportedWildCard) demonstrating that the capability is unsupported. The patch also includes positive unit tests, which pass.

          Noble Paul added a comment:

          Re: "The XPathRecordReader doesn't support wild card": it does. Look at the tests.

          Anurag Sharma added a comment:

          I looked for the wildcard '*' but couldn't find any unit test for it in TestXPathRecordReader.

          Noble Paul added a comment (edited):

          • testAny_decendent_from_root
          • testAny_decendent_of_a_child2
          • testMixedContentFlattened

          These are all wildcard tests.
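          The test names suggest the descendant wildcard `//`, which matches zero or more intermediate elements. Its semantics can be illustrated with a small matcher over element-name paths. This is a hypothetical sketch: `DescendantMatcher` is not part of XPathRecordReader, and it handles only plain element names plus `//`, not attributes or `*`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical matcher illustrating "/" (child) and "//" (any descendant) steps.
public class DescendantMatcher {

    /** True if the element path (root first) ends at an element matched by pattern. */
    public static boolean matches(String pattern, List<String> path) {
        // Tokenize "/a//b" into ["a", "//", "b"]; single '/' is just a separator.
        List<String> steps = new ArrayList<>();
        int i = 0;
        while (i < pattern.length()) {
            if (pattern.startsWith("//", i)) {
                steps.add("//");
                i += 2;
            } else if (pattern.charAt(i) == '/') {
                i++;
            } else {
                int j = pattern.indexOf('/', i);
                if (j < 0) j = pattern.length();
                steps.add(pattern.substring(i, j));
                i = j;
            }
        }
        return match(steps, 0, path, 0);
    }

    private static boolean match(List<String> steps, int s, List<String> path, int p) {
        if (s == steps.size()) return p == path.size(); // pattern must consume the whole path
        String step = steps.get(s);
        if (step.equals("//")) {
            // Descendant axis: skip zero or more elements before the next step.
            for (int k = p; k <= path.size(); k++) {
                if (match(steps, s + 1, path, k)) return true;
            }
            return false;
        }
        // Child step: the next element name must match exactly.
        return p < path.size() && path.get(p).equals(step) && match(steps, s + 1, path, p + 1);
    }
}
```

          With this sketch, `/a//b` matches the element path a/x/b, `//b` matches x/y/b, and `/a/b` does not match a/x/b.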


            People

            • Assignee: Noble Paul
              Reporter: Noble Paul
            • Votes: 0
              Watchers: 3
