Solr: SOLR-380

There's no way to convert search results into page-level hits of a "structured document".


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: search
    • Labels: None

    Description

      "Paged-Text" FieldType for Solr

      A chance to dig into the guts of Solr. The problem: if we index a monograph in Solr, there is no way to convert search results into page-level hits. The solution: a "paged-text" field type that keeps track of page divisions as it indexes and reports page-level hits in the search results.

      The input would contain page milestones: <page id="234"/>. As Solr processed the tokens (using its standard tokenizers and filters), it would concurrently build a structural map of the item, indicating which term position marked the beginning of which page: <page id="234" firstterm="14324"/>. This map would be stored in an unindexed field in some efficient format.
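
      The indexing-side step can be sketched as follows. This is a minimal Java sketch, not the proposed implementation: the names `PageMapBuilder` and `PageStart` are hypothetical, and Solr's analysis chain is replaced by plain whitespace splitting in which every token advances the term position by exactly one. A real field type would have to honor the position increments produced by its analyzer (for example, gaps that could be left when stop words are removed).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: builds a page map of <page id="..." firstterm="..."/>
 * entries by scanning text that contains page milestones. Whitespace
 * splitting stands in for Solr's tokenizer and filter chain (assumption).
 */
public class PageMapBuilder {

    /** One entry of the structural map: the page id and its first term position. */
    public record PageStart(String pageId, int firstTerm) {}

    // Matches a page milestone such as <page id="234"/>
    private static final Pattern MILESTONE =
            Pattern.compile("<page\\s+id=\"([^\"]+)\"\\s*/>");

    public static List<PageStart> build(String text) {
        List<PageStart> map = new ArrayList<>();
        Matcher m = MILESTONE.matcher(text);
        int position = 0; // next term position to be assigned
        int last = 0;     // end offset of the previous milestone in the text
        while (m.find()) {
            // Count the tokens between the previous milestone and this one.
            position += countTokens(text.substring(last, m.start()));
            // The next token emitted will be the first term of this page.
            map.add(new PageStart(m.group(1), position));
            last = m.end();
        }
        return map;
    }

    // Hypothetical stand-in for the analyzer: one token per whitespace-separated word.
    private static int countTokens(String segment) {
        String trimmed = segment.strip();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // Hypothetical input with two page milestones.
        String text = "a b <page id=\"1\"/> c d e <page id=\"2\"/> f";
        for (PageStart p : build(text)) {
            System.out.printf("<page id=\"%s\" firstterm=\"%d\"/>%n", p.pageId(), p.firstTerm());
        }
        // prints <page id="1" firstterm="2"/> then <page id="2" firstterm="5"/>
    }
}
```

      Milestones are stripped from the token stream rather than indexed, so they never consume a term position; that is what lets "firstterm" point at the first real word of each page.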

      At search time, Solr would retrieve term positions for all hits that are returned in the current request, and use the stored map to determine page ids for each term position. The results would imitate the results for highlighting, something like:

      <lst name="pages">
        <lst name="doc1">
          <int name="pageid">234</int>
          <int name="pageid">236</int>
        </lst>
        <lst name="doc2">
          <int name="pageid">19</int>
        </lst>
      </lst>
      <lst name="hitpos">
        <lst name="doc1">
          <lst name="234">
            <int name="pos">14325</int>
          </lst>
        </lst>
        ...
      </lst>
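
      The search-time lookup can be sketched in the same hedged way. Assuming the stored map has already been decoded into first-term positions and page ids, `TreeMap.floorEntry` finds the page whose first term is the greatest position not exceeding a hit's position. `PageHitMapper` and its methods are hypothetical, and retrieving the hit positions from the index is stubbed out as an `int[]`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of the search-time step: maps hit term positions in one
 * document to page ids using the stored page map (first term -> page id).
 */
public class PageHitMapper {

    private final TreeMap<Integer, String> pageStarts = new TreeMap<>();

    /** firstTerms[i] is the term position at which page pageIds[i] begins. */
    public PageHitMapper(int[] firstTerms, String[] pageIds) {
        if (firstTerms.length != pageIds.length) {
            throw new IllegalArgumentException("firstTerms and pageIds differ in length");
        }
        for (int i = 0; i < firstTerms.length; i++) {
            pageStarts.put(firstTerms[i], pageIds[i]);
        }
    }

    /** Returns the page containing the position, or null if it precedes the first milestone. */
    public String pageFor(int position) {
        Map.Entry<Integer, String> e = pageStarts.floorEntry(position);
        return e == null ? null : e.getValue();
    }

    /** Groups hit positions by page id, in order of each page's first hit. */
    public Map<String, List<Integer>> groupHits(int[] hitPositions) {
        Map<String, List<Integer>> byPage = new LinkedHashMap<>();
        for (int pos : hitPositions) {
            String page = pageFor(pos);
            if (page == null) {
                continue; // hit falls before the first page milestone
            }
            byPage.computeIfAbsent(page, k -> new ArrayList<>()).add(pos);
        }
        return byPage;
    }

    public static void main(String[] args) {
        // Page 234 starts at 14324 as in the description; the other values are invented.
        PageHitMapper mapper = new PageHitMapper(
                new int[] {14000, 14324, 14710, 15102},
                new String[] {"233", "234", "235", "236"});
        System.out.println(mapper.groupHits(new int[] {14325, 15200}));
        // prints {234=[14325], 236=[15200]}
    }
}
```

      If hit positions arrive sorted, the map's keys correspond to a document's entry in the "pages" list and each value to the matching entry in "hitpos". A binary search over the sorted first-term positions keeps each lookup logarithmic in the number of pages.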

      Attachments

        1. SOLR-380-XmlPayload.patch (92 kB, Tricia Jenkins)
        2. SOLR-380-XmlPayload.patch (155 kB, Tricia Jenkins)
        3. xmlpayload.jar (10 kB, Tricia Jenkins)
        4. xmlpayload-example.zip (8.55 MB, Tricia Jenkins)
        5. xmlpayload-src.jar (5.74 MB, Tricia Jenkins)


          People

            Assignee: Unassigned
            Reporter: Tricia Jenkins (pgwillia)
            Votes: 4
            Watchers: 9

