Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-10304

Refactor Document/Stored-field handling out of SolrIndexSearcher

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.6
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      SolrIndexSearcher is nearly 3 thousand lines of code. A sizable part of it pertains to Document handling, including various stored-field concerns and docValue substitutions (docValueAsStored related). There are already comments marking the start and end of this part of SolrIndexSearcher, plus there some fields and their initialization that are only in support of those methods. I propose that all of this go to a new companion class SolrDocumentFetcher. SolrIndexSearcher can add a getter for it, and where applicable existing callers can call to this instead. "Override"'s will need to stay of course.
      ( Originally proposed in SOLR-10286 )

        Activity

        Hide
        dsmiley David Smiley added a comment -

        Here's a patch.
        Misc notes:

        • SolrIndexSearcher now has getters for the SolrDocumentFetcher, for FieldInfos, and for the LeafContexts. The latter 2 were needed by SDF.
        • Marked SolrPluginUtils.docListToSolrDocumentList deprecated. It's only called by ClusteringComponent, and I think the "ids" param aspect is a bit ugly and not worth supporting. If someone wants a similar method they can speak up and we can add a method to SolrDocumentFetcher.
        • Removed convenience method readDocs and it's overloaded version. Nobody was calling them. I looked around to see if, IMO, somewhere we should be calling them. I think not as the method promotes holding all the given Lucene docs in memory at once rather than letting each caller grab the parts of them they want, potentially in a more streaming way.
        • class DocsStreamer:
          • The "dvFieldsToReturn" calculation was moved to a static method on this class. I considered moving it to SolrDocumentFetcher but opted not to for now; it's debatable. ReturnFields might be the ideal target. Annoyingly ReturnFields is an interface with just one implementation so I didn't do this; not to mention it might be a bit out of scope.
          • Renamed static DocsStreamer.getDoc(doc) to convertLuceneDocToSolrDoc. I considered moving it to SolrDocumentFetcher but it's not 100% clear it should. Maybe? If so maybe it could then be made non-static and joined with calling decorateDocValueFields; it's debatable.
        • there are several methods in RealtimeGetComponent affected... it suggests to me future refactorings might merge/move some of this logic, perhaps with DocsStreamer. I dunno.
        Show
        dsmiley David Smiley added a comment - Here's a patch. Misc notes: SolrIndexSearcher now has getters for the SolrDocumentFetcher, for FieldInfos, and for the LeafContexts. The latter 2 were needed by SDF. Marked SolrPluginUtils.docListToSolrDocumentList deprecated. It's only called by ClusteringComponent, and I think the "ids" param aspect is a bit ugly and not worth supporting. If someone wants a similar method they can speak up and we can add a method to SolrDocumentFetcher. Removed convenience method readDocs and it's overloaded version. Nobody was calling them. I looked around to see if, IMO, somewhere we should be calling them. I think not as the method promotes holding all the given Lucene docs in memory at once rather than letting each caller grab the parts of them they want, potentially in a more streaming way. class DocsStreamer: The "dvFieldsToReturn" calculation was moved to a static method on this class. I considered moving it to SolrDocumentFetcher but opted not to for now; it's debatable. ReturnFields might be the ideal target. Annoyingly ReturnFields is an interface with just one implementation so I didn't do this; not to mention it might be a bit out of scope. Renamed static DocsStreamer.getDoc(doc) to convertLuceneDocToSolrDoc . I considered moving it to SolrDocumentFetcher but it's not 100% clear it should. Maybe? If so maybe it could then be made non-static and joined with calling decorateDocValueFields; it's debatable. there are several methods in RealtimeGetComponent affected... it suggests to me future refactorings might merge/move some of this logic, perhaps with DocsStreamer. I dunno.
        Hide
        ichattopadhyaya Ishan Chattopadhyaya added a comment -

        +1 to the refactoring! The SolrIndexSearcher looks much cleaner now.

        Renamed static DocsStreamer.getDoc(doc) to convertLuceneDocToSolrDoc. I considered moving it to SolrDocumentFetcher but it's not 100% clear it should. Maybe?

        There's also RTGC.toSolrDoc(), RTGC.toSolrInputDocument() that are very similar to DocsStreamer.getDoc(). There maybe one or two more of these methods. I think those should all be co-located in some utility class.

        Show
        ichattopadhyaya Ishan Chattopadhyaya added a comment - +1 to the refactoring! The SolrIndexSearcher looks much cleaner now. Renamed static DocsStreamer.getDoc(doc) to convertLuceneDocToSolrDoc. I considered moving it to SolrDocumentFetcher but it's not 100% clear it should. Maybe? There's also RTGC.toSolrDoc(), RTGC.toSolrInputDocument() that are very similar to DocsStreamer.getDoc(). There maybe one or two more of these methods. I think those should all be co-located in some utility class.
        Hide
        dsmiley David Smiley added a comment -

        I'm glad you like it Ishan

        I have no convictions about where to put these things... but one option is SolrDocumentFetcher. Fetching & conversion / decoration could be its scope. If it's not SDF, then perhaps SDF's scope should be more narrow as to not include decorateDocValueFields. I'm kind of inclined to put all this stuff in SDF.

        Show
        dsmiley David Smiley added a comment - I'm glad you like it Ishan I have no convictions about where to put these things... but one option is SolrDocumentFetcher . Fetching & conversion / decoration could be its scope. If it's not SDF, then perhaps SDF's scope should be more narrow as to not include decorateDocValueFields . I'm kind of inclined to put all this stuff in SDF.
        Hide
        dsmiley David Smiley added a comment -

        Update patch with small changes pertaining to the move of SolrPluginUtils.docListToSolrDocumentList to the ClusteringComponent. The test needed to move to which in turn required some additions to the CC's schema.xml.

        All tests pass and ant precommit. I'll commit this later today.

        Further consolidation/moving around relative to RealTimeGetComponent can happen in another issue.

        Show
        dsmiley David Smiley added a comment - Update patch with small changes pertaining to the move of SolrPluginUtils.docListToSolrDocumentList to the ClusteringComponent. The test needed to move to which in turn required some additions to the CC's schema.xml. All tests pass and ant precommit. I'll commit this later today. Further consolidation/moving around relative to RealTimeGetComponent can happen in another issue.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit f1aef3d12be1300a93a57570e576d94c59ac969e in lucene-solr's branch refs/heads/master from David Smiley
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f1aef3d ]

        SOLR-10304: Refactor new SolrDocumentFetcher out of SolrIndexSearcher

        Show
        jira-bot ASF subversion and git services added a comment - Commit f1aef3d12be1300a93a57570e576d94c59ac969e in lucene-solr's branch refs/heads/master from David Smiley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f1aef3d ] SOLR-10304 : Refactor new SolrDocumentFetcher out of SolrIndexSearcher
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 9c423a61a4ab5d88103cb26c190864454c5c1cb0 in lucene-solr's branch refs/heads/branch_6x from David Smiley
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9c423a6 ]

        SOLR-10304: Refactor new SolrDocumentFetcher out of SolrIndexSearcher

        (cherry picked from commit f1aef3d)

        Show
        jira-bot ASF subversion and git services added a comment - Commit 9c423a61a4ab5d88103cb26c190864454c5c1cb0 in lucene-solr's branch refs/heads/branch_6x from David Smiley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9c423a6 ] SOLR-10304 : Refactor new SolrDocumentFetcher out of SolrIndexSearcher (cherry picked from commit f1aef3d)

          People

          • Assignee:
            dsmiley David Smiley
            Reporter:
            dsmiley David Smiley
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development