SOLR-9651

Consider tracking modification time of external file fields for faster reloading


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.10.4
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: Linux

    Description

      I have an index of about 4M legal documents that has pagerank boosting configured as an external file field. The external file is about 100MB in size and has one row per document in the index. Each row gives the pagerank score of one document. Whenever we open a new searcher, this file has to be reloaded, which takes several seconds and creates a noticeable delay for our users.

      An idea to fix this came up in a recent discussion on the Solr mailing list: could the file be reloaded only if it has changed on disk? In other words, when new searchers are opened, could they check the modtime of the file and skip the reload if the file hasn't changed?
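
      To illustrate the idea, here is a minimal sketch in plain Java. It is not Solr code and does not use any Solr API: the class `ModTimeCachedScores`, its methods, and the `loadCount` counter are hypothetical names made up for this example. It assumes the file holds one `key=value` row per document, and it re-parses the file only when the modification time differs from the one recorded at the last load.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Solr): caches parsed rows keyed on file modtime. */
class ModTimeCachedScores {
    private final Path file;
    private FileTime loadedModTime;     // modtime of the file when it was last parsed
    private Map<String, Float> scores;  // parsed contents from the last load
    private int loadCount;              // number of times the file was actually parsed

    ModTimeCachedScores(Path file) {
        this.file = file;
    }

    /** Returns the cached scores, re-parsing only if the file's modtime changed. */
    synchronized Map<String, Float> get() throws IOException {
        FileTime current = Files.getLastModifiedTime(file);
        if (scores == null || !current.equals(loadedModTime)) {
            scores = parse(file);
            loadedModTime = current;
            loadCount++;
        }
        return scores;
    }

    synchronized int loadCount() {
        return loadCount;
    }

    /** Parses "key=value" rows; blank or malformed rows are skipped. */
    private static Map<String, Float> parse(Path file) throws IOException {
        Map<String, Float> result = new HashMap<>();
        for (String line : Files.readAllLines(file)) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            String key = line.substring(0, eq);
            float value = Float.parseFloat(line.substring(eq + 1).trim());
            result.put(key, value);
        }
        return result;
    }
}
```

      In this sketch, each new searcher would call `get()` and pay only for a file stat when nothing has changed. One caveat: modtime resolution depends on the filesystem, so a file rewritten twice within the same timestamp tick would not be detected. Comparing the file size as well would be a cheap extra check.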

      In our configuration, this would be a big improvement. We only change the pagerank file once a week, because computing it is expensive and new documents don't tend to have a big impact. At the same time, because we're regularly adding new documents, we do hundreds of commits per day, and each one incurs this delay while the (largish) external file is reloaded.

      Is this a reasonable improvement to request?

            People

              Assignee: Unassigned
              Reporter: Mike Lissner (mlissner)
              Votes: 0
              Watchers: 3
