  Solr / SOLR-14537

Improve performance of ExportWriter


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.7
    • Component/s: Export Writer
    • Labels: None

    Description

      Retrieving, sorting and writing out documents in ExportWriter are three aspects of the /export handler that can be further optimized.

      SOLR-14470 introduced some level of caching in StringValue. Further options for caching and speedups should be explored.

      Currently the sort/retrieve and write operations run sequentially, but they could be parallelized because they block on different resources: sorting and retrieving is bound by index reading and CPU, while writing is bound by the receiving end, since it uses blocking IO. Sorting and retrieving the next batch of values could therefore run in parallel with writing out the current batch of results.
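      A minimal sketch of overlapping the two stages with a bounded producer/consumer queue, assuming a "batch" is simply a sorted list of integers and the output stream is a StringBuilder. The class PipelinedExport and the method prepareBatch are hypothetical stand-ins, not part of Solr's ExportWriter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelinedExport {
    // Sentinel batch (compared by identity) signalling that the producer is done.
    private static final List<Integer> END = new ArrayList<>();

    /** Hypothetical stand-in for "sort and retrieve one batch of values". */
    static List<Integer> prepareBatch(int batchNo, int batchSize) {
        List<Integer> batch = new ArrayList<>(batchSize);
        for (int i = batchSize - 1; i >= 0; i--) {
            batch.add(batchNo * batchSize + i);
        }
        batch.sort(null); // null comparator = natural ordering
        return batch;
    }

    public static String export(int numBatches, int batchSize) {
        // Capacity 1: the producer may run at most one batch ahead of the writer.
        BlockingQueue<List<Integer>> ready = new ArrayBlockingQueue<>(1);
        Thread producer = new Thread(() -> {
            try {
                for (int b = 0; b < numBatches; b++) {
                    ready.put(prepareBatch(b, batchSize)); // CPU/index-bound stage
                }
                ready.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "export-producer");
        producer.start();

        StringBuilder out = new StringBuilder(); // stands in for the blocking output stream
        try {
            while (true) {
                List<Integer> batch = ready.take(); // IO-bound stage consumes ready batches
                if (batch == END) break;
                for (int v : batch) out.append(v).append('\n');
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("export interrupted", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(export(3, 2));
    }
}
```

      While the writer drains one batch, the producer thread is already preparing the next one, so the two kinds of blocking overlap instead of adding up.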

      One possible approach here would be to use "double buffering": one batch that is ready (already sorted and retrieved) is written out while the other batch is prepared in a background thread, and when both are done the buffers are swapped. This would not complicate the current code much, and it could give up to 2x higher throughput. The 2x figure is the best case, reached when preparing a batch and writing it take about the same time; if one stage dominates, throughput is bounded by that slower stage and the gain is smaller.
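      A minimal sketch of the double-buffering scheme using java.util.concurrent.Exchanger to swap exactly two reusable buffers between a filler thread and the writer. The class DoubleBufferedExport is hypothetical, batches are assumed to be lists of integers, and an empty buffer is assumed to mark the end of the stream (hence batchSize must be positive).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Exchanger;

public class DoubleBufferedExport {
    public static String export(int numBatches, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
        Exchanger<List<Integer>> exchanger = new Exchanger<>();

        Thread filler = new Thread(() -> {
            List<Integer> back = new ArrayList<>(batchSize); // buffer being prepared
            try {
                for (int b = 0; b < numBatches; b++) {
                    back.clear();
                    for (int i = batchSize - 1; i >= 0; i--) back.add(b * batchSize + i);
                    back.sort(null); // "sort and retrieve" stand-in
                    // Swap: hand over the full buffer, receive the one the writer drained.
                    back = exchanger.exchange(back);
                }
                back.clear();
                exchanger.exchange(back); // empty buffer = end of stream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "export-filler");
        filler.start();

        StringBuilder out = new StringBuilder(); // stands in for the blocking output stream
        List<Integer> front = new ArrayList<>(batchSize); // buffer being written out
        try {
            while (true) {
                front = exchanger.exchange(front); // give drained buffer, take full one
                if (front.isEmpty()) break;
                for (int v : front) out.append(v).append('\n');
                front.clear(); // drained; ready to be swapped back to the filler
            }
            filler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("export interrupted", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(export(3, 2));
    }
}
```

      Each buffer is owned by exactly one thread between swaps, and Exchanger.exchange establishes a happens-before edge, so no extra locking is needed. Because each swap waits for both sides, the loop advances at the pace of the slower stage, which is why the speedup is capped at 2x.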


          People

            Assignee: Andrzej Bialecki
            Reporter: Andrzej Bialecki
            Votes: 1
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 0.5h
