Solr / SOLR-14470

Add streaming expressions to /export handler

    Details

      Description

      Many streaming scenarios would greatly benefit from the ability to perform partial rollups (or other transformations) as early as possible, in order to minimize the amount of data that has to be sent from shards to the aggregating node.

      This can be implemented as a subset of streaming expressions that processes the data directly inside each local ExportHandler and outputs only the records from the resulting stream.

      Conceptually, this is similar to the way a Hadoop Combiner works. As with a Combiner, the input data is processed in batches, so there is no guarantee that only one record per unique combination of sort values is emitted. In fact, in most cases multiple partial aggregations would be emitted for the same key. Still, in many scenarios this would reduce the amount of data sent by several orders of magnitude.
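
The Combiner-style behaviour described above can be illustrated with a small, language-neutral sketch. This is not Solr code and does not use any Solr API: the function names `partial_rollup` and `final_rollup`, the field names, and the batch size are all hypothetical and chosen only for illustration. It assumes each shard sums a numeric field per key within fixed-size batches, and the aggregating node merges whatever partial sums arrive.

```python
from itertools import islice


def partial_rollup(records, key, value, batch_size):
    """Shard side (hypothetical): emit one partial sum per key per batch."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        sums = {}  # dicts preserve insertion order, so output follows input order
        for rec in batch:
            sums[rec[key]] = sums.get(rec[key], 0) + rec[value]
        for k, total in sums.items():
            yield {key: k, value: total}


def final_rollup(partials, key, value):
    """Aggregating node (hypothetical): merge partial sums from all shards."""
    totals = {}
    for rec in partials:
        totals[rec[key]] = totals.get(rec[key], 0) + rec[value]
    return totals


# Records as one shard might export them, already sorted by the rollup key.
shard_records = [
    {"color": "blue", "count": 1},
    {"color": "blue", "count": 2},
    {"color": "blue", "count": 3},
    {"color": "red", "count": 4},
    {"color": "red", "count": 5},
]

# A key that spans a batch boundary produces more than one partial record.
partials = list(partial_rollup(shard_records, "color", "count", batch_size=2))
totals = final_rollup(partials, "color", "count")
```

Here "blue" appears in two batches, so the shard emits two partial records for it. That is why the aggregating node must still perform the final rollup, even though it receives fewer records than the shard read.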

            People

            • Assignee:
              ab Andrzej Bialecki
              Reporter:
              ab Andrzej Bialecki

                Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1.5h