Solr / SOLR-9384

Add randomization to the train Streaming Expression to support very large training sets

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: streaming expressions
    • Labels: None

    Description

      The train (SOLR-9252) Streaming Expression optimizes a logistic regression model on text.

      The initial implementation instantiates a doc vector for each document in the training set on each iteration. The doc vectors are held in memory, so the size of the training set is limited by available memory.

      This ticket will add randomization to the algorithm so that only a random subset of documents from the training set is processed on each iteration.

      This will allow the train Streaming Expression to be run on much larger training sets.
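
The idea described above can be sketched as mini-batch gradient descent for logistic regression. This is a minimal illustration in plain Python, not the Solr implementation: the function names, the `sample_size` and `learning_rate` parameters, and the uniform sampling strategy are all assumptions made for the example. It shows that only the sampled documents need vectors in memory on a given iteration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, x):
    # Probability that feature vector x belongs to the positive class.
    return sigmoid(sum(w * v for w, v in zip(weights, x)))


def train_sampled(docs, labels, num_features, iterations=200,
                  sample_size=4, learning_rate=0.5, seed=0):
    """Fit logistic regression weights while visiting only a random
    sample of documents per iteration (hypothetical helper, not a Solr API)."""
    rng = random.Random(seed)
    weights = [0.0] * num_features
    n = len(docs)
    for _ in range(iterations):
        # Assumption: a uniform sample without replacement on each iteration.
        # Only these documents need their vectors materialized.
        batch = rng.sample(range(n), min(sample_size, n))
        gradient = [0.0] * num_features
        for i in batch:
            x = docs[i]
            error = predict(weights, x) - labels[i]
            for j, v in enumerate(x):
                gradient[j] += error * v
        # Average the gradient over the sample and take one descent step.
        for j in range(num_features):
            weights[j] -= learning_rate * gradient[j] / len(batch)
    return weights
```

In this sketch, peak memory per iteration scales with `sample_size` rather than with the size of the full training set, which is the property the ticket is after. The trade-off is a noisier gradient estimate, so more iterations may be needed to reach a comparable model.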

          People

            Assignee: Unassigned
            Reporter: Joel Bernstein (jbernste)
            Votes: 0
            Watchers: 2
