  1. Solr
  2. SOLR-10991

Support removing top N influential observations in the regress Stream Evaluator

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 7.0
    • Component/s: streaming expressions
    • Labels:
      None

      Description

      Influential observations are outliers that have a large effect on the slope of a regression line. It is useful to be able to remove them automatically before running a simple regression.

      Syntax:

      regress(colA, colB, 10)
      

      The function above regresses colA and colB after removing the top 10 influential observations from the data set.

      The approach taken will be to remove each observation one at a time and re-run the regression on the data set minus that observation. After each run, the difference in model fit will be recorded. Once all of the regression runs are complete, the N observations with the highest difference in fit will be removed from the data set, and the final regression will be run without them.
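
The leave-one-out approach described above can be sketched in a few lines of Python. This is only an illustration, not the Solr implementation: the function names `fit_line` and `regress_without_influential` are hypothetical, and the issue does not say how "difference in model fit" is measured, so the sketch assumes it is the absolute change in the sum of squared errors (SSE) between the full fit and the fit without the observation.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, sse). Assumes at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def regress_without_influential(
    xs: Sequence[float], ys: Sequence[float], n_remove: int
) -> Tuple[float, float, List[int]]:
    """Drop the n_remove most influential observations, then regress again.

    Influence is measured here as |SSE(full) - SSE(without i)|, which is an
    assumption made for this sketch; the issue does not name the fit statistic.
    Returns (slope, intercept, sorted indices of the removed observations).
    """
    xs, ys = list(xs), list(ys)
    _, _, full_sse = fit_line(xs, ys)

    # Re-run the regression once per observation, leaving that observation out.
    diffs = []
    for i in range(len(xs)):
        loo_xs = xs[:i] + xs[i + 1:]
        loo_ys = ys[:i] + ys[i + 1:]
        _, _, loo_sse = fit_line(loo_xs, loo_ys)
        diffs.append((abs(full_sse - loo_sse), i))

    # Keep the indices with the largest difference in fit.
    diffs.sort(key=lambda d: d[0], reverse=True)
    removed = sorted(i for _, i in diffs[:n_remove])
    removed_set = set(removed)

    kept_xs = [x for i, x in enumerate(xs) if i not in removed_set]
    kept_ys = [y for i, y in enumerate(ys) if i not in removed_set]
    slope, intercept, _ = fit_line(kept_xs, kept_ys)
    return slope, intercept, removed
```

With this measure, each call costs one full regression plus one regression per observation, so the work grows quadratically with the size of the data set. A different fit statistic could be swapped in by changing only the value appended to `diffs`.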

              People

              • Assignee:
                jbernste Joel Bernstein
                Reporter:
                jbernste Joel Bernstein