Details

Type: Bug

Status: Open

Priority: Major

Resolution: Unresolved

None
Description
This is a summary of a mail sent to the Stratos dev list under the subject "[Autoscaling] [Improvement] Introducing "curve fitting" for stat prediction algorithm of Autoscaler".
Current implementation
Currently CEP calculates the average, gradient, and second derivative and sends those values to Autoscaler. Autoscaler then predicts the values using S = u*t + 0.5*a*t*t.
In this method the CEP calculation is not very accurate, because it does not consider all the events when calculating the gradient and second derivative. Therefore the equation we apply does not yield the best prediction.
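For reference, the current prediction formula can be sketched as below. This is only an illustration, not the Autoscaler code: the function names are hypothetical, u is taken to be the gradient and a the second derivative, and adding the result to the window average is an assumption about how the prediction is anchored.

```python
def predict_change(gradient, second_derivative, t):
    """Evaluate S = u*t + 0.5*a*t*t with u = gradient and a = second derivative."""
    return gradient * t + 0.5 * second_derivative * t * t


def predict_value(average, gradient, second_derivative, t):
    """Hypothetical helper: offset the window average by the predicted change.

    Assumption: the prediction is anchored at the average reported by CEP.
    """
    return average + predict_change(gradient, second_derivative, t)
```

The gradient and second derivative are the only inputs that describe the trend, so any inaccuracy in how CEP derives them carries straight into the prediction.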
Proposed Implementation
CEP's task
I think the best approach is to do "curve fitting"[1] on the event sample received in a particular time window. Refer to the "Locally weighted linear regression" section at [2] for more details.
We would need a second degree polynomial fitter for this, for which we can use the Apache Commons Math library. Refer to the sample at [3]; it can be run with any degree, e.g. 2 or 3. A higher degree fits the received events more closely, although it can also overfit noise when the window holds few events.
E.g.
If we use a degree 2 polynomial fitter, we will get an equation like the one below, where value (v) is our statistic value and time (t) is the time of the event.
Equation we get from the received events:
v = a*t*t + b*t + c
So the solution is:
At CEP, find the memberwise curves that fit the events received in a specific window (say 10 minutes)
Send the parameters of the fitted curve (a, b, and c in the above equation), together with the timestamp of the last event in the window (T), to Autoscaler
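The CEP side of this can be sketched as follows. It is a minimal, dependency-free illustration of a degree 2 least-squares fit, not the Apache Commons Math fitter mentioned above; the function names and the (t, v) tuple layout are hypothetical. It also assumes timestamps are measured in milliseconds from a shared origin such as the window start, since raw epoch values make the sums of t^4 very large and the fit numerically fragile.

```python
def fit_quadratic(events):
    """Least-squares fit of v = a*t*t + b*t + c to a list of (t, v) events."""
    if len(events) < 3:
        raise ValueError("need at least 3 events to fit a degree 2 polynomial")
    # Power sums for the normal equations: s[k] = sum(t^k), r[k] = sum(v * t^k).
    s = [sum(t ** k for t, _ in events) for k in range(5)]
    r = [sum(v * t ** k for t, v in events) for k in range(3)]
    # Augmented matrix for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], r[2]],
        [s[3], s[2], s[1], r[1]],
        [s[2], s[1], s[0], r[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        if m[pivot][col] == 0:
            raise ValueError("events do not determine a unique curve")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def summarize_window(events):
    """Return (a, b, c, T), where T is the timestamp of the last event in the window."""
    a, b, c = fit_quadratic(events)
    last_timestamp = max(t for t, _ in events)
    return a, b, c, last_timestamp
```

A window of events sampled once per second, for example `[(0, 5.0), (1000, 8.0), (2000, 15.0), ...]`, would be passed to `summarize_window` once per member, and the returned tuple is what would be sent to Autoscaler.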
Autoscaler's task
Autoscaler uses the function v = a*t*t + b*t + c to predict the value at any timestamp after the last timestamp
E.g. say we need to find the value (v) 1 minute after the last event (assuming we carried out all the calculations in milliseconds):
v = a * (T+60000) * (T+60000) + b * (T+60000) + c
So we have memberwise predictions, and we can find the clusterwise prediction by averaging all the memberwise values.
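The Autoscaler side can be sketched as below. The function names and the dictionary of member curves are hypothetical; the sketch assumes each member reports (a, b, c, T) in the same time units (milliseconds here) that were used when fitting.

```python
def predict_member(a, b, c, last_timestamp, horizon_ms):
    """Evaluate v = a*t*t + b*t + c at t = T + horizon_ms."""
    t = last_timestamp + horizon_ms
    return a * t * t + b * t + c


def predict_cluster(member_curves, horizon_ms):
    """Average the memberwise predictions.

    member_curves maps a member id to its (a, b, c, T) tuple.
    """
    if not member_curves:
        raise ValueError("no member curves to average")
    predictions = [
        predict_member(a, b, c, last_timestamp, horizon_ms)
        for (a, b, c, last_timestamp) in member_curves.values()
    ]
    return sum(predictions) / len(predictions)
```

For the one minute example above, `predict_member(a, b, c, T, 60000)` evaluates the same expression as v = a * (T+60000) * (T+60000) + b * (T+60000) + c.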