Stratos / STRATOS-939

CEP sends very large values for gradient and second derivative of load average

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 4.1.0 M3
    • Fix Version/s: 4.1.0 RC3
    • Component/s: CEP
    • Labels:
      None

      Description

      How do we calculate the gradient between two events?

      Say the events are e1(t1, v1) and e2(t2, v2), where:

      tx - time in milliseconds when xth event occurred
      vx - value (memory, cpu etc.) that xth event carries

      time gap in milliseconds = t2 - t1
      time gap in seconds = t(2-1) = (t2 - t1) / 1000

      Hence,
      Gradient = (v2 - v1) / t(2-1) = ( (v2 - v1) * 1000 ) / (t2 - t1)
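
The formula above can be sketched in Java as follows. This is a minimal illustration, not the actual CEP extension code: the class and method names are hypothetical, and returning 0.0 for a non-positive gap simply follows the `tGap > 0` guard quoted in the comments on this issue.

```java
// Hypothetical helper for illustration; not the Stratos CEP extension itself.
final class GradientSketch {

    /** Rate of change in value units per second; timestamps are in milliseconds. */
    static double gradientPerSecond(long t1, double v1, long t2, double v2) {
        long gapMillis = t2 - t1;
        if (gapMillis <= 0) {
            return 0.0; // assumption: no gradient when the gap is zero or negative
        }
        // Multiplying by 1000 converts "per millisecond" into "per second".
        return ((v2 - v1) * 1000) / gapMillis;
    }

    public static void main(String[] args) {
        // Gap of 15030 ms: a small gradient of roughly -0.2 per second.
        System.out.println(gradientPerSecond(1415213202095L, 12.0, 1415213217125L, 9.0));
        // Gap of 5 ms: the same formula yields -1000.0 per second.
        System.out.println(gradientPerSecond(1415213232152L, 12.0, 1415213232157L, 7.0));
    }
}
```

The two calls use timestamps and (rounded) values taken from the debug logs in this description, which shows how a gap of a few milliseconds inflates the per-second gradient.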

      I've enabled debug logs for the CEP extension:
      log4j.logger.org.apache.stratos.cep.extension=DEBUG

      The following three entries are extracted from the debug logs:

      ===================================================================
      TID: [0] [STRATOS] [2014-11-05 19:47:27,073] DEBUG {org.apache.stratos.cep.extension.SecondDerivativeFinderWindowProcessor} - Gradient: -0.1996007984031936 Last val: 9.0 First val: 12.0 Time Gap: 15030 t1: 1415213202095 t2: 1415213217125 hash: 155426542

      TID: [0] [STRATOS] [2014-11-05 19:47:27,073] DEBUG {org.apache.stratos.cep.extension.SecondDerivativeFinderWindowProcessor} - Gradient: -999.9999999999998 Last val: 7.000000000000001 First val: 12.0 Time Gap: 5 t1: 1415213232152 t2: 1415213232157 hash: 155426542

      TID: [0] [STRATOS] [2014-11-05 19:47:27,074] DEBUG {org.apache.stratos.cep.extension.SecondDerivativeFinderWindowProcessor} - Gradient: -44.34884666437174 Last val: -999.9999999999998 First val: -0.1996007984031936 Time Gap: 22544 t1: 1415213209610 t2: 1415213232154 hash: 155426542
      ===================================================================

      As you can see, the large values occur when the time gap between the two events is less than 1 second (only 5 ms in the second entry). The third entry shows that large gradient then being used as an input to the next calculation. This can happen because events arrive from different asynchronous agents, and also when there are only a few events.

      FIX
      ====

      The fix I propose is a very simple one and, as far as I can see, it will not compromise anything.

      The fix is to calculate the time gap as follows:

      time gap (ms) = t2 - t1   if t2 - t1 > 1000
                      1000      if t2 - t1 <= 1000

      I have tested this and it works fine.
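
The clamping rule above could be sketched in Java as follows. This is only an illustration under assumed names (`ClampedGradientSketch`, `effectiveGapMillis`); it is not the content of the attached patch.

```java
// Hypothetical sketch of the proposed rule; names are illustrative only.
final class ClampedGradientSketch {

    static final long MILLIS_PER_SECOND = 1000;

    /** Time gap in milliseconds, never smaller than one second. */
    static long effectiveGapMillis(long t1, long t2) {
        // Same as: (t2 - t1 > 1000) ? (t2 - t1) : 1000
        return Math.max(t2 - t1, MILLIS_PER_SECOND);
    }

    /** Per-second gradient computed over the clamped time gap. */
    static double gradientPerSecond(long t1, double v1, long t2, double v2) {
        return ((v2 - v1) * MILLIS_PER_SECOND) / effectiveGapMillis(t1, t2);
    }

    public static void main(String[] args) {
        // A 5 ms gap is treated as 1000 ms, so the result is -5.0 instead of -1000.0.
        System.out.println(gradientPerSecond(1415213232152L, 12.0, 1415213232157L, 7.0));
    }
}
```

With this rule, gaps longer than one second are unaffected, while any shorter gap makes the gradient equal to the plain difference of the two values.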

      1. STRATOS-939.diff
        3 kB
        Nirmal Fernando

        Activity

        nirmal Nirmal Fernando added a comment -

        I've committed the fix in 35cd74a5c4c932da102ffb353010c2d7bdba7ea9.

        Also attached the patch.

        imesh Imesh Gunaratne added a comment -

        Hi Nirmal,

        I do not think the milliseconds-to-seconds conversion is correct here.

        As I see it, we are taking the difference of two timestamp values and then dividing it by 1000. The correct way might be to first divide each value by 1000 and then take the difference.

        On the other hand we might not need to convert these values to seconds since we are taking a time difference and calculating a gradient.

        I did a quick test with the following sample:
        Gradient: -999.9999999999998 Last val: 7.000000000000001 First val: 12.0 Time Gap: 5 t1: 1415213232152 t2: 1415213232157

        According to the previous code:
        long tGap = t2 - t1;
        double gradient = 0.0;
        if (tGap > 0) {
            gradient = ((lastVal - firstVal) * 1000) / tGap;
        }

        t1: 1415213232152 t2: 1415213232157 firstVal: 12 lastVal: 7
        gradient: -1000.0

        According to your fix:
        long millisecondsForASecond = 1000;
        long tGap = t2 - t1 > millisecondsForASecond ? t2 - t1 : millisecondsForASecond;
        double gradient = 0.0;
        if (tGap > 0) {
            gradient = ((lastVal - firstVal) * millisecondsForASecond) / tGap;
        }

        t1: 1415213232152 t2: 1415213232157 firstVal: 12 lastVal: 7
        gradient: -5.0

        According to an online gradient calculator:
        gradient: -1
        http://www.calculator.net/slope-calculator.html?type=1&x11=1415213232152&y11=12&x12=1415213232157&y12=7&x=27&y=19

        According to the online gradient calculator (assuming their calculation is correct), the calculation in your fix is not correct. I believe the logic should be as simple as follows:

        long tGap = t2 - t1;
        double gradient = 0.0;
        if (tGap > 0) {
            gradient = (lastVal - firstVal) / tGap;
        }

        t1: 1415213232152 t2: 1415213232157 firstVal: 12 lastVal: 7
        gradient: -1.0

        Thanks
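
As a side note on units, the calculation without the factor of 1000 gives a rate per millisecond, since the timestamps are in milliseconds. The sketch below (hypothetical class and method names, assuming t2 > t1) shows how that figure relates to a per-second figure for the sample used in this comment.

```java
// Hypothetical illustration of the units involved; assumes t2 > t1.
final class GradientUnitsSketch {

    /** Change in value per millisecond (timestamps in milliseconds). */
    static double perMillisecond(long t1, double v1, long t2, double v2) {
        return (v2 - v1) / (t2 - t1);
    }

    /** The same slope expressed per second. */
    static double perSecond(long t1, double v1, long t2, double v2) {
        return perMillisecond(t1, v1, t2, v2) * 1000;
    }

    public static void main(String[] args) {
        long t1 = 1415213232152L, t2 = 1415213232157L;
        System.out.println(perMillisecond(t1, 12.0, t2, 7.0)); // -1.0
        System.out.println(perSecond(t1, 12.0, t2, 7.0));      // -1000.0
    }
}
```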

        nirmal Nirmal Fernando added a comment -

        (5-1)/2 = 5/2 - 1/2 is the Math I know

        It's a time difference, so the unit still matters; it could be neglected only if this were a division of values in the same unit.
        This is not a per-second value.

        So, what I've done here is put a reasonable constraint on the events: two events should be at least 1 s apart from each other before the (per-second) gradient calculation is done.

        And this is the tangent value we're talking about, and a tangent can span from 0 to (+/-) infinity.
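
The identity quoted at the start of this comment can be checked against the sample timestamps with a small sketch. The class and method names are hypothetical. It also shows one practical caveat that is an observation of this sketch rather than something stated in the thread: dividing each timestamp with integer arithmetic before subtracting truncates to whole seconds.

```java
// Hypothetical sketch comparing the order of "subtract" and "divide by 1000".
final class TimeGapUnitsSketch {

    /** Subtract first, then convert the small difference to seconds. */
    static double gapSecondsSubtractFirst(long t1Millis, long t2Millis) {
        return (t2Millis - t1Millis) / 1000.0;
    }

    /** Convert each timestamp to seconds first; equal up to floating-point rounding. */
    static double gapSecondsDivideFirst(long t1Millis, long t2Millis) {
        return t2Millis / 1000.0 - t1Millis / 1000.0;
    }

    /** Integer division of each timestamp drops the millisecond part. */
    static long gapSecondsIntegerDivideFirst(long t1Millis, long t2Millis) {
        return t2Millis / 1000 - t1Millis / 1000;
    }

    public static void main(String[] args) {
        long t1 = 1415213232152L, t2 = 1415213232157L;
        System.out.println(gapSecondsSubtractFirst(t1, t2));      // 0.005
        System.out.println(gapSecondsDivideFirst(t1, t2));        // close to 0.005
        System.out.println(gapSecondsIntegerDivideFirst(t1, t2)); // 0
    }
}
```

Subtracting the `long` timestamps first keeps the 5 ms difference exact before the conversion, which is why that order is the safer one in floating-point code.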

        imesh Imesh Gunaratne added a comment -

        Hi Nirmal,
        According to the discussion we had on the dev list, the solution you have provided here does not seem to be valid.

        As I understand it, the root cause of this problem is as follows:

        • Currently the gradient calculation is done using statistics sent by different members of a cluster. IMO this is not correct. Mixing statistics values sent by different sources might not be correct.
        • Imagine there are three instances (m1, m2, m3) in a cluster, which continuously report statistics values of 10, 50 and 80 respectively over time periods t1 to tn:
          t1(10, 50, 80), t2(10, 50, 80), t3(10, 50, 80), ..., tn(10, 50, 80)
        • Now if we calculate the gradient at a point in time using the values sent by m1 and m3 (10 and 80), we will find a high gradient value. However, in this situation the gradient is zero at each member.
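
The scenario in the bullets above can be sketched as follows. This is a hypothetical illustration, not Stratos code: the `StatEvent` type, its fields and the 10 ms spacing between members are assumptions made for the example, and events are assumed to arrive in timestamp order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: gradients computed per member instead of across members.
final class PerMemberGradientSketch {

    /** Assumed event shape; the real event carries different fields. */
    static final class StatEvent {
        final String memberId;
        final long timestampMillis;
        final double value;

        StatEvent(String memberId, long timestampMillis, double value) {
            this.memberId = memberId;
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    /** Per-second gradient between two events; 0.0 when the gap is not positive. */
    static double gradientPerSecond(StatEvent first, StatEvent last) {
        long gapMillis = last.timestampMillis - first.timestampMillis;
        return gapMillis > 0 ? ((last.value - first.value) * 1000) / gapMillis : 0.0;
    }

    /** Gradient between the first and last event of each member, keyed by member id. */
    static Map<String, Double> gradientsByMember(List<StatEvent> events) {
        Map<String, List<StatEvent>> grouped = new LinkedHashMap<>();
        for (StatEvent e : events) {
            grouped.computeIfAbsent(e.memberId, k -> new ArrayList<>()).add(e);
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<StatEvent>> entry : grouped.entrySet()) {
            List<StatEvent> list = entry.getValue();
            result.put(entry.getKey(), gradientPerSecond(list.get(0), list.get(list.size() - 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<StatEvent> events = new ArrayList<>();
        events.add(new StatEvent("m1", 0L, 10.0));
        events.add(new StatEvent("m3", 10L, 80.0));
        events.add(new StatEvent("m1", 15000L, 10.0));
        events.add(new StatEvent("m3", 15010L, 80.0));

        // Mixing m1 and m3 events that are 10 ms apart gives 7000.0 per second.
        System.out.println(gradientPerSecond(events.get(0), events.get(1)));
        // Grouped by member, every gradient is 0.0.
        System.out.println(gradientsByMember(events));
    }
}
```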

        Thanks
        Imesh


          People

          • Assignee:
            nirmal Nirmal Fernando
            Reporter:
            nirmal Nirmal Fernando