Apache MADlib / MADLIB-896

PivotalR test failures indicate potential bugs in MADlib GLM


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      These failures may simply be numerical issues caused by overly large condition numbers or by too small a training set. To be investigated.

      > PivotalR:::test(filter="glm")
      
      Running tests -------------------------
      Test cases for madlib.glm and its helper functions : 
      .port? 5431
      .dbname? madlib-pg93
      ....................
      WARNING:  GLM warning: the computation did not converge in 20 iterations!
      CONTEXT:  PL/Python function "glm"
      1.2.................
      WARNING:  GLM warning: the computation did not converge in 20 iterations!
      CONTEXT:  PL/Python function "glm"
      ....
      WARNING:  Hessian or gradient is not finite.
      CONTEXT:  SQL statement "
                  SELECT
                      __madlib_temp_75741577_1437438071_4375895__ AS __madlib_temp_75741577_1437438071_4375895__,
                      sex ,
                      4 AS __madlib_temp_48749745_1437438071_22969480__,
                      (
                      madlib.__glm_binomial_probit_agg(
                          ((("rings") < (10))::integer)::double precision,
                          (array[1,"length","diameter","height","whole","shucked","viscera","shell"])::double precision[],
                          __madlib_temp_43345218_1437438071_11277539__.__madlib_temp_69537766_1437438071_32811656__)
                      ) AS __madlib_temp_69537766_1437438071_32811656__
                  FROM
                  (
                      SELECT
                          *,
                          array_to_string(ARRAY[sex::text],
                                          ','
                                         ) AS __madlib_temp_75741577_1437438071_4375895__
                      FROM "pg_temp_3"."madlib_temp_d763e98a_0753_969a95_03cedf5694ab"
                  ) AS _src
                  JOIN
                  (
                      SELECT
                          unnest($1) AS __madlib_temp_75741577_1437438071_4375895__,
                          unnest($2) AS __madlib_temp_69537766_1437438071_32811656__
                  ) AS __madlib_temp_43345218_1437438071_11277539__
                  USING (__madlib_temp_75741577_1437438071_4375895__)
                  GROUP BY sex, __madlib_temp_75741577_1437438071_4375895__
                  "
      PL/Python function "glm"
      WARNING:  Hessian or gradient is not finite.
      CONTEXT:  SQL statement "
                  SELECT
                      __madlib_temp_75741577_1437438071_4375895__ AS __madlib_temp_75741577_1437438071_4375895__,
                      sex ,
                      5 AS __madlib_temp_48749745_1437438071_22969480__,
                      (
                      madlib.__glm_binomial_probit_agg(
                          ((("rings") < (10))::integer)::double precision,
                          (array[1,"length","diameter","height","whole","shucked","viscera","shell"])::double precision[],
                          __madlib_temp_43345218_1437438071_11277539__.__madlib_temp_69537766_1437438071_32811656__)
                      ) AS __madlib_temp_69537766_1437438071_32811656__
                  FROM
                  (
                      SELECT
                          *,
                          array_to_string(ARRAY[sex::text],
                                          ','
                                         ) AS __madlib_temp_75741577_1437438071_4375895__
                      FROM "pg_temp_3"."madlib_temp_d763e98a_0753_969a95_03cedf5694ab"
                  ) AS _src
                  JOIN
                  (
                      SELECT
                          unnest($1) AS __madlib_temp_75741577_1437438071_4375895__,
                          unnest($2) AS __madlib_temp_69537766_1437438071_32811656__
                  ) AS __madlib_temp_43345218_1437438071_11277539__
                  USING (__madlib_temp_75741577_1437438071_4375895__)
                  GROUP BY sex, __madlib_temp_75741577_1437438071_4375895__
                  "
      PL/Python function "glm"
      34..............5..........................
      
      1. Failure (at test-madlib_glm.r#78): Test gaussian(inverse) ------------------------------------------
      fit.db$coef not equal to fit.r$coefficients[, 1]
      8/8 mismatches (average diff: 0.00719).
      First 8:
       pos       x       y     diff
         1  0.1970  0.1990 -0.00196
         2 -0.0243 -0.0254  0.00112
         3 -0.1709 -0.1630 -0.00793
         4 -0.2059 -0.2462  0.04027
         5 -0.0476 -0.0465 -0.00112
         6  0.1413  0.1397  0.00156
         7  0.0564  0.0577 -0.00130
         8 -0.0146 -0.0123 -0.00222
      
      2. Failure (at test-madlib_glm.r#86): Test gaussian(inverse) with categorical features ----------------
      fit.db$coef not equal to fit.r$coefficients[, 1]
      10/10 mismatches (average diff: 0.00517).
      First 10:
       pos        x        y      diff
         1  0.18215  0.18410 -1.94e-03
         2  0.01223  0.01214  8.72e-05
         3 -0.00158 -0.00153 -4.83e-05
         4 -0.02981 -0.03107  1.26e-03
         5 -0.13631 -0.12955 -6.76e-03
         6 -0.19904 -0.23515  3.61e-02
         7 -0.04775 -0.04668 -1.07e-03
         8  0.14030  0.13905  1.26e-03
         9  0.06185  0.06311 -1.26e-03
        10 -0.01741 -0.01550 -1.91e-03
      
      3. Failure (at test-madlib_glm.r#154): Test binomial(probit) with grouping ----------------------------
      fit.db[[1]]$coef not equal to fit.r[[1]]$coefficients[, 1]
      8/8 mismatches (average diff: 3.43).
      First 8:
       pos      x      y   diff
         1   2.79   1.73  1.063
         2   5.41   5.73 -0.317
         3  -3.23  -1.48 -1.742
         4 -12.52  -9.37 -3.157
         5 -16.51 -11.62 -4.893
         6  21.90  16.00  5.899
         7  13.38   7.96  5.423
         8   2.33  -2.62  4.957
      
      4. Failure (at test-madlib_glm.r#155): Test binomial(probit) with grouping ----------------------------
      fit.db[[1]]$std_err not equal to fit.r[[1]]$coefficients[, 2]
      8/8 mismatches (average diff: Inf).
      First 8:
       pos     x   y diff
         1 0.582 Inf -Inf
         2 2.559 Inf -Inf
         3 3.334 Inf -Inf
         4 4.176 Inf -Inf
         5 2.934 Inf -Inf
         6 3.257 Inf -Inf
         7 3.928 Inf -Inf
         8 3.629 Inf -Inf
      
      5. Failure (at test-madlib_glm.r#214): Test poisson(identity) with grouping ---------------------------
      fit.db[[1]]$coef not equal to fit.r[[1]]$coefficients[, 1]
      8/8 mismatches (average diff: 0.13).
      First 8:
       pos     x     y     diff
         1  2.74  2.75 -0.00483
         2 -1.76 -1.78  0.02177
         3  5.83  5.81  0.02412
         4 27.36 27.45 -0.08863
         5  2.67  2.44  0.22605
         6 -7.71 -7.38 -0.32432
         7 -5.89 -5.72 -0.16966
         8 14.88 15.06 -0.17732
      Error: Test failures
      In addition: Warning messages:
      1: glm.fit: algorithm did not converge 
      2: glm.fit: algorithm did not converge 
      3: glm.fit: algorithm did not converge 
      4: glm.fit: fitted probabilities numerically 0 or 1 occurred 
      5: glm.fit: algorithm did not converge 
      6: glm.fit: fitted probabilities numerically 0 or 1 occurred 
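
The "Hessian or gradient is not finite" warnings and R's "fitted probabilities numerically 0 or 1" warnings are consistent with the numerical-issue hypothesis. The following is a minimal sketch, assuming the textbook IRLS working weight for a probit link, w = phi(eta)^2 / (p (1 - p)); it is not MADlib's implementation, and the function name `probit_irls_weight` is hypothetical. It shows how a large linear predictor makes p round to exactly 0 or 1 in double precision, so the weight stops being finite.

```python
import math

def probit_irls_weight(eta):
    """Textbook IRLS working weight for a binomial GLM with probit link.

    w = phi(eta)^2 / (p * (1 - p)), with p = Phi(eta).
    Division follows IEEE 754 conventions (x/0 -> inf, 0/0 -> nan) so the
    breakdown is visible instead of raising ZeroDivisionError.
    Assumption: illustrative only, not taken from MADlib's source.
    """
    p = 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))             # normal CDF
    phi = math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)  # normal PDF
    num = phi * phi
    den = p * (1.0 - p)
    if den == 0.0:
        # p has saturated to exactly 0.0 or 1.0 in floating point
        return math.nan if num == 0.0 else math.inf
    return num / den

if __name__ == "__main__":
    # Moderate predictors give finite weights; extreme ones do not.
    for eta in (0.0, 1.0, 5.0, 10.0, 40.0):
        print(eta, probit_irls_weight(eta))
```

If the per-group fits in the failing probit test reach predictors of this size, for example because a small group is nearly separable, both MADlib and R would be working with a degenerate Hessian, and their coefficients and standard errors need not agree.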
      

            People

              riyer Rahul Iyer
              haying Xixuan (Aaron) Feng