  1. Apache MADlib
  2. MADLIB-1460

Prevent an "integer out of range" exception in linear regression train

Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • None
    • v1.18.0
    • None

    Description

      Linear regression training produces two output tables (neither is optional):

      • The primary output table, which contains the computed coefficients.
      • A summary output table, which contains a single row.

      Scenario

      Running linear regression training in PostgreSQL on an input table with more than 2^31 - 1 records fails with an "integer out of range" exception, even if a grouping column is specified.

      Source

      The summary table has a column that stores the total number of records involved in the computation. The column's data type is a signed 32-bit integer, but the total number of records is computed as a BIGINT. Therefore, when the total number of records in the input table exceeds the maximum value of a signed 32-bit integer (2^31 - 1), an "integer out of range" exception is thrown.
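
      The range mismatch described above can be sketched in a few lines of Python. This is only an illustration, not MADlib code: the constants mirror the ranges PostgreSQL documents for its 4-byte INTEGER and 8-byte BIGINT types, and the helper name smallest_fitting_type is hypothetical.

```python
# Illustrative only: upper bounds of PostgreSQL's INTEGER (4-byte) and
# BIGINT (8-byte) types. The helper below is a hypothetical name.
INT4_MAX = 2**31 - 1   # 2147483647
INT8_MAX = 2**63 - 1   # 9223372036854775807


def smallest_fitting_type(row_count):
    """Return the narrowest of INTEGER/BIGINT that can hold a row count."""
    if row_count < 0:
        raise ValueError("row count cannot be negative")
    if row_count <= INT4_MAX:
        return "INTEGER"
    if row_count <= INT8_MAX:
        return "BIGINT"
    raise OverflowError("row count exceeds the BIGINT range")


print(smallest_fitting_type(2**31 - 1))  # INTEGER
print(smallest_fitting_type(2**31))      # BIGINT
```

      A count of exactly 2^31 is therefore already one past what the narrower type can hold.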

      Solution

      A simple solution is to change the data type of the column from a signed integer to a BIGINT.
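
      As a rough analogy for why widening the column helps, the sketch below uses Python's struct module to emulate storing a count in a 4-byte versus an 8-byte signed field. It does not touch PostgreSQL or MADlib; the function name store_count and the type labels are assumptions made for illustration.

```python
import struct


def store_count(row_count, column_type):
    """Pack row_count into the byte width of a (hypothetical) column type."""
    # "<i" is a 4-byte signed integer, "<q" an 8-byte signed integer.
    fmt = {"INTEGER": "<i", "BIGINT": "<q"}[column_type]
    return struct.pack(fmt, row_count)


count = 2**31 + 5  # a record count just past the 4-byte signed range

try:
    store_count(count, "INTEGER")
    narrow_ok = True
except struct.error:
    # The value does not fit in 4 bytes, analogous to "integer out of range".
    narrow_ok = False

wide = store_count(count, "BIGINT")

print(narrow_ok)                      # False
print(struct.unpack("<q", wide)[0])   # 2147483653
```

      The same value that is rejected by the 4-byte field round-trips unchanged through the 8-byte one.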

      Test

      We ran the linear regression training function with and without the suggested modification on an input table containing between 2^31 and 2^32 records. Without the modification, an "integer out of range" exception was thrown. With the modification, training completed successfully.

          People

            Assignee: Unassigned
            Reporter: Daniel Daniel (dadanielniel)