Apache MADlib / MADLIB-1300

Clarify dep and indep var column names in output table for deep learning minibatch preprocessor


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.16
    • Component/s: Module: Utilities
    • Labels: None

    Description

      Follow-on to this commit:
      Minibatch Preprocessor for Deep learning
      https://github.com/apache/madlib/commit/8de32ede33c48d2f4a440f0f639c94a277a359c1

      The output table produced by the deep learning mini-batch preprocessor contains the following columns:

      ...
      dependent_varname	FLOAT8[]. Packed array of dependent variables. If the dependent variable in the source table is categorical, the preprocessor will one-hot encode it.
      independent_varname	FLOAT8[]. Packed array of independent variables.
      ...
      

      This is misleading because these columns contain values, not names, so we should rename them to:

      ...
      dependent_var
      independent_var
      ...
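To make the values-versus-names distinction concrete, here is a minimal Python sketch of what one packed output row holds under the proposed names. It is not MADlib's implementation: the helpers `pack_minibatches` and `one_hot`, the source columns `species` and `attributes`, the sorted class order used for one-hot encoding, and the `buffer_size` argument are all assumptions made for illustration. The `*_varname` arguments are source column names; the `dependent_var` and `independent_var` keys hold the packed FLOAT8-style arrays.

```python
from typing import Any, Dict, List, Sequence


def one_hot(label: str, class_values: Sequence[str]) -> List[float]:
    """Encode a categorical label as a list of 0.0/1.0 (FLOAT8-style)."""
    return [1.0 if label == c else 0.0 for c in class_values]


def pack_minibatches(
    rows: Sequence[Dict[str, Any]],
    dependent_varname: str,     # NAME of the label column in the source table
    independent_varname: str,   # NAME of the feature-array column
    buffer_size: int,           # hypothetical: max source rows per packed row
) -> List[Dict[str, List[List[float]]]]:
    """Hypothetical sketch: pack source rows into buffers of packed VALUES."""
    # Assumption: classes are one-hot encoded in sorted order.
    class_values = sorted({str(r[dependent_varname]) for r in rows})
    buffers = []
    for start in range(0, len(rows), buffer_size):
        chunk = rows[start:start + buffer_size]
        buffers.append({
            # Output keys use the proposed names: they hold data, not names.
            "dependent_var": [one_hot(str(r[dependent_varname]), class_values)
                              for r in chunk],
            "independent_var": [[float(v) for v in r[independent_varname]]
                                for r in chunk],
        })
    return buffers


# Illustrative source rows (made up for this sketch).
source = [
    {"species": "setosa", "attributes": [5.1, 3.5]},
    {"species": "virginica", "attributes": [6.3, 3.3]},
    {"species": "setosa", "attributes": [4.9, 3.0]},
]
packed = pack_minibatches(source, "species", "attributes", buffer_size=2)
print(packed[0]["dependent_var"])    # [[1.0, 0.0], [0.0, 1.0]]
print(packed[1]["independent_var"])  # [[4.9, 3.0]]
```

Keeping the `_varname` suffix on the arguments (which take column names) and dropping it from the output keys (which hold arrays) is the split this rename aims for.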
      

      The output summary table contains the following columns:

      dependent_varname	Dependent variable from the source table.
      independent_varname	Independent variable from the source table.
      

      This is OK since the columns actually do contain names.
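The argument above reduces to a simple naming rule: a column whose name ends in `varname` should hold a column name (text), not data. Below is a hypothetical Python check for that rule, applied to illustrative rows from the output table before and after the rename and from the summary table. The rows and the helper `misleading_columns` are made up for this sketch and are not taken from MADlib.

```python
from typing import Any, Dict, List


def misleading_columns(row: Dict[str, Any]) -> List[str]:
    """Return columns whose name ends in 'varname' but whose value is not text.

    Hypothetical rule from this issue: a *_varname column should store a
    column name, so a non-string value there suggests a misleading name.
    """
    return [col for col, val in row.items()
            if col.endswith("varname") and not isinstance(val, str)]


# Output-table row under the column names listed above: values under *_varname.
old_output = {"dependent_varname": [[1.0, 0.0]], "independent_varname": [[5.1, 3.5]]}
# Output-table row under the proposed names.
new_output = {"dependent_var": [[1.0, 0.0]], "independent_var": [[5.1, 3.5]]}
# Summary-table row: the *_varname columns hold source column names.
summary = {"dependent_varname": "species", "independent_varname": "attributes"}

print(misleading_columns(old_output))  # ['dependent_varname', 'independent_varname']
print(misleading_columns(new_output))  # []
print(misleading_columns(summary))     # []
```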

      There is a related 2.0 story for the regular mini-batch preprocessor
      http://madlib.apache.org/docs/latest/group__grp__minibatch__preprocessing.html
      in JIRA https://issues.apache.org/jira/browse/MADLIB-1294, which we don't want to do in 1.16 because renaming that module's existing output columns would be a breaking change and so would violate semantic versioning.


          People

            Himanshu Pandey (hpandey)
            Frank McQuillan (fmcquillan)