[MADLIB-1300] Clarify dep and indep var column names in output table for deep learning minibatch preprocessor - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: v1.16
Component/s: Module: Utilities
Labels: None

Description

Follow-on to this commit:
Minibatch Preprocessor for Deep learning
https://github.com/apache/madlib/commit/8de32ede33c48d2f4a440f0f639c94a277a359c1

The output table produced by the deep learning mini-batch preprocessor contains the following columns:

...
dependent_varname	FLOAT8[]. Packed array of dependent variables. If the dependent variable in the source table is categorical, the preprocessor will one-hot encode it.
independent_varname	FLOAT8[]. Packed array of independent variables.
...

This is misleading because these columns contain values, not names, so we should rename them to:

...
dependent_var
independent_var
...

The output summary table contains the following columns:

dependent_varname	Dependent variable from the source table.
independent_varname	Independent variable from the source table.

This is OK since the columns actually do contain names.
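
To make the distinction concrete, here is a minimal sketch using plain Python stand-ins for one row of each table. It is not MADlib code: the source column names "species" and "features" and the helper one_hot are hypothetical, chosen only to show that the output table holds packed values while the summary table holds names.

```python
# Illustrative sketch only: plain Python stand-ins for the two tables.
# The source column names "species" and "features" are hypothetical.

def one_hot(label, classes):
    """Return a one-hot float vector for `label` over the ordered `classes`."""
    return [1.0 if label == c else 0.0 for c in classes]

source_rows = [
    {"species": "setosa", "features": [5.1, 3.5]},
    {"species": "virginica", "features": [6.3, 3.3]},
]
classes = sorted({row["species"] for row in source_rows})

# Output table row: the columns hold packed VALUES, hence the proposed
# names dependent_var / independent_var.
output_row = {
    "dependent_var": [one_hot(r["species"], classes) for r in source_rows],
    "independent_var": [r["features"] for r in source_rows],
}

# Summary table row: the columns hold the source column NAMES, so
# dependent_varname / independent_varname remain accurate.
summary_row = {
    "dependent_varname": "species",
    "independent_varname": "features",
}
```

With this layout, a reader of the output table sees arrays under dependent_var and independent_var, and looks up which source columns they came from in the summary table.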

There is a related 2.0 story for the regular mini-batch preprocessor
http://madlib.apache.org/docs/latest/group__grp__minibatch__preprocessing.html
tracked in JIRA at https://issues.apache.org/jira/browse/MADLIB-1294. We don't want to make that change in 1.16 because it would break semantic versioning.

People

Assignee: Himanshu Pandey

Reporter: Frank McQuillan

Votes: 0

Watchers: 1

Dates

Created: 07/Feb/19 19:37

Updated: 13/Feb/19 22:11

Resolved: 13/Feb/19 22:11