Apache MADlib / MADLIB-1303

Add 1-hot encoding to dependent variable in mini-batch preprocessor for images


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.16
    • Component/s: Module: Utilities
    • Labels: None

    Description

      Story

      As a data scientist, I want the mini-batch preprocessor to 1-hot encode the dependent variable so that I don't need to do it myself. This applies to all types: boolean; character types such as text, char and varchar; integers; and floats.

      If the dependent variable is already an array, then we assume it is already 1-hot encoded and we just cast it to int[] and pass it along.

      We can remove the param `dependent_offset (optional)` from the current interface since 1-hot encoding is the more general solution.
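      The behavior described in this story can be sketched in plain Python. This is only an illustration, not the MADlib implementation: the function name `one_hot_encode_dependent` is hypothetical, and ordering the classes by sorted distinct value is an assumption that the issue does not specify.

```python
from typing import Any, List, Sequence


def one_hot_encode_dependent(values: Sequence[Any]) -> List[List[int]]:
    """Hypothetical sketch of the proposed dependent-variable handling.

    Scalar values (boolean, text, integer, float) are 1-hot encoded.
    Array values are assumed to be 1-hot encoded already and are only
    cast to integers and passed along.
    """
    if not values:
        return []

    # Array input: assume it is already 1-hot encoded, cast to int.
    if isinstance(values[0], (list, tuple)):
        return [[int(v) for v in row] for row in values]

    # Scalar input: one class per distinct value.
    # Assumption: classes are ordered by sorted value.
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}

    encoded = []
    for v in values:
        row = [0] * len(classes)
        row[index[v]] = 1
        encoded.append(row)
    return encoded


print(one_hot_encode_dependent(["cat", "dog", "cat"]))   # [[1, 0], [0, 1], [1, 0]]
print(one_hot_encode_dependent([0.5, 1.5, 0.5]))         # [[1, 0], [0, 1], [1, 0]]
print(one_hot_encode_dependent([[0.0, 1.0], [1.0, 0.0]]))  # [[0, 1], [1, 0]]
```

      Because every scalar type goes through the same path, no separate offset or integer-only parameter is needed in this sketch.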

      Open questions

      1) Q: Can we use exactly the same 1-hot encoding as in
      http://madlib.apache.org/docs/latest/group__grp__minibatch__preprocessing.html
      i.e., add the param `one_hot_encode_int_dep_var (optional)`? Then we could reuse code that is already written and tested.

      A: We can reuse the code to the extent possible, but we do not need this param.

      2) Q: If the dependent variable is already 1-hot encoded, we need to support array input for the dependent variable. Should we just pass it through, or check that the array contains only 1's and 0's?

      A: We will check the first row, but this does not guarantee that all rows are correct.
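      A minimal sketch of such a first-row check, assuming the check is "the first row is a non-empty array containing only 0's and 1's". The helper name `first_row_looks_one_hot` is hypothetical and not part of MADlib.

```python
from typing import Any, Sequence


def first_row_looks_one_hot(rows: Sequence[Any]) -> bool:
    """Return True if the first row is a non-empty array of only 0's and 1's.

    Only the first row is inspected, so later rows may still be invalid.
    """
    if not rows:
        return False
    first = rows[0]
    if not isinstance(first, (list, tuple)) or len(first) == 0:
        return False
    return all(v in (0, 1) for v in first)


print(first_row_looks_one_hot([[0, 1, 0], [1, 0, 0]]))  # True
print(first_row_looks_one_hot([[0, 2, 0]]))             # False
# The second row is invalid, but it is never inspected:
print(first_row_looks_one_hot([[0, 1], [2, 0]]))        # True
```

      The last call shows the limitation stated in the answer: a bad value in a later row goes undetected.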

      3) Q: How should we handle floats? If the user wants to encode float values for some reason, they could cast them to text first. Or do we just pass them along?

      A: A scalar float is 1-hot encoded (this could be a valid case). A float[] is cast to int[].

          People

            Assignee: Unassigned
            Reporter: Frank McQuillan (fmcquillan)