Apache MADlib / MADLIB-1245

Randomize data after standardization

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Module: Utilities
    • Labels: None

    Description

      The functions `utils_ind_var_scales` and `utils_ind_var_scales_grouping` in `convex.utils_regularization` standardize the input data before it is fed to the underlying gradient descent solver. Gradient descent usually works better when the rows are presented in random order.
      The current functions create a temp table containing the standardized input data, but its rows are not randomly distributed. Can we distribute them randomly? Because this change might affect multiple modules, every affected module must be tested thoroughly to confirm that the change is acceptable.
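      A minimal Python sketch of the proposed behavior, assuming the data fits in memory as a list of rows: scale each column to zero mean and unit standard deviation, then shuffle whole rows with a seeded RNG. `standardize_and_shuffle` is a hypothetical helper for illustration only; it is not MADlib's `utils_ind_var_scales`, which works on database tables. The use of the population standard deviation and the handling of zero-variance columns are assumptions.

```python
import random
import statistics
from typing import List, Sequence


def standardize_and_shuffle(rows: Sequence[Sequence[float]], seed: int = 0) -> List[List[float]]:
    """Standardize each column, then randomize the row order.

    Hypothetical helper for illustration; not part of MADlib.
    """
    if not rows:
        return []
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Assumption: population standard deviation (pstdev), not sample stdev.
    stds = [statistics.pstdev(col) for col in columns]
    scaled = [
        # Assumption: a zero-variance column is only centered, not divided.
        [(x - m) / s if s > 0 else x - m for x, m, s in zip(row, means, stds)]
        for row in rows
    ]
    # Shuffle whole rows so each row's features stay together;
    # a fixed seed makes the resulting order reproducible.
    rng = random.Random(seed)
    rng.shuffle(scaled)
    return scaled
```

      Only the row order changes; the standardized values are the same as without shuffling. A fixed seed keeps the order reproducible, which may help when testing the affected modules.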

People

    • Assignee: Unassigned
    • Reporter: Nandish Jayaram (njayaram)
    • Votes: 0
    • Watchers: 1

            Dates

              Created:
              Updated: