Hivemall / HIVEMALL-284

Support class weighting in GeneralLearnerBase


Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None

    Description

      https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit
      https://scikit-learn.org/dev/glossary.html#term-class-weight

      Introduce either a "-class_weight=[0.1,0.2]" option or a pair of "-pos_weight=0.2 -neg_weight=0.1" options.
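      Here is a hypothetical sketch of how the two proposed option forms could be normalized to the same pair of weights. It assumes the list in -class_weight is ordered [negative, positive], which is what the example values above suggest. The function name parse_class_weight_options is illustrative only and is not part of Hivemall.

```python
import re


def parse_class_weight_options(opts: str) -> tuple:
    """Return (neg_weight, pos_weight) from a hypothetical option string.

    Accepts either "-class_weight=[w0,w1]" (ASSUMED order: index 0 is the
    negative class, index 1 the positive class) or
    "-pos_weight=P -neg_weight=N". Missing weights default to 1.0,
    i.e. no re-weighting.
    """
    m = re.search(r"-class_weight=\[([^\]]*)\]", opts)
    if m:
        parts = [float(x) for x in m.group(1).split(",")]
        if len(parts) != 2:
            raise ValueError("binary classification expects exactly two weights")
        return parts[0], parts[1]
    neg = re.search(r"-neg_weight=([0-9.eE+-]+)", opts)
    pos = re.search(r"-pos_weight=([0-9.eE+-]+)", opts)
    return (float(neg.group(1)) if neg else 1.0,
            float(pos.group(1)) if pos else 1.0)


print(parse_class_weight_options("-class_weight=[0.1,0.2]"))          # (0.1, 0.2)
print(parse_class_weight_options("-pos_weight=0.2 -neg_weight=0.1"))  # (0.1, 0.2)
```

      Under this assumption, both example option strings in this issue resolve to the same weights.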

      https://github.com/scikit-learn/scikit-learn/blob/0a7adef0058ef28c7a146734f38161f7c7c581af/sklearn/linear_model/_sgd_fast.pyx#L719
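      The scikit-learn SGD loop linked above multiplies each sample's update by the weight of that sample's class (together with its sample weight). The following is a minimal Python sketch of the same idea for logistic loss. It assumes dense features and labels in {0, 1}. sgd_step is a hypothetical helper, not code from GeneralLearnerBase.

```python
import math


def sgd_step(w, x, y, lr, pos_weight=1.0, neg_weight=1.0):
    """One SGD step for logistic loss with per-class weighting.

    w, x: dense lists of floats of equal length; y: label in {0, 1}.
    The gradient of the loss is scaled by the weight of the sample's class.
    """
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))            # predicted P(y = 1 | x)
    class_weight = pos_weight if y == 1 else neg_weight
    g = class_weight * (p - y)                # weighted d(loss)/dz
    return [wi - lr * g * xi for wi, xi in zip(w, x)]


print(sgd_step([0.0, 0.0], [1.0, 1.0], 1, 0.1))                  # [0.05, 0.05]
print(sgd_step([0.0, 0.0], [1.0, 1.0], 1, 0.1, pos_weight=2.0))  # [0.1, 0.1]
```

      With pos_weight=2.0, a misclassified positive sample moves the weights twice as far as it would unweighted.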

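      In scikit-learn, the "balanced" class_weight mode computes the weight of class y as follows:
      > class_weight_y = #samples / (#classes * count_of(y))
      where count_of(y) is the number of training samples labeled y.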
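      The same formula in Python, for any number of classes, is shown below. It mirrors scikit-learn's "balanced" mode but is a standalone sketch. balanced_class_weights is a hypothetical name.

```python
from collections import Counter


def balanced_class_weights(labels):
    """class_weight_y = #samples / (#classes * count_of(y))."""
    counts = Counter(labels)          # count_of(y) for every observed label
    n_samples = len(labels)
    n_classes = len(counts)
    return {y: n_samples / (n_classes * c) for y, c in counts.items()}


print(balanced_class_weights([0, 0, 0, 1]))  # {0: 0.6666666666666666, 1: 2.0}
```

      With these weights, every class contributes the same total weight (#samples / #classes), so minority classes are not drowned out.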

      In SQL, the same weights can be computed as follows:

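      -- For binary classification (#classes = 2):
      --   weight of class y = #samples / (2 * #samples labeled y)
      -- -pos_weight / -neg_weight are the options proposed in this issue.
      WITH weights as (
       select
        count(1) / (2 * sum(if(label=0, 1, 0))) as neg_weight,
        count(1) / (2 * sum(if(label=1, 1, 0))) as pos_weight
       from
        train
      )
      select
        train_classifier(
          l.features, l.label,
          concat('-pos_weight=', cast(r.pos_weight as string), ' -neg_weight=', cast(r.neg_weight as string))
        )
      from
        train l
        cross join weights r;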

       

       


    People

      Assignee: Unassigned
      Reporter: Makoto Yui (myui)
      Votes: 0
      Watchers: 1
