Ignite / IGNITE-12685

[ML] [Umbrella] Unify Preprocessors and Pipeline approaches to collect common statistics

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ml
    • Labels: None
    • Docs Required, Release Notes Required

    Description

      In the current implementation, Cross-Validation behaves differently depending on whether it runs on the experimental Pipeline or on a chain of Preprocessors.

       

      See the tutorial examples: step 8 CV_Param_Grid and 8_CV_Param_Grid_and_pipeline.

      In the first example, all preprocessors are fitted on the whole dataset. They do not apply the train/test filter (because of the limited API in preprocessors), so their statistics are collected over the entire initial dataset.

       

      In the second example, the preprocessors are honestly re-fitted on each cross-validation fold: three times, with three different sets of statistics. As a result, each fold can get different encoding values, different Max/Min values for each column, and so on.
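
      The difference between the two behaviors can be illustrated with a minimal, self-contained sketch. It deliberately does not use the Ignite ML API: the class FoldStatsSketch, its helper methods, the sample column, and the round-robin fold assignment are all hypothetical and exist only for illustration. It computes Min/Max statistics once over the whole dataset (as the chain of Preprocessors is described to do) and once per fold on the training rows only (as the Pipeline is described to do).

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical illustration only; not part of Ignite ML. */
public class FoldStatsSketch {
    /** Returns {min, max} of the column values at the given row indices. */
    static double[] minMax(double[] column, int[] rows) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int r : rows) {
            min = Math.min(min, column[r]);
            max = Math.max(max, column[r]);
        }
        return new double[] {min, max};
    }

    /** All row indices 0..n-1 (no train/test filter applied). */
    static int[] allRows(int n) {
        return IntStream.range(0, n).toArray();
    }

    /**
     * Training rows for one fold. Assumption: row i belongs to test fold (i % folds),
     * so the training part is every row outside that fold.
     */
    static int[] trainRows(int n, int folds, int fold) {
        return IntStream.range(0, n).filter(i -> i % folds != fold).toArray();
    }

    public static void main(String[] args) {
        double[] column = {1.0, 5.0, 3.0, 10.0, 2.0, 4.0}; // made-up sample data
        int folds = 3;

        // Behavior described for the chain of Preprocessors:
        // one set of statistics collected over the whole dataset.
        double[] whole = minMax(column, allRows(column.length));
        System.out.println("whole dataset: " + Arrays.toString(whole));

        // Behavior described for the Pipeline:
        // statistics re-fitted on the training rows of each fold.
        for (int f = 0; f < folds; f++) {
            double[] perFold = minMax(column, trainRows(column.length, folds, f));
            System.out.println("fold " + f + ": " + Arrays.toString(perFold));
        }
    }
}
```

      With this sample data, the fold whose test part contains both the smallest and the largest value yields a narrower Min/Max range than the whole-dataset statistics, so a scaler fitted per fold would map the same raw value to a different scaled value than one fitted once on all rows.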

       

      We should investigate this question and make the behavior consistent with the most popular approaches.

       

          People

            Assignee: zaleslaw Alexey Zinoviev
            Reporter: zaleslaw Alexey Zinoviev