Ignite / IGNITE-12257

[ML] Add Feature Filter for ML Partitioned Dataset

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 2.9
    • None
    • None
    • None
    • Docs Required, Release Notes Required

    Description

      The build method below ignores any feature selection made at earlier stages, and it gives us no way to do feature engineering during preprocessing in the style of simple SQL: filter features, exclude features, produce new features, and so on.

       

       

      public SimpleDatasetData build(
          LearningEnvironment env,
          Iterator<UpstreamEntry<K, V>> upstreamData, long upstreamDataSize, C ctx) {
          // Prepares the matrix of features in flat column-major format.
          int cols = -1;
          double[] features = null;

          int ptr = 0;
          while (upstreamData.hasNext()) {
              UpstreamEntry<K, V> entry = upstreamData.next();
              Vector row = preprocessor.apply(entry.getKey(), entry.getValue()).features();

              // The first row fixes the column count and sizes the flat array.
              if (cols < 0) {
                  cols = row.size();
                  features = new double[Math.toIntExact(upstreamDataSize * cols)];
              }
              else
                  assert row.size() == cols : "Feature extractor must return exactly " + cols + " features";

              // Every feature of the row is copied; none can be filtered out here.
              for (int i = 0; i < cols; i++)
                  features[Math.toIntExact(i * upstreamDataSize + ptr)] = row.get(i);

              ptr++;
          }

          return new SimpleDatasetData(features, Math.toIntExact(upstreamDataSize));
      }
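
      To make the request concrete, here is a minimal sketch of how a feature filter could plug into the same column-major packing loop. It is not Ignite API: the class FeatureFilterSketch, the keep predicate, and the use of plain double[] rows in place of UpstreamEntry and Vector are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

/** Hypothetical sketch, not Ignite API: packs rows column-major, keeping only selected features. */
class FeatureFilterSketch {
    /**
     * @param rows Feature rows (a stand-in for the preprocessor output in the issue).
     * @param rowCnt Number of rows, playing the role of upstreamDataSize.
     * @param keep Hypothetical filter: returns true for feature indices to keep.
     * @return Flat column-major matrix containing only the kept features.
     */
    static double[] build(Iterator<double[]> rows, int rowCnt, IntPredicate keep) {
        int[] kept = null;        // Indices of kept features, resolved on the first row.
        double[] features = null;
        int ptr = 0;

        while (rows.hasNext()) {
            double[] row = rows.next();

            if (kept == null) {
                kept = IntStream.range(0, row.length).filter(keep).toArray();
                features = new double[Math.multiplyExact(rowCnt, kept.length)];
            }

            // Copy only the kept features; column i of the output is source feature kept[i].
            for (int i = 0; i < kept.length; i++)
                features[i * rowCnt + ptr] = row[kept[i]];

            ptr++;
        }

        return features == null ? new double[0] : features;
    }

    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(
            new double[] {1, 2, 3},
            new double[] {4, 5, 6});

        // Exclude feature 1, so source columns 0 and 2 remain.
        double[] packed = build(data.iterator(), data.size(), i -> i != 1);

        System.out.println(Arrays.toString(packed)); // prints [1.0, 4.0, 3.0, 6.0]
    }
}
```

      The filter is resolved once, on the first row, so the output array can still be sized up front as in the original method. Producing new features would need a row-to-row mapping rather than an index predicate, which this sketch does not cover.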

          People

            zaleslaw Alexey Zinoviev