[IGNITE-12257] [ML] Add Feature Filter for ML Partitioned Dataset - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.9
Fix Version/s: None
Component/s: None
Labels: None
Ignite Flags: Docs Required, Release Notes Required

Description

The method below ignores any feature selection made at earlier stages of the pipeline: it copies every feature the preprocessor returns into the dataset. As a result, there is no way to do feature engineering during preprocessing with simple SQL-like operations such as filtering, excluding features, or producing new features.

public SimpleDatasetData build(
    LearningEnvironment env,
    Iterator<UpstreamEntry<K, V>> upstreamData,
    long upstreamDataSize,
    C ctx) {
    // Prepares the matrix of features in flat column-major format.
    int cols = -1;
    double[] features = null;

    int ptr = 0;
    while (upstreamData.hasNext()) {
        UpstreamEntry<K, V> entry = upstreamData.next();
        Vector row = preprocessor.apply(entry.getKey(), entry.getValue()).features();

        // The first row fixes the column count and sizes the flat array.
        if (cols < 0) {
            cols = row.size();
            features = new double[Math.toIntExact(upstreamDataSize * cols)];
        }
        else
            assert row.size() == cols : "Feature extractor must return exactly " + cols + " features";

        // Every feature of the row is copied; nothing is filtered out here.
        for (int i = 0; i < cols; i++)
            features[Math.toIntExact(i * upstreamDataSize + ptr)] = row.get(i);

        ptr++;
    }

    return new SimpleDatasetData(features, Math.toIntExact(upstreamDataSize));
}
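
One possible shape for the requested filter, as a minimal standalone sketch rather than a proposed patch: the builder could accept a predicate over feature indices and copy only the accepted columns into the flat column-major array. Everything here is hypothetical and chosen for illustration (the class `FeatureFilterSketch`, the method `buildFiltered`, the `keepFeature` parameter, and the use of plain `double[]` rows in place of Ignite's `Vector` and `UpstreamEntry`); none of it is existing Ignite API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

/** Hypothetical sketch; these names are not part of Apache Ignite. */
public class FeatureFilterSketch {
    /**
     * Flattens rows into a column-major array, keeping only the feature
     * indices accepted by {@code keepFeature} (an assumed parameter).
     */
    public static double[] buildFiltered(Iterator<double[]> rows, long rowCnt, IntPredicate keepFeature) {
        int[] kept = null;       // Source indices of the features that pass the filter.
        double[] features = null;
        int srcCols = -1;
        int ptr = 0;

        while (rows.hasNext()) {
            double[] row = rows.next();

            if (kept == null) {
                // The first row fixes the source width and the set of kept columns.
                srcCols = row.length;
                kept = IntStream.range(0, srcCols).filter(keepFeature).toArray();
                features = new double[Math.toIntExact(rowCnt * kept.length)];
            }
            else if (row.length != srcCols)
                throw new IllegalArgumentException("Expected exactly " + srcCols + " features");

            // Copy only the kept columns, preserving the column-major layout.
            for (int j = 0; j < kept.length; j++)
                features[Math.toIntExact(j * rowCnt + ptr)] = row[kept[j]];

            ptr++;
        }

        return features == null ? new double[0] : features;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
            new double[] {1.0, 2.0, 3.0},
            new double[] {4.0, 5.0, 6.0});

        // Exclude feature 1; keep features 0 and 2.
        double[] flat = buildFiltered(rows.iterator(), rows.size(), i -> i != 1);

        System.out.println(Arrays.toString(flat)); // prints [1.0, 4.0, 3.0, 6.0]
    }
}
```

The predicate is evaluated once, on the first row, so the per-row cost stays proportional to the number of kept features. Producing new features would need a different hook (for example a row-to-row mapping function), which this sketch does not cover.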

People

Assignee: Alexey Zinoviev

Reporter: Alexey Zinoviev

Votes: 0

Watchers: 1

Dates

Created: 03/Oct/19 11:22

Updated: 03/Jul/20 07:01