Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
2.9
None
None
None
Docs Required, Release Notes Required
Description
This method ignores any feature selection made at earlier stages, and it gives us no way to do feature engineering during preprocessing in the style of simple SQL: filtering, excluding features, producing new features, and so on.
public SimpleDatasetData build(
    LearningEnvironment env,
    Iterator<UpstreamEntry<K, V>> upstreamData, long upstreamDataSize, C ctx) {
    // Prepares the matrix of features in flat column-major format.
    int cols = -1;
    double[] features = null;

    int ptr = 0;
    while (upstreamData.hasNext()) {
        UpstreamEntry<K, V> entry = upstreamData.next();
        Vector row = preprocessor.apply(entry.getKey(), entry.getValue()).features();

        if (cols < 0) {
            // The first row fixes the number of columns and sizes the flat array.
            cols = row.size();
            features = new double[Math.toIntExact(upstreamDataSize * cols)];
        }
        else
            assert row.size() == cols : "Feature extractor must return exactly " + cols + " features";

        // Column i occupies the slice [i * upstreamDataSize, (i + 1) * upstreamDataSize).
        for (int i = 0; i < cols; i++)
            features[Math.toIntExact(i * upstreamDataSize + ptr)] = row.get(i);

        ptr++;
    }

    return new SimpleDatasetData(features, Math.toIntExact(upstreamDataSize));
}
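
To make the request more concrete, here is a minimal, self-contained sketch of what SQL-like preprocessing could look like before the features are flattened. It is an illustration only: the class FeatureEngineeringSketch and the helpers select and appendProduct are hypothetical names invented for this example and are not part of the existing API, and plain double[] rows stand in for Vector. Because filtering changes the number of rows, the sketch buffers the surviving rows first instead of sizing the array from upstreamDataSize.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical sketch; none of these names exist in the current API. */
class FeatureEngineeringSketch {
    /** Keeps only the columns at the given indices (like SELECT a, c). */
    static UnaryOperator<double[]> select(int... idxs) {
        return row -> {
            double[] out = new double[idxs.length];
            for (int i = 0; i < idxs.length; i++)
                out[i] = row[idxs[i]];
            return out;
        };
    }

    /** Appends the product of two existing columns as a new feature. */
    static UnaryOperator<double[]> appendProduct(int a, int b) {
        return row -> {
            double[] out = Arrays.copyOf(row, row.length + 1);
            out[row.length] = row[a] * row[b];
            return out;
        };
    }

    /**
     * Filters rows, applies the steps in order, and flattens the result
     * in column-major order, mirroring the layout used by build above.
     */
    static double[] build(List<double[]> rows, Predicate<double[]> filter,
        List<UnaryOperator<double[]>> steps) {
        List<double[]> kept = new ArrayList<>();

        for (double[] row : rows) {
            if (!filter.test(row))
                continue; // Like a WHERE clause: drop rows that fail the predicate.

            for (UnaryOperator<double[]> step : steps)
                row = step.apply(row);

            kept.add(row);
        }

        if (kept.isEmpty())
            return new double[0];

        int cols = kept.get(0).length;
        int n = kept.size();
        double[] features = new double[n * cols];

        for (int r = 0; r < n; r++) {
            double[] row = kept.get(r);

            if (row.length != cols)
                throw new IllegalStateException("Expected " + cols + " features, got " + row.length);

            // Column c occupies the slice [c * n, (c + 1) * n).
            for (int c = 0; c < cols; c++)
                features[c * n + r] = row[c];
        }

        return features;
    }
}
```

For example, with rows {1, 2, 3}, {4, 5, 6} and {-1, 0, 2}, the filter row[0] > 0, and the steps select(0, 2) followed by appendProduct(0, 1), the third row is dropped, the remaining rows become {1, 3, 3} and {4, 6, 24}, and the flattened result is [1.0, 4.0, 3.0, 6.0, 3.0, 24.0].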