Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 8.2
Fix Version/s: None
Description
Using the features and train stream sources generates a model with TP, TN, FP, and FN (confusion matrix) fields. For some reason, the sum of these fields is sometimes less than the training set size.
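For reference, in a binary classifier every training example falls into exactly one of the four cells, so TP + TN + FP + FN should equal the number of examples evaluated. The Python sketch below illustrates that invariant. It assumes the outcome field (brand_i here) holds 0/1 labels, and confusion_counts is a hypothetical helper, not Solr's implementation.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count TP/TN/FP/FN for binary labels (1 = positive, 0 = negative).

    Hypothetical helper for illustration only; not Solr's code.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        else:  # a == 1 and p == 0
            counts["fn"] += 1
    return counts


actual = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 1]
counts = confusion_counts(actual, predicted)
# Each example lands in exactly one cell, so the cells sum to the example count.
assert sum(counts.values()) == len(actual)
```

A sum below the training set size therefore means some training documents were not counted in any cell.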
Steps to reproduce:
1. Create two collections: cellphones and cellphones-model
2. Index the attached dataset into cellphones.
3. Run the following expression:
commit(cellphones-model,
  update(cellphones-model, batchSize=500,
    train(cellphones,
      features(cellphones, q="*:*", featureSet="featureSet",
               field="title_t",
               outcome="brand_i", numTerms=25),
      q="*:*",
      name="cellphones-classification-model",
      field="title_t",
      outcome="brand_i",
      maxIterations=100)))
4. Run the following query to retrieve the confusion matrix:
search q=*:*&collection=cellphones-model&fl=name_s,trueNegative_i,truePositive_i,falseNegative_i,falsePositive_i,iteration_i&sort=iteration_i%20desc&rows=100
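The query in step 4 can also be built programmatically so that the match-all query and the sort clause are URL-encoded correctly. This is a minimal sketch assuming a local Solr at http://localhost:8983/solr and the standard /select handler on the cellphones-model collection; the function name confusion_matrix_query_url is hypothetical.

```python
from urllib.parse import urlencode


def confusion_matrix_query_url(base_url: str, collection: str, rows: int = 100) -> str:
    """Build a /select URL returning the per-iteration confusion matrix fields.

    Assumes a standard Solr /select handler; adjust base_url for your cluster.
    """
    params = {
        "q": "*:*",  # match all model documents
        "fl": "name_s,trueNegative_i,truePositive_i,"
              "falseNegative_i,falsePositive_i,iteration_i",
        "sort": "iteration_i desc",  # newest iteration first
        "rows": rows,
    }
    # urlencode percent-encodes '*', ':' and ',' and turns spaces into '+'.
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


url = confusion_matrix_query_url("http://localhost:8983/solr", "cellphones-model")
print(url)
```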
In this instance, TP + TN + FP + FN is exactly one less than the training set size for every iteration.
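The discrepancy can be checked mechanically over the documents returned in step 4. The sketch below assumes the documents have been parsed into Python dicts keyed by the field names in the fl list, and that the training set size is known (for example, the numFound of a q=*:* query on cellphones). find_mismatches and the sample documents are hypothetical and for illustration only; they are not data from this report.

```python
from typing import Dict, Iterable, List, Tuple

# Confusion matrix fields written by the train stream, as listed in the fl parameter.
CELL_FIELDS = ("truePositive_i", "trueNegative_i", "falsePositive_i", "falseNegative_i")


def find_mismatches(model_docs: Iterable[Dict], training_set_size: int) -> List[Tuple[int, int]]:
    """Return (iteration, cell_sum) for each model doc whose cells
    do not add up to training_set_size."""
    mismatches = []
    for doc in model_docs:
        cell_sum = sum(int(doc[field]) for field in CELL_FIELDS)
        if cell_sum != training_set_size:
            mismatches.append((int(doc["iteration_i"]), cell_sum))
    return mismatches


# Hypothetical model documents for a training set of 100 examples.
docs = [
    {"iteration_i": 2, "truePositive_i": 40, "trueNegative_i": 50,
     "falsePositive_i": 5, "falseNegative_i": 4},   # sums to 99
    {"iteration_i": 1, "truePositive_i": 41, "trueNegative_i": 50,
     "falsePositive_i": 5, "falseNegative_i": 4},   # sums to 100
]
print(find_mismatches(docs, 100))
```

Here only iteration 2 would be reported, with a cell sum of 99.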