Solr / SOLR-13903

Classification Model Confusion Matrix Discrepancy

    Details

      Description

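      Using the features and train stream sources generates a model with TP, TN, FP, and FN fields. Each training document should fall into exactly one of these four cells, so their sum should equal the training set size. However, the sum of these fields is sometimes less than the training set size.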

       How to reproduce:

      1. Create two collections: cellphones and cellphones-model

      2. Index the attached dataset (cellphones.csv) into cellphones

      3. Run the following expression:

      commit(cellphones-model,
        update(cellphones-model, batchSize=500,
          train(cellphones,
            features(cellphones, q="*:*", featureSet="featureSet",
                     field="title_t",
                     outcome="brand_i", numTerms=25),
            q="*:*",
            name="cellphones-classification-model",
            field="title_t",
            outcome="brand_i",
            maxIterations=100)))

      4. Run the following query to retrieve the confusion matrix:

      search q=*:*&collection=cellphones-model&fl=name_s,trueNegative_i,truePositive_i,falseNegative_i,falsePositive_i,iteration_i&sort=iteration_i%20desc&rows=100

      In this instance, the sum of TP, TN, FP, and FN is one less than the training set size for every iteration.
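
      The check described above can be sketched in Python. This is only an illustration, not part of Solr: the field names are taken from the fl list in step 4, while the helper names (confusion_total, find_discrepancies), the sample documents, and the training set size of 100 are hypothetical values chosen for the example rather than results from the attached dataset.

```python
# Field names follow the fl list in step 4 of the report.
CONFUSION_FIELDS = (
    "truePositive_i",
    "trueNegative_i",
    "falsePositive_i",
    "falseNegative_i",
)


def confusion_total(doc):
    """Return TP + TN + FP + FN for one model document (a dict)."""
    return sum(int(doc[field]) for field in CONFUSION_FIELDS)


def find_discrepancies(model_docs, training_set_size):
    """Map iteration_i to the number of unaccounted training documents.

    Only iterations whose confusion matrix does not sum to
    training_set_size are included in the result.
    """
    gaps = {}
    for doc in model_docs:
        missing = training_set_size - confusion_total(doc)
        if missing != 0:
            gaps[doc["iteration_i"]] = missing
    return gaps


if __name__ == "__main__":
    # Hypothetical model documents, shaped like the step 4 query results.
    sample_docs = [
        {"iteration_i": 100, "truePositive_i": 40, "trueNegative_i": 50,
         "falsePositive_i": 5, "falseNegative_i": 4},
        {"iteration_i": 99, "truePositive_i": 39, "trueNegative_i": 50,
         "falsePositive_i": 5, "falseNegative_i": 5},
    ]
    # Assumed training set size; substitute the real document count.
    print(find_discrepancies(sample_docs, training_set_size=100))
```

      Running the same check over the documents returned in step 4, with the actual number of documents in cellphones as the training set size, should list every iteration affected by the discrepancy.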

        Attachments

        1. cellphones.csv
          7 kB
          Ahmed Adel

            People

            • Assignee: Unassigned
            • Reporter: Ahmed Adel (ahmed.adel)