Lucene - Core / LUCENE-7591

Let DatasetSplitter approximate no. of class values by no. of terms

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0
    • Component/s: modules/classification
    • Labels: None
    • Lucene Fields: New

    Description

      Currently DatasetSplitter throws an exception if it cannot find SortedDocValues or SortedSetDocValues on the class field, because without them it cannot split the index in a balanced way.
      As a fallback, we could instead use the number of terms per leaf reader as an approximate count (an upper bound) of the number of classes.
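
      The proposed fallback can be sketched in plain Java as follows. This is only a model of the counting logic, not the DatasetSplitter code: `ClassCountEstimate`, `LeafStats`, and `estimate` are hypothetical names, and summing the per-leaf counts across leaves is an assumption about how the counts would be combined. It does show why the term count is an upper bound: a term present in several leaves is counted once per leaf.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

// Plain-Java model of the fallback; none of these names are Lucene APIs.
final class ClassCountEstimate {

    /** Hypothetical per-leaf statistics for the class field. */
    static final class LeafStats {
        final OptionalLong docValueCount; // empty when the leaf has no doc values on the field
        final long termCount;             // unique terms indexed for the field in this leaf

        LeafStats(OptionalLong docValueCount, long termCount) {
            this.docValueCount = docValueCount;
            this.termCount = termCount;
        }
    }

    /** Sums, per leaf, the doc-value count when present, otherwise the term count. */
    static long estimate(List<LeafStats> leaves) {
        long total = 0;
        for (LeafStats leaf : leaves) {
            total += leaf.docValueCount.isPresent()
                    ? leaf.docValueCount.getAsLong() // preferred source
                    : leaf.termCount;                // fallback approximation
        }
        return total;
    }
}
```

      For example, two leaves without doc values that each index the same 4 class terms would yield an estimate of 8, although only 4 distinct classes exist.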

          People

            Assignee: teofili Tommaso Teofili
            Reporter: teofili Tommaso Teofili
            Votes: 0
            Watchers: 2
