  Lucene - Core / LUCENE-7591

Let DatasetSplitter approximate no. of class values by no. of terms


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0
    • Component/s: modules/classification
    • Labels: None
    • Lucene Fields: New

      Description

      Currently, DatasetSplitter throws an exception when it cannot find SortedDocValues or SortedSetDocValues on the class field, because without them it cannot determine the no. of classes and therefore cannot split the indexes in a balanced way.
      As a fallback, it could instead use the no. of terms per leaf reader as an approximate count (an upper bound) of the no. of classes.
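
      To illustrate why the term count works as an upper bound, here is a minimal plain-Java sketch of the proposed fallback. It does not use any Lucene classes and is not the actual patch: LeafView, approximateClassCount and exactClassCount are hypothetical names, and the model assumes the class field is not tokenized, so that each indexed term corresponds to one class value.

```java
import java.util.HashSet;
import java.util.List;
import java.util.OptionalLong;
import java.util.Set;

/** Plain-Java model of the fallback idea; no Lucene classes are used. */
final class ClassCountSketch {

    /** Hypothetical stand-in for one leaf reader's view of the class field. */
    static final class LeafView {
        // Empty when the leaf has no doc values on the class field.
        final OptionalLong docValueCount;
        // Distinct indexed terms of the class field in this leaf.
        final Set<String> terms;

        LeafView(OptionalLong docValueCount, Set<String> terms) {
            this.docValueCount = docValueCount;
            this.terms = terms;
        }
    }

    /**
     * Sums a per-leaf class count. When a leaf has no doc values,
     * its term count is used instead of failing.
     */
    static long approximateClassCount(List<LeafView> leaves) {
        long total = 0;
        for (LeafView leaf : leaves) {
            total += leaf.docValueCount.orElse(leaf.terms.size());
        }
        return total;
    }

    /** Exact no. of distinct classes across all leaves, for comparison only. */
    static long exactClassCount(List<LeafView> leaves) {
        Set<String> distinct = new HashSet<>();
        for (LeafView leaf : leaves) {
            distinct.addAll(leaf.terms);
        }
        return distinct.size();
    }
}
```

      In this model, a class that occurs in more than one leaf is counted once per leaf, so the summed term count can exceed the true no. of classes but never falls below it.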


            People

            • Assignee: Tommaso Teofili (teofili)
            • Reporter: Tommaso Teofili (teofili)
