Lucene - Core / LUCENE-4299

No way to find term vectors options at read time

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      The problem is simple:

      1. Term vectors can be configured "per-field-per-document": for the "body" field, document 0 might have them, document 1 might not have them at all, document 2 might have offsets but no positions, and so on. To me this is not a useful feature at all; no one has ever mentioned a single use case for it, and it just makes our code more complicated. But it is what it is (for this issue).
      2. There is no way to discover these options for a field of a document. You have to do things like 'peek ahead' to check whether the first position of the first term is -1, or the same for offsets (except worse: we used to allow anything in offsets, so -1 might be an actual value). This makes the merging code really hairy, and it is tough on end consumers.
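      To make the 'peek ahead' problem concrete, here is a minimal sketch of the guessing a consumer has to do today. DocTermVector and PeekAhead are hypothetical names invented for illustration; they are a toy model in which unstored positions are reported as -1, not Lucene's actual term vector classes. Note that the trick gives no answer for an empty vector, and that the same check on offsets is unreliable because -1 may be a legitimate offset in older indexes.

```java
import java.util.List;

// Hypothetical, simplified model of one document's term vector for one field.
// Unstored positions are reported as -1; this is NOT a Lucene data structure.
final class DocTermVector {
    final List<String> terms;  // terms in sorted order
    final int[][] positions;   // positions[i] belongs to terms.get(i); {-1} if not stored

    DocTermVector(List<String> terms, int[][] positions) {
        this.terms = terms;
        this.positions = positions;
    }
}

final class PeekAhead {
    // Guess whether positions were stored by peeking at the first position of
    // the first term. Every consumer (merging included) must repeat this trick.
    static boolean guessHasPositions(DocTermVector tv) {
        if (tv.terms.isEmpty()) {
            return false; // nothing to peek at: the real answer is unknown
        }
        return tv.positions[0][0] != -1;
    }

    // There is deliberately no guessHasOffsets(): if -1 can be an actual
    // stored offset, the same peek cannot distinguish "absent" from "present".
}
```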

      So I propose that instead of returning Terms for term vectors, we return VectorTerms (extends Terms), which just adds hasOffsets() and hasPositions(). For example, lucene40 already knows this from the bits for the field/doc pair and can just return what it knows.
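      A minimal sketch of the proposed shape follows, under stated assumptions: the Terms class here is a tiny stand-in so the example compiles on its own (it is NOT org.apache.lucene.index.Terms), and FlagVectorTerms, VectorMerger and the flag bit values are hypothetical. It shows a reader returning the per field/doc option bits it already has, so a consumer can branch on hasPositions()/hasOffsets() instead of peeking.

```java
import java.util.List;

// Minimal stand-in for a Terms-like type so this sketch is self-contained;
// it is NOT org.apache.lucene.index.Terms.
abstract class Terms {
    abstract long size();
}

// The proposed VectorTerms: a Terms that also reports which term vector
// options were stored for this particular field/document pair.
abstract class VectorTerms extends Terms {
    abstract boolean hasOffsets();
    abstract boolean hasPositions();
}

// Hypothetical reader-side implementation: the codec already has per
// field/doc flag bits, so it simply exposes them.
final class FlagVectorTerms extends VectorTerms {
    static final int STORE_POSITIONS = 0x1; // hypothetical bit layout
    static final int STORE_OFFSETS = 0x2;

    private final List<String> terms;
    private final int flags;

    FlagVectorTerms(List<String> terms, int flags) {
        this.terms = terms;
        this.flags = flags;
    }

    @Override long size() { return terms.size(); }
    @Override boolean hasPositions() { return (flags & STORE_POSITIONS) != 0; }
    @Override boolean hasOffsets() { return (flags & STORE_OFFSETS) != 0; }
}

final class VectorMerger {
    // A consumer such as merging branches on the flags directly,
    // with no peeking at the first position or offset.
    static String describe(VectorTerms vt) {
        if (vt.hasPositions() && vt.hasOffsets()) return "positions+offsets";
        if (vt.hasPositions()) return "positions";
        if (vt.hasOffsets()) return "offsets";
        return "terms only";
    }
}
```

      For document 2 in the example above (offsets, no positions), a reader would build new FlagVectorTerms(terms, FlagVectorTerms.STORE_OFFSETS), and describe() returns "offsets" without inspecting any term data. An empty vector with no flags is reported as "terms only" rather than being unknowable.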

      Attachments

        1. LUCENE-4299.patch
          15 kB
          Robert Muir
        2. LUCENE-4299.patch
          14 kB
          Robert Muir
        3. LUCENE-4299.patch
          18 kB
          Robert Muir
        4. LUCENE-4299.patch
          19 kB
          Robert Muir
        5. LUCENE-4299.patch
          22 kB
          Robert Muir


          People

            Assignee: Unassigned
            Reporter: Robert Muir (rcmuir)
            Votes: 0
            Watchers: 0
