Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.0
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

    Description

      Subtask of LUCENE-4100.

      That's an example of something similar to impact indexing (though his implementation currently stores a max for the entire term, the problem is the same).

      We can imagine other similar algorithms too: I think the codec API should be able to support these.

      Currently it really doesn't: Stefan worked around the problem by providing a tool to 'rewrite' your index, to which he passes the IndexReader and Similarity. But it would be better if we fixed the codec API.

      One problem is that the Postings writer needs to have access to the Similarity. Another problem is that it needs access to the term and collection statistics up front, rather than after the fact.
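
      To make the two problems concrete, here is a minimal sketch of a writer that records a maximum score per block of postings. Everything in it is an assumption for illustration: ImpactSketch, ScoreFunction, Stats, bm25 and maxScorePerBlock are hypothetical names, not Lucene's codec API, and BM25 merely stands in for whatever Similarity is configured. The point is that the scoring function has to be built from the statistics before the first posting is written.

```java
/** Hypothetical sketch; these types are illustrative and are not Lucene's codec API. */
public class ImpactSketch {

  /** Stand-in for a Similarity: scores one posting from its term frequency and field length. */
  interface ScoreFunction {
    double score(int freq, long fieldLength);
  }

  /** Term and collection statistics that must be known before any posting is scored. */
  static final class Stats {
    final long docCount;         // documents in the collection
    final long docFreq;          // documents containing the term
    final double avgFieldLength; // average field length across the collection

    Stats(long docCount, long docFreq, double avgFieldLength) {
      this.docCount = docCount;
      this.docFreq = docFreq;
      this.avgFieldLength = avgFieldLength;
    }
  }

  /** BM25-style scorer; it cannot be constructed until the statistics are available. */
  static ScoreFunction bm25(Stats s) {
    final double k1 = 1.2;
    final double b = 0.75;
    final double idf = Math.log(1 + (s.docCount - s.docFreq + 0.5) / (s.docFreq + 0.5));
    return (freq, fieldLength) ->
        idf * freq / (freq + k1 * (1 - b + b * fieldLength / s.avgFieldLength));
  }

  /** Computes the maximum score of each block of postings, in posting order. */
  static double[] maxScorePerBlock(int[] freqs, long[] fieldLengths, int blockSize, ScoreFunction f) {
    int blocks = (freqs.length + blockSize - 1) / blockSize;
    double[] max = new double[blocks]; // scores here are non-negative, so 0 is a safe start
    for (int i = 0; i < freqs.length; i++) {
      int block = i / blockSize;
      max[block] = Math.max(max[block], f.score(freqs[i], fieldLengths[i]));
    }
    return max;
  }

  public static void main(String[] args) {
    // The writer needs the scorer (problem 1), which needs the stats up front (problem 2).
    ScoreFunction f = bm25(new Stats(100, 10, 10.0));
    double[] max = maxScorePerBlock(new int[] {1, 5, 2}, new long[] {10, 10, 10}, 2, f);
    for (double m : max) {
      System.out.println(m);
    }
  }
}
```

      In this sketch docFreq is passed in before any posting is visited, whereas a writer that streams postings only learns it after consuming the whole term, which is the "after the fact" problem described above.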

      This might have some cost (hopefully minimal), so I'm planning to experiment with these changes in a branch and see if we can make it work well.

      Attachments

        1. LUCENE-4198_flush.patch
          11 kB
          Robert Muir
        2. LUCENE-4198.patch
          181 kB
          Adrien Grand
        3. LUCENE-4198.patch
          180 kB
          Adrien Grand
        4. LUCENE-4198.patch
          143 kB
          Adrien Grand
        5. LUCENE-4198.patch
          101 kB
          Adrien Grand
        6. LUCENE-4198.patch
          252 kB
          Adrien Grand
        7. LUCENE-4198-BMW.patch
          65 kB
          Adrien Grand
        8. TestSimpleTextPostingsFormat.asf.nightly.master.1466.consoleText.excerpt.txt
          108 kB
          Steven Rowe
        9. TestSimpleTextPostingsFormat.sarowe.jenkins.nightly.master.681.consoleText.excerpt.txt
          108 kB
          Steven Rowe

            People

              Assignee: Unassigned
              Reporter: Robert Muir (rcmuir)
              Votes: 0
              Watchers: 8

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h