Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.9, 6.0
    • Fix Version/s: 4.9, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Today the primary key lookup in Lucene is not that great for systems like Solr and Elasticsearch that have versioning in front of IndexWriter.

      To some extent, BlockTree can "sometimes" help avoid seeks by telling you the term does not exist for a segment. But this technique (based on FST prefix) is fragile. The only other choice today is bloom filters, which use up huge amounts of memory.

      I don't think we are using everything we know: particularly the version semantics.

      Instead, if the FST for the terms index used an algebra that represents the max version for any subtree, we might be able to answer very efficiently that there is no term T with version >= V in that segment.

      Also, ID fields don't need postings lists, and they don't need stats like docfreq/totaltermfreq, etc.; this stuff is all implicit.

      As far as API, I think for users to provide "IDs with versions" to such a PF, a start would be to set a payload or whatever on the term field to get it through IndexWriter to the codec. And a "consumer" of the codec can just cast the Terms to a subclass that exposes the FST to do this version check efficiently.
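
The max-version-per-subtree idea can be illustrated with a toy structure. The sketch below is not Lucene's FST and is not taken from the patch on this issue: it is a plain character trie, and the names MaxVersionTrie, put, seekExact and nodesVisited are hypothetical. Each node records an upper bound on the version of any ID stored beneath it, so a lookup for "this ID with version at least V" can stop as soon as it reaches a subtree whose bound is below V.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy trie in which every node tracks the max version stored in its subtree. */
class MaxVersionTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        long maxVersion = -1; // upper bound on versions at or below this node
        long version = -1;    // version of the ID ending here, or -1 if none
    }

    private final Node root = new Node();
    private int nodesVisited; // nodes touched by the most recent lookup

    /** Adds an ID with a non-negative version. */
    void put(String id, long version) {
        if (version < 0) {
            throw new IllegalArgumentException("version must be >= 0");
        }
        Node node = root;
        node.maxVersion = Math.max(node.maxVersion, version);
        for (char c : id.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
            // Propagate the bound to every node on the path to this ID.
            node.maxVersion = Math.max(node.maxVersion, version);
        }
        node.version = version;
    }

    /** Returns true only if id is present with a version >= minVersion. */
    boolean seekExact(String id, long minVersion) {
        nodesVisited = 0;
        Node node = root;
        for (int i = 0; ; i++) {
            nodesVisited++;
            // Prune: nothing in this subtree can satisfy the version check.
            if (node.maxVersion < minVersion) {
                return false;
            }
            if (i == id.length()) {
                break;
            }
            node = node.children.get(id.charAt(i));
            if (node == null) {
                return false; // the ID is not in this trie at all
            }
        }
        return node.version >= 0 && node.version >= minVersion;
    }

    /** Number of nodes the most recent seekExact call examined. */
    int nodesVisited() {
        return nodesVisited;
    }
}
```

With one such structure per segment, a caller that only cares about versions at or above V can reject most segments at the root node, without walking the ID at all.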

      1. LUCENE-5675.patch
        551 kB
        Michael McCandless

        Activity

        ASF subversion and git services added a comment -

        Commit 1594960 from Robert Muir in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1594960 ]

        LUCENE-5675: create branch for playing around

        Michael McCandless added a comment -

        +1

        ASF subversion and git services added a comment -

        Commit 1594971 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1594971 ]

        LUCENE-5675: initial scaffolding for new IDVPF

        ASF subversion and git services added a comment -

        Commit 1594985 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1594985 ]

        LUCENE-5675: add docs/AndPositionsEnums

        ASF subversion and git services added a comment -

        Commit 1594991 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1594991 ]

        LUCENE-5675: move BlockTree* under its own package

        ASF subversion and git services added a comment -

        Commit 1595006 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595006 ]

        LUCENE-5675: pull out FieldReader from BTTR

        ASF subversion and git services added a comment -

        Commit 1595007 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595007 ]

        LUCENE-5675: pull out FieldReader from BTTR

        ASF subversion and git services added a comment -

        Commit 1595013 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595013 ]

        LUCENE-5675: pull out IntersectEnum

        ASF subversion and git services added a comment -

        Commit 1595017 from Robert Muir in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595017 ]

        LUCENE-5675: fix javadocs

        ASF subversion and git services added a comment -

        Commit 1595020 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595020 ]

        LUCENE-5675: more rote factoring

        ASF subversion and git services added a comment -

        Commit 1595025 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595025 ]

        LUCENE-5675: break out SegmentTermsEnum.Frame

        ASF subversion and git services added a comment -

        Commit 1595026 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595026 ]

        LUCENE-5675: rename

        ASF subversion and git services added a comment -

        Commit 1595027 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595027 ]

        LUCENE-5675: break out IntersectTermsEnumFrame

        ASF subversion and git services added a comment -

        Commit 1595052 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595052 ]

        LUCENE-5675: small cleanups

        ASF subversion and git services added a comment -

        Commit 1595064 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595064 ]

        LUCENE-5675: initial fork of BT with versioning added

        ASF subversion and git services added a comment -

        Commit 1595229 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595229 ]

        LUCENE-5675: add testRandom; sometimes fails

        ASF subversion and git services added a comment -

        Commit 1595530 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595530 ]

        LUCENE-5675: checkpoint current dirty state

        ASF subversion and git services added a comment -

        Commit 1595548 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595548 ]

        LUCENE-5675: testRandom seems to be passing

        ASF subversion and git services added a comment -

        Commit 1595817 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595817 ]

        LUCENE-5675: detect negative versions, fix another seekExact case

        ASF subversion and git services added a comment -

        Commit 1595824 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1595824 ]

        LUCENE-5675: merge trunk

        ASF subversion and git services added a comment -

        Commit 1596091 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1596091 ]

        LUCENE-5675: delete docs on flush

        ASF subversion and git services added a comment -

        Commit 1596512 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1596512 ]

        LUCENE-5675: fix nocommits

        ASF subversion and git services added a comment -

        Commit 1596599 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1596599 ]

        LUCENE-5675: go back to sending deleted docs to PostingsFormat on flush; move 'skip deleted docs' into IDVPF

        ASF subversion and git services added a comment -

        Commit 1596602 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1596602 ]

        LUCENE-5675: finish reverting 'do not send deleted docs to PostingsFormat on flush'

        ASF subversion and git services added a comment -

        Commit 1596708 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1596708 ]

        LUCENE-5675: working on ant precommit

        ASF subversion and git services added a comment -

        Commit 1596783 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1596783 ]

        LUCENE-5693, LUCENE-5675: also decouple this bug fix (move to LUCENE-5693) in ToParentBJQ.explain

        ASF subversion and git services added a comment -

        Commit 1596817 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1596817 ]

        LUCENE-5675: merge trunk

        ASF subversion and git services added a comment -

        Commit 1596938 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1596938 ]

        LUCENE-5675, LUCENE-5693: improve javadocs, disallow term vectors, fix precommit issues, remove trivial diffs, add new test case

        Michael McCandless added a comment -

        Here's an applyable patch (created with diffSources.py).

        The patch is very large because 1) I split BlockTree* into separate classes under its own package, and 2) I forked most of BlockTree* for the new IDVersionPF.

        Tests seem to pass; I think it's ready.

        Michael McCandless added a comment -

        I'll move the new IDVPF to sandbox before committing.

        ASF subversion and git services added a comment -

        Commit 1596946 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1596946 ]

        LUCENE-5675: move to sandbox

        Robert Muir added a comment -

        +1 for sandbox as a start

        ASF subversion and git services added a comment -

        Commit 1596974 from Michael McCandless in branch 'dev/branches/lucene5675'
        [ https://svn.apache.org/r1596974 ]

        LUCENE-5675: zig-zag encode the versions (loses 1 bit); check the min/max version

        ASF subversion and git services added a comment -

        Commit 1596979 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1596979 ]

        LUCENE-5675: add IDVersionPostingsFormat

        ASF subversion and git services added a comment -

        Commit 1597030 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1597030 ]

        LUCENE-5675: add IDVersionPostingsFormat

        ASF subversion and git services added a comment -

        Commit 1597695 from Steve Rowe in branch 'dev/trunk'
        [ https://svn.apache.org/r1597695 ]

        LUCENE-5675: Add src/resources directory to maven config

        ASF subversion and git services added a comment -

        Commit 1597696 from Steve Rowe in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1597696 ]

        LUCENE-5675: Add src/resources directory to maven config (merged trunk r1597695)

        Michael McCandless added a comment -

        Thanks Steve!

        ASF subversion and git services added a comment -

        Commit 1610415 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1610415 ]

        LUCENE-5675: make VersionBlockTreeWriter/Reader public

        ASF subversion and git services added a comment -

        Commit 1610416 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1610416 ]

        LUCENE-5675: make VersionBlockTreeWriter/Reader public


          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0
            Watchers:
            3
