Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We want to be able to run multiple queries at once over a MemoryIndex in luwak (see https://github.com/flaxsearch/luwak/commit/49a8fba5764020c2f0e4dc29d80d93abb0231191), but this isn't possible with the current implementation. However, looking at the code, it seems that it would be relatively simple to make MemoryIndex thread-safe for reads/queries.

      1. LUCENE-5911.patch
        54 kB
        Alan Woodward
      2. LUCENE-5911.patch
        4 kB
        Alan Woodward
      3. LUCENE-5911-2.patch
        8 kB
        Alan Woodward

        Activity

        Hide
        Alan Woodward added a comment -

        Patch, no tests yet (MemoryIndex seems a bit under-tested at the moment!), just changing a couple of fields in Info to final and making Info.sortTerms() synchronized.

        Show
        Alan Woodward added a comment - Patch, no tests yet (MemoryIndex seems a bit under-tested at the moment!), just changing a couple of fields in Info to final and making Info.sortTerms() synchronized.
        Hide
        Alan Woodward added a comment -

        We're running some tests on this at the moment to see if adding in the synchronization slows things down noticeably. Anybody have any comments? Simon Willnauer or Martijn van Groningen?

        Show
        Alan Woodward added a comment - We're running some tests on this at the moment to see if adding in the synchronization slows things down noticeably. Anybody have any comments? Simon Willnauer or Martijn van Groningen ?
        Hide
        Simon Willnauer added a comment -

        I don't understand why you not just simply calling sortTerms before you publish the MemoryIndex for querying? no need to sync this here at all no?

        Show
        Simon Willnauer added a comment - I don't understand why you not just simply calling sortTerms before you publish the MemoryIndex for querying? no need to sync this here at all no?
        Hide
        Alan Woodward added a comment -

        Do you mean add a publish() method to MemoryIndex? There's no easy way of getting sortTerms() to run on all the fields outside of calling toString() at the moment, unless I'm missing something.

        Show
        Alan Woodward added a comment - Do you mean add a publish() method to MemoryIndex? There's no easy way of getting sortTerms() to run on all the fields outside of calling toString() at the moment, unless I'm missing something.
        Hide
        Simon Willnauer added a comment -

        I don't see how you can make this actually faster though... I don't think you can gain anything here really. I'd maybe add a method that allows you to sort all of them and use that before you pass the MemoryIndex to the search threads?

        Show
        Simon Willnauer added a comment - I don't see how you can make this actually faster though... I don't think you can gain anything here really. I'd maybe add a method that allows you to sort all of them and use that before you pass the MemoryIndex to the search threads?
        Hide
        Yonik Seeley added a comment -

        It feels like sortTerms is an implementation detail that should not be exposed to the user.
        +1 to this patch to make things thread-safe.

        Show
        Yonik Seeley added a comment - It feels like sortTerms is an implementation detail that should not be exposed to the user. +1 to this patch to make things thread-safe.
        Hide
        Alan Woodward added a comment -

        Adding synchronized is a bit brute-force though.

        The lazy-sorting bit is presumably for situations like highlighting where you want to index a single document and run only a couple of queries over it as quickly as possible. So maybe the answer is to have a derived class, ThreadSafeMemoryIndex, which sorts the terms in the Info constructor. That way if you want to use this in a lazily-instantiated way you continue to use MemoryIndex, but luwak (and the percolator) would use TSMI instead.

        I'll work on a patch.

        Show
        Alan Woodward added a comment - Adding synchronized is a bit brute-force though. The lazy-sorting bit is presumably for situations like highlighting where you want to index a single document and run only a couple of queries over it as quickly as possible. So maybe the answer is to have a derived class, ThreadSafeMemoryIndex, which sorts the terms in the Info constructor. That way if you want to use this in a lazily-instantiated way you continue to use MemoryIndex, but luwak (and the percolator) would use TSMI instead. I'll work on a patch.
        Hide
        Alan Woodward added a comment -

        In the end I decided that the nicest API change was to add a freeze() method which prepares the internal data structures for querying, and prevents users adding new fields. You can continue to use MemoryIndex as in the past, but if you want to search in multiple threads, call freeze() first.

        Also adds a new test, and renames the existing test to be a bit more descriptive of what it's actually doing.

        Show
        Alan Woodward added a comment - In the end I decided that the nicest API change was to add a freeze() method which prepares the internal data structures for querying, and prevents users adding new fields. You can continue to use MemoryIndex as in the past, but if you want to search in multiple threads, call freeze() first. Also adds a new test, and renames the existing test to be a bit more descriptive of what it's actually doing.
        Hide
        Simon Willnauer added a comment -

        +1 to freeze

        Show
        Simon Willnauer added a comment - +1 to freeze
        Hide
        ASF subversion and git services added a comment -

        Commit 1628154 from Alan Woodward in branch 'dev/trunk'
        [ https://svn.apache.org/r1628154 ]

        LUCENE-5911: Add MemoryIndex.freeze() to allow for thread-safe searching

        Show
        ASF subversion and git services added a comment - Commit 1628154 from Alan Woodward in branch 'dev/trunk' [ https://svn.apache.org/r1628154 ] LUCENE-5911 : Add MemoryIndex.freeze() to allow for thread-safe searching
        Hide
        ASF subversion and git services added a comment -

        Commit 1628155 from Alan Woodward in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1628155 ]

        LUCENE-5911: Add MemoryIndex.freeze() to allow for thread-safe searching

        Show
        ASF subversion and git services added a comment - Commit 1628155 from Alan Woodward in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1628155 ] LUCENE-5911 : Add MemoryIndex.freeze() to allow for thread-safe searching
        Hide
        ASF subversion and git services added a comment -

        Commit 1628194 from Robert Muir in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1628194 ]

        LUCENE-5911: fix compile

        Show
        ASF subversion and git services added a comment - Commit 1628194 from Robert Muir in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1628194 ] LUCENE-5911 : fix compile
        Hide
        David Smiley added a comment -

        I see that getNormValues() caches the norms and doesn't use synchronization, so it's not thread-safe yet.

        Show
        David Smiley added a comment - I see that getNormValues() caches the norms and doesn't use synchronization, so it's not thread-safe yet.
        Hide
        Robert Muir added a comment -

        David: you are right.

        This caching is unnecessary in my opinion.

        Instead if the optional freeze() method is supposed to prepare internal structures for querying, I would have freeze() just prebuild, populate an ArrayList<NumericDocValues> (indexed by field number), by calling getNormValues() for each one. If you don't call freeze, this list is null, and it just returns a new one each time: not worth caching or anything for that case.

        Show
        Robert Muir added a comment - David: you are right. This caching is unnecessary in my opinion. Instead if the optional freeze() method is supposed to prepare internal structures for querying, I would have freeze() just prebuild, populate an ArrayList<NumericDocValues> (indexed by field number), by calling getNormValues() for each one. If you don't call freeze, this list is null, and it just returns a new one each time: not worth caching or anything for that case.
        Hide
        Alan Woodward added a comment -

        Good catch, David.

        Here's a patch implementing Rob's idea (although I've used a map of String->NumericDocValues rather than an Array). It also changes how Similarities are set, because you now need the Similarity to be known before freeze() is called. This is a bit nicer than the current API, I think, whereby if you want to change the Similarity you have to get an IndexSearcher and then call setSimilarity on it.

        Show
        Alan Woodward added a comment - Good catch, David. Here's a patch implementing Rob's idea (although I've used a map of String->NumericDocValues rather than an Array). It also changes how Similarities are set, because you now need the Similarity to be known before freeze() is called. This is a bit nicer than the current API, I think, whereby if you want to change the Similarity you have to get an IndexSearcher and then call setSimilarity on it.
        Hide
        David Smiley added a comment -

        +1 looks good.

        Show
        David Smiley added a comment - +1 looks good.
        Hide
        ASF subversion and git services added a comment -

        Commit 1633321 from Alan Woodward in branch 'dev/trunk'
        [ https://svn.apache.org/r1633321 ]

        LUCENE-5911: Remove cacheing of norms, calculate up front in freeze()

        Show
        ASF subversion and git services added a comment - Commit 1633321 from Alan Woodward in branch 'dev/trunk' [ https://svn.apache.org/r1633321 ] LUCENE-5911 : Remove cacheing of norms, calculate up front in freeze()
        Hide
        ASF subversion and git services added a comment -

        Commit 1633322 from Alan Woodward in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1633322 ]

        LUCENE-5911: Remove cacheing of norms, calculate up front in freeze()

        Show
        ASF subversion and git services added a comment - Commit 1633322 from Alan Woodward in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1633322 ] LUCENE-5911 : Remove cacheing of norms, calculate up front in freeze()
        Hide
        ASF subversion and git services added a comment -

        Commit 1634034 from Alan Woodward in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1634034 ]

        LUCENE-5911: Add freeze() method to MemoryIndex to allow thread-safe searching

        Show
        ASF subversion and git services added a comment - Commit 1634034 from Alan Woodward in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1634034 ] LUCENE-5911 : Add freeze() method to MemoryIndex to allow thread-safe searching
        Hide
        ASF subversion and git services added a comment -

        Commit 1634035 from Alan Woodward in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1634035 ]

        LUCENE-5911: Update 5x CHANGES.txt

        Show
        ASF subversion and git services added a comment - Commit 1634035 from Alan Woodward in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1634035 ] LUCENE-5911 : Update 5x CHANGES.txt
        Hide
        ASF subversion and git services added a comment -

        Commit 1634036 from Alan Woodward in branch 'dev/trunk'
        [ https://svn.apache.org/r1634036 ]

        LUCENE-5911: Update trunk CHANGES.txt

        Show
        ASF subversion and git services added a comment - Commit 1634036 from Alan Woodward in branch 'dev/trunk' [ https://svn.apache.org/r1634036 ] LUCENE-5911 : Update trunk CHANGES.txt
        Hide
        Alan Woodward added a comment -

        Reopening for backport to 4.10.2

        Show
        Alan Woodward added a comment - Reopening for backport to 4.10.2
        Hide
        Michael McCandless added a comment -

        I think this is too big a change to push into 4.10.x branch? That branch is for bug fixes only?

        Show
        Michael McCandless added a comment - I think this is too big a change to push into 4.10.x branch? That branch is for bug fixes only?
        Hide
        Alan Woodward added a comment -

        Is it? I thought we weren't doing a 4.11?

        Show
        Alan Woodward added a comment - Is it? I thought we weren't doing a 4.11?
        Hide
        Michael McCandless added a comment -

        I thought we weren't doing a 4.11?

        I don't think we are. The next feature release is going to be 5.0, hopefully soon...

        Show
        Michael McCandless added a comment - I thought we weren't doing a 4.11? I don't think we are. The next feature release is going to be 5.0, hopefully soon...
        Hide
        Alan Woodward added a comment -

        OK, will revert.

        Show
        Alan Woodward added a comment - OK, will revert.
        Hide
        ASF subversion and git services added a comment -

        Commit 1634054 from Alan Woodward in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1634054 ]

        LUCENE-5911: Revert backport

        Show
        ASF subversion and git services added a comment - Commit 1634054 from Alan Woodward in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1634054 ] LUCENE-5911 : Revert backport
        Hide
        Michael McCandless added a comment -

        Thanks Alan, I'll spin an RC now...

        Show
        Michael McCandless added a comment - Thanks Alan, I'll spin an RC now...
        Hide
        Anshum Gupta added a comment -

        Bulk close after 5.0 release.

        Show
        Anshum Gupta added a comment - Bulk close after 5.0 release.

          People

          • Assignee:
            Alan Woodward
            Reporter:
            Alan Woodward
          • Votes:
            3 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development