Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-5644

ThreadAffinityDocumentsWriterThreadPool should clear the bindings on flush

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.8.1, 4.9, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This class remembers which thread used which DWPT, but it never clears
      this "affinity". It really should clear it on flush, this way if the
      number of threads doing indexing has changed we only use as many DWPTs
      as there are incoming threads.

      1. LUCENE-5644.patch
        7 kB
        Michael McCandless
      2. LUCENE-5644.patch
        25 kB
        Michael McCandless
      3. LUCENE-5644.patch
        25 kB
        Michael McCandless
      4. LUCENE-5644.patch
        23 kB
        Michael McCandless
      5. LUCENE-5644.patch
        5 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on the issue:

          https://github.com/apache/lucenenet/pull/208

          Guys, the PR title here references LUCENE-5644 and this trigger's ASF JIRA-GitHub integration to link the conversation here to comments on that old issue. Can you please edit the PR title?

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on the issue: https://github.com/apache/lucenenet/pull/208 Guys, the PR title here references LUCENE-5644 and this trigger's ASF JIRA-GitHub integration to link the conversation here to comments on that old issue. Can you please edit the PR title?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user vvdb commented on the issue:

          https://github.com/apache/lucenenet/pull/208

          I’m not planning to submit another pull request at this time.
          Feel free to take whatever you need from my repository: you’re better at this than I am.

          Vincent

          From: Shad Storhaug notifications@github.com
          Sent: Tuesday, June 27, 2017 10:27 AM
          To: apache/lucenenet <lucenenet@noreply.github.com>
          Cc: Van Den Berghe, Vincent <Vincent.VanDenBerghe@bvdinfo.com>; Mention <mention@noreply.github.com>
          Subject: Re: [apache/lucenenet] LUCENE-5644: switch to simpler LIFO thread to ThreadState allocator d… (#208)

          @vvdb<https://github.com/vvdb>

          I would like to add this fix to the next release. Are you planning to submit another pull request?


          You are receiving this because you were mentioned.
          Reply to this email directly, view it on GitHub<https://github.com/apache/lucenenet/pull/208#issuecomment-311289271>, or mute the thread<https://github.com/notifications/unsubscribe-auth/ARBXprqH0s0Tjj1dhQ4f_ryNc6H2vCZEks5sILy9gaJpZM4N4A9a>.

          Show
          githubbot ASF GitHub Bot added a comment - Github user vvdb commented on the issue: https://github.com/apache/lucenenet/pull/208 I’m not planning to submit another pull request at this time. Feel free to take whatever you need from my repository: you’re better at this than I am. Vincent From: Shad Storhaug notifications@github.com Sent: Tuesday, June 27, 2017 10:27 AM To: apache/lucenenet <lucenenet@noreply.github.com> Cc: Van Den Berghe, Vincent <Vincent.VanDenBerghe@bvdinfo.com>; Mention <mention@noreply.github.com> Subject: Re: [apache/lucenenet] LUCENE-5644 : switch to simpler LIFO thread to ThreadState allocator d… (#208) @vvdb< https://github.com/vvdb > I would like to add this fix to the next release. Are you planning to submit another pull request? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub< https://github.com/apache/lucenenet/pull/208#issuecomment-311289271 >, or mute the thread< https://github.com/notifications/unsubscribe-auth/ARBXprqH0s0Tjj1dhQ4f_ryNc6H2vCZEks5sILy9gaJpZM4N4A9a >.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user NightOwl888 commented on the issue:

          https://github.com/apache/lucenenet/pull/208

          @vvdb

          I would like to add this fix to the next release. Are you planning to submit another pull request?

          Show
          githubbot ASF GitHub Bot added a comment - Github user NightOwl888 commented on the issue: https://github.com/apache/lucenenet/pull/208 @vvdb I would like to add this fix to the next release. Are you planning to submit another pull request?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user vvdb closed the pull request at:

          https://github.com/apache/lucenenet/pull/208

          Show
          githubbot ASF GitHub Bot added a comment - Github user vvdb closed the pull request at: https://github.com/apache/lucenenet/pull/208
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user NightOwl888 commented on the issue:

          https://github.com/apache/lucenenet/pull/208

          > Second, assuming we have no threading issues at the moment, how can we be sure this doesn't introduce any? cc @NightOwl888

          @synhershko

          First of all, we still have some threading issues. The [Lucene.Net.Store.TestLockFactory.TestStressLocksNativeFSLockFactory test](https://teamcity.jetbrains.com/viewLog.html?buildId=1084071&tab=buildResultsDiv&buildTypeId=LuceneNet_PortableBuilds_TestOnNet451) and [Lucene.Net.Store.TestLockFactory.TestStressLocks test](https://teamcity.jetbrains.com/viewLog.html?buildId=1071425&tab=buildResultsDiv&buildTypeId=LuceneNet_PortableBuilds_TestOnNet451) still fail randomly, and there may be others (last I checked there were 2 tests that failed only when the MMapDirectory was randomly selected).

          Secondly, this change is part of the Lucene 4.8.1 patch. Much of this port (including most or all of Analysis.Common) is from 4.8.1 already, so we should probably try to include the rest of the fixes to get us all the way to 4.8.1, especially if they have performance benefits.

          @vvdb

          If we are going to work toward 4.8.1 on a gradual basis, we should probably include a comment at the top of each changed file indicating the version compatibility level so we don't have to check the entire file to see if it is up to speed.

          ```
          // Version compatibility level: 4.8.1
          ```

          As for getting all the way there, I have outlined a procedure [here](https://github.com/apache/lucenenet/pull/174#issuecomment-251614795) for upgrading Lucene.Net to 4.8.1, which should be a lot quicker and easier than porting it from scratch again. Of course, it requires you to have a text comparison tool that allows you to filter out "unimportant changes" such as Beyond Compare. This will ensure we get all of the changes between the old file and new file and port them into the .NET file.

          It would be better to have a tool that allows you to ignore simple code formatting changes (such as changing the curly bracket from the same line to the following line of a function or if statement), but to my knowledge a tool like that doesn't exist.

          Show
          githubbot ASF GitHub Bot added a comment - Github user NightOwl888 commented on the issue: https://github.com/apache/lucenenet/pull/208 > Second, assuming we have no threading issues at the moment, how can we be sure this doesn't introduce any? cc @NightOwl888 @synhershko First of all, we still have some threading issues. The [Lucene.Net.Store.TestLockFactory.TestStressLocksNativeFSLockFactory test] ( https://teamcity.jetbrains.com/viewLog.html?buildId=1084071&tab=buildResultsDiv&buildTypeId=LuceneNet_PortableBuilds_TestOnNet451 ) and [Lucene.Net.Store.TestLockFactory.TestStressLocks test] ( https://teamcity.jetbrains.com/viewLog.html?buildId=1071425&tab=buildResultsDiv&buildTypeId=LuceneNet_PortableBuilds_TestOnNet451 ) still fail randomly, and there may be others (last I checked there were 2 tests that failed only when the MMapDirectory was randomly selected). Secondly, this change is part of the Lucene 4.8.1 patch. Much of this port (including most or all of Analysis.Common) is from 4.8.1 already, so we should probably try to include the rest of the fixes to get us all the way to 4.8.1, especially if they have performance benefits. @vvdb If we are going to work toward 4.8.1 on a gradual basis, we should probably include a comment at the top of each changed file indicating the version compatibility level so we don't have to check the entire file to see if it is up to speed. ``` // Version compatibility level: 4.8.1 ``` As for getting all the way there, I have outlined a procedure [here] ( https://github.com/apache/lucenenet/pull/174#issuecomment-251614795 ) for upgrading Lucene.Net to 4.8.1, which should be a lot quicker and easier than porting it from scratch again. Of course, it requires you to have a text comparison tool that allows you to filter out "unimportant changes" such as Beyond Compare. This will ensure we get all of the changes between the old file and new file and port them into the .NET file. It would be better to have a tool that allows you to ignore simple code formatting changes (such as changing the curly bracket from the same line to the following line of a function or if statement), but to my knowledge a tool like that doesn't exist.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user vvdb commented on the issue:

          https://github.com/apache/lucenenet/pull/208

          >> align with the project's whitespace specs
          I’ve been trying to find the project’s code formatting document, but was unsuccessful.

          >> Second, assuming we have no threading issues at the moment, how can we be sure this doesn't introduce any?
          We can be sure of nothing, since certainty is only as strong as the assumptions it is based on.
          However:

          • the tests have been appropriately changed and use the new code, and they all pass.
          • On a more personal level (even though it means nothing since you have to take my unverifiable word for it), this code has been in production use these past 3 months on a 28-core machine, all of them writing to a single IndexWriter for about a hundred million documents, repeatedly. No problems were ever seen.
            These things don’t show the absence of problems, but make me reasonably confident that the code is sound.

          Vincent

          From: Itamar Syn-Hershko notifications@github.com
          Sent: Tuesday, June 13, 2017 9:09 AM
          To: apache/lucenenet <lucenenet@noreply.github.com>
          Cc: Van Den Berghe, Vincent <Vincent.VanDenBerghe@bvdinfo.com>; Author <author@noreply.github.com>
          Subject: Re: [apache/lucenenet] LUCENE-5644: switch to simpler LIFO thread to ThreadState allocator d… (#208)

          Thanks, that's an interesting one.

          First, there's a lot of whitespace noise - can you remove it please? (align with the project's whitespace specs)

          Second, assuming we have no threading issues at the moment, how can we be sure this doesn't introduce any? cc @NightOwl888<https://github.com/nightowl888>


          You are receiving this because you authored the thread.
          Reply to this email directly, view it on GitHub<https://github.com/apache/lucenenet/pull/208#issuecomment-308027273>, or mute the thread<https://github.com/notifications/unsubscribe-auth/ARBXpku1AqOToF4KGaoBuQG8Pu83axMhks5sDjV4gaJpZM4N4A9a>.

          Show
          githubbot ASF GitHub Bot added a comment - Github user vvdb commented on the issue: https://github.com/apache/lucenenet/pull/208 >> align with the project's whitespace specs I’ve been trying to find the project’s code formatting document, but was unsuccessful. >> Second, assuming we have no threading issues at the moment, how can we be sure this doesn't introduce any? We can be sure of nothing, since certainty is only as strong as the assumptions it is based on. However: the tests have been appropriately changed and use the new code, and they all pass. On a more personal level (even though it means nothing since you have to take my unverifiable word for it), this code has been in production use these past 3 months on a 28-core machine, all of them writing to a single IndexWriter for about a hundred million documents, repeatedly. No problems were ever seen. These things don’t show the absence of problems, but make me reasonably confident that the code is sound. Vincent From: Itamar Syn-Hershko notifications@github.com Sent: Tuesday, June 13, 2017 9:09 AM To: apache/lucenenet <lucenenet@noreply.github.com> Cc: Van Den Berghe, Vincent <Vincent.VanDenBerghe@bvdinfo.com>; Author <author@noreply.github.com> Subject: Re: [apache/lucenenet] LUCENE-5644 : switch to simpler LIFO thread to ThreadState allocator d… (#208) Thanks, that's an interesting one. First, there's a lot of whitespace noise - can you remove it please? (align with the project's whitespace specs) Second, assuming we have no threading issues at the moment, how can we be sure this doesn't introduce any? cc @NightOwl888< https://github.com/nightowl888 > — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub< https://github.com/apache/lucenenet/pull/208#issuecomment-308027273 >, or mute the thread< https://github.com/notifications/unsubscribe-auth/ARBXpku1AqOToF4KGaoBuQG8Pu83axMhks5sDjV4gaJpZM4N4A9a >.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user synhershko commented on the issue:

          https://github.com/apache/lucenenet/pull/208

          Thanks, that's an interesting one.

          First, there's a lot of whitespace noise - can you remove it please? (align with the project's whitespace specs)

          Second, assuming we have no threading issues at the moment, how can we be sure this doesn't introduce any? cc @NightOwl888

          Show
          githubbot ASF GitHub Bot added a comment - Github user synhershko commented on the issue: https://github.com/apache/lucenenet/pull/208 Thanks, that's an interesting one. First, there's a lot of whitespace noise - can you remove it please? (align with the project's whitespace specs) Second, assuming we have no threading issues at the moment, how can we be sure this doesn't introduce any? cc @NightOwl888
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user synhershko commented on a diff in the pull request:

          https://github.com/apache/lucenenet/pull/208#discussion_r121596530

          — Diff: src/Lucene.Net/Document/Document.cs —
          @@ -278,9 +279,17 @@ public string Get(string name)
          return null;
          }

          • /// <summary>
          • /// Prints the fields of a document for human consumption. </summary>
          • public override string ToString()
            + /// <summary>
            + /// Remove all fields from a document
            + /// </summary>
            + public void Clear()
            + { + fields.Clear(); + }
              • End diff –

          this seems unrelated?

          Show
          githubbot ASF GitHub Bot added a comment - Github user synhershko commented on a diff in the pull request: https://github.com/apache/lucenenet/pull/208#discussion_r121596530 — Diff: src/Lucene.Net/Document/Document.cs — @@ -278,9 +279,17 @@ public string Get(string name) return null; } /// <summary> /// Prints the fields of a document for human consumption. </summary> public override string ToString() + /// <summary> + /// Remove all fields from a document + /// </summary> + public void Clear() + { + fields.Clear(); + } End diff – this seems unrelated?
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user vvdb opened a pull request:

          https://github.com/apache/lucenenet/pull/208

          LUCENE-5644: switch to simpler LIFO thread to ThreadState allocator d…

          …uring indexing”. Technically, this is something from releases/lucene-solr/4.8.1, but profiling indicates it makes a huge difference in multithreaded scenarios

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/vvdb/lucenenet master

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucenenet/pull/208.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #208


          commit 18b8ef71fc08c1583c323ff00e2d1ef07e55c72c
          Author: Vincent Van Den Berghe <vvdb@bvdep.com>
          Date: 2017-06-13T05:50:31Z

          LUCENE-5644: switch to simpler LIFO thread to ThreadState allocator during indexing”. Technically, this is something from releases/lucene-solr/4.8.1, but profiling indicates it makes a huge difference in multithreaded scenarios


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user vvdb opened a pull request: https://github.com/apache/lucenenet/pull/208 LUCENE-5644 : switch to simpler LIFO thread to ThreadState allocator d… …uring indexing”. Technically, this is something from releases/lucene-solr/4.8.1, but profiling indicates it makes a huge difference in multithreaded scenarios You can merge this pull request into a Git repository by running: $ git pull https://github.com/vvdb/lucenenet master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucenenet/pull/208.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #208 commit 18b8ef71fc08c1583c323ff00e2d1ef07e55c72c Author: Vincent Van Den Berghe <vvdb@bvdep.com> Date: 2017-06-13T05:50:31Z LUCENE-5644 : switch to simpler LIFO thread to ThreadState allocator during indexing”. Technically, this is something from releases/lucene-solr/4.8.1, but profiling indicates it makes a huge difference in multithreaded scenarios
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593651 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1593651 ]

          LUCENE-5644: favor an already initialized ThreadState

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593651 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1593651 ] LUCENE-5644 : favor an already initialized ThreadState
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593650 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1593650 ]

          LUCENE-5644: favor an already initialized ThreadState

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593650 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1593650 ] LUCENE-5644 : favor an already initialized ThreadState
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593649 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8'
          [ https://svn.apache.org/r1593649 ]

          LUCENE-5644: favor an already initialized ThreadState

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593649 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8' [ https://svn.apache.org/r1593649 ] LUCENE-5644 : favor an already initialized ThreadState
          Hide
          mikemccand Michael McCandless added a comment -

          Patch w/ test; the core change is quite minor, in the case where the ThreadState isn't initialized, we loop through the other thread states to find one that is initialized.

          Show
          mikemccand Michael McCandless added a comment - Patch w/ test; the core change is quite minor, in the case where the ThreadState isn't initialized, we loop through the other thread states to find one that is initialized.
          Hide
          mikemccand Michael McCandless added a comment -

          Reopening ...

          I realized the simple LIFO logic is too simple, right after a flush. When that happens, we should try to pick a ThreadState that's already initialized, so that if no full-flushing (getReader) is happening, we don't tie up RAM indefinitely in the pending segments.

          Show
          mikemccand Michael McCandless added a comment - Reopening ... I realized the simple LIFO logic is too simple, right after a flush. When that happens, we should try to pick a ThreadState that's already initialized, so that if no full-flushing (getReader) is happening, we don't tie up RAM indefinitely in the pending segments.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593240 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8'
          [ https://svn.apache.org/r1593240 ]

          LUCENE-5644: remove this class

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593240 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8' [ https://svn.apache.org/r1593240 ] LUCENE-5644 : remove this class
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593230 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8'
          [ https://svn.apache.org/r1593230 ]

          LUCENE-5644: switch to simpler LIFO thread to ThreadState allocator during indexing

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593230 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8' [ https://svn.apache.org/r1593230 ] LUCENE-5644 : switch to simpler LIFO thread to ThreadState allocator during indexing
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593228 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1593228 ]

          LUCENE-5644: switch to simpler LIFO thread to ThreadState allocator during indexing

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593228 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1593228 ] LUCENE-5644 : switch to simpler LIFO thread to ThreadState allocator during indexing
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593226 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1593226 ]

          LUCENE-5644: switch to simpler LIFO thread to ThreadState allocator during indexing

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593226 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1593226 ] LUCENE-5644 : switch to simpler LIFO thread to ThreadState allocator during indexing
          Hide
          mikemccand Michael McCandless added a comment -

          New patch, folding in all feedback. I think it's ready; I'll commit soon.

          But ant precommit is currently failing for other reasons:

          BUILD FAILED
          /l/fifo/build.xml:92: The following files contain @author tags, tabs or nocommits:
          * solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java
          
          Show
          mikemccand Michael McCandless added a comment - New patch, folding in all feedback. I think it's ready; I'll commit soon. But ant precommit is currently failing for other reasons: BUILD FAILED /l/fifo/build.xml:92: The following files contain @author tags, tabs or nocommits: * solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java
          Hide
          mikemccand Michael McCandless added a comment -

          Not necessarily. You can have different threads batch up different documents and pre-sort them before .addDocument. Then you end up w/ pre-sorted segments (if you have Thread-affinity). I'm not saying it's such an important usecase to support out-of-the-box, more that there is such a usecase, and I think it'd be better if we offer some way in Lucene to enable it.

          OK, I see, you're right. But this (which segment your thread sends
          its docs to) was never really a hard guarantee with IW; it's really an
          impl detail.

          How about I make this default impl un-final?

          Either that or factor out a simple API, your call.

          I'd like to defer this: I'll open a separate issue. I agree it's
          dangerous to have the one impl not be final, so I put final back for
          now, and in the follow on issue we can make it un-final and abstract
          again, and maybe we can get a (correct) thread-affinity impl back. I
          just think tweaking with the thread assignment is very very expert.

          I'd love in general to not sync on "this" but rather have a lock obj or a condition but that's just taste I guess...

          I'd rather not add another lock here (this class is already using its
          intrinsic lock)? Or at least, let's defer it; maybe we can make a
          lock-free queue later...

          Show
          mikemccand Michael McCandless added a comment - Not necessarily. You can have different threads batch up different documents and pre-sort them before .addDocument. Then you end up w/ pre-sorted segments (if you have Thread-affinity). I'm not saying it's such an important usecase to support out-of-the-box, more that there is such a usecase, and I think it'd be better if we offer some way in Lucene to enable it. OK, I see, you're right. But this (which segment your thread sends its docs to) was never really a hard guarantee with IW; it's really an impl detail. How about I make this default impl un-final? Either that or factor out a simple API, your call. I'd like to defer this: I'll open a separate issue. I agree it's dangerous to have the one impl not be final, so I put final back for now, and in the follow on issue we can make it un-final and abstract again, and maybe we can get a (correct) thread-affinity impl back. I just think tweaking with the thread assignment is very very expert. I'd love in general to not sync on "this" but rather have a lock obj or a condition but that's just taste I guess... I'd rather not add another lock here (this class is already using its intrinsic lock)? Or at least, let's defer it; maybe we can make a lock-free queue later...
          Hide
          shaie Shai Erera added a comment -

          For that to work right you need to use only 1 thread right? I don't
          think this patch will break it?

          Not necessarily. You can have different threads batch up different documents and pre-sort them before .addDocument. Then you end up w/ pre-sorted segments (if you have Thread-affinity). I'm not saying it's such an important usecase to support out-of-the-box, more that there is such a usecase, and I think it'd be better if we offer some way in Lucene to enable it.

          How about I make this default impl un-final?

          Either that or factor out a simple API, your call. I haven't looked at the API closely yet, but my feeling is that if we keep the previous abstract class and a final LIFOPool class, that would do it? When the base class is not abstract and final, I worry that at some point we'll worry about back-compat issues etc. A final class makes it easier for us to assume things. But, I leave it to you.

          Show
          shaie Shai Erera added a comment - For that to work right you need to use only 1 thread right? I don't think this patch will break it? Not necessarily. You can have different threads batch up different documents and pre-sort them before .addDocument. Then you end up w/ pre-sorted segments (if you have Thread-affinity). I'm not saying it's such an important usecase to support out-of-the-box, more that there is such a usecase, and I think it'd be better if we offer some way in Lucene to enable it. How about I make this default impl un-final? Either that or factor out a simple API, your call. I haven't looked at the API closely yet, but my feeling is that if we keep the previous abstract class and a final LIFOPool class, that would do it? When the base class is not abstract and final, I worry that at some point we'll worry about back-compat issues etc. A final class makes it easier for us to assume things. But, I leave it to you.
          Hide
          mikemccand Michael McCandless added a comment -

          Thanks Simon, I'm folding these in ...

          in the look instead of break can we lock and return the state it seems like it's more clear that way since we only lock if we get it form the free list?

          The challenge is I want to .lock() outside of the sync block ...

          Show
          mikemccand Michael McCandless added a comment - Thanks Simon, I'm folding these in ... in the look instead of break can we lock and return the state it seems like it's more clear that way since we only lock if we get it form the free list? The challenge is I want to .lock() outside of the sync block ...
          Hide
          simonw Simon Willnauer added a comment -

          I like the patch much simpler though... a couple of comments

          • can we remove the sync keyword from newThreadState now? it seems unnecessary?
          • can we call notifyall across the board seems safer?
          • in the look instead of break can we lock and return the state it seems like it's more clear that way since we only lock if we get it form the free list?
          • I'd love in general to not sync on "this" but rather have a lock obj or a condition but that's just taste I guess...
          • in release can we unlock outside the sync block?

          in general that looks good though. I wonder if we really should make it final but that's a different story...

          Show
          simonw Simon Willnauer added a comment - I like the patch much simpler though... a couple of comments can we remove the sync keyword from newThreadState now? it seems unnecessary? can we call notifyall across the board seems safer? in the look instead of break can we lock and return the state it seems like it's more clear that way since we only lock if we get it form the free list? I'd love in general to not sync on "this" but rather have a lock obj or a condition but that's just taste I guess... in release can we unlock outside the sync block? in general that looks good though. I wonder if we really should make it final but that's a different story...
          Hide
          mikemccand Michael McCandless added a comment -

          I beasted Lucene's core+module tests 48 times and didn't hit any fails except https://issues.apache.org/jira/browse/LUCENE-5652 (unrelated)...

          Show
          mikemccand Michael McCandless added a comment - I beasted Lucene's core+module tests 48 times and didn't hit any fails except https://issues.apache.org/jira/browse/LUCENE-5652 (unrelated)...
          Hide
          mikemccand Michael McCandless added a comment -

          New patch, folding in Shai's feedback, removing the nocommit, and adding a notifyAll() to wake up any indexing threads (but my test case failed to uncover deadlock here...).

          I'm running distributed beasting of all Lucene's core + modules tests ....

          Show
          mikemccand Michael McCandless added a comment - New patch, folding in Shai's feedback, removing the nocommit, and adding a notifyAll() to wake up any indexing threads (but my test case failed to uncover deadlock here...). I'm running distributed beasting of all Lucene's core + modules tests ....
          Hide
          mikemccand Michael McCandless added a comment -

          Another usecase where I thought about thread-affinity is if you want to e.g. create an index in a special manner, for instance a pre-sorted index. So say that you aggregate a bunch of documents, then pre-sort them and add them one at a time, with the intention to have a pre-sorted segment. With this patch it's impossible right?

          For that to work right you need to use only 1 thread right? I don't
          think this patch will break it?

          But I also think this should be pluggable

          How about I make this default impl un-final?

          Show
          mikemccand Michael McCandless added a comment - Another usecase where I thought about thread-affinity is if you want to e.g. create an index in a special manner, for instance a pre-sorted index. So say that you aggregate a bunch of documents, then pre-sort them and add them one at a time, with the intention to have a pre-sorted segment. With this patch it's impossible right? For that to work right you need to use only 1 thread right? I don't think this patch will break it? But I also think this should be pluggable How about I make this default impl un-final?
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593039 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8'
          [ https://svn.apache.org/r1593039 ]

          LUCENE-5644: add test case

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593039 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8' [ https://svn.apache.org/r1593039 ] LUCENE-5644 : add test case
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593038 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1593038 ]

          LUCENE-5644: add test case

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593038 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1593038 ] LUCENE-5644 : add test case
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593037 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1593037 ]

          LUCENE-5644: add test case

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593037 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1593037 ] LUCENE-5644 : add test case
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593029 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8'
          [ https://svn.apache.org/r1593029 ]

          LUCENE-5644: i moved this test over to new class

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593029 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8' [ https://svn.apache.org/r1593029 ] LUCENE-5644 : i moved this test over to new class
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593028 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1593028 ]

          LUCENE-5644: i moved this test over to new class

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593028 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1593028 ] LUCENE-5644 : i moved this test over to new class
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593027 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1593027 ]

          LUCENE-5644: i moved this test over to new class

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593027 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1593027 ] LUCENE-5644 : i moved this test over to new class
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593026 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8'
          [ https://svn.apache.org/r1593026 ]

          LUCENE-5644: add test case

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593026 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8' [ https://svn.apache.org/r1593026 ] LUCENE-5644 : add test case
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593024 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1593024 ]

          LUCENE-5644: add test case

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593024 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1593024 ] LUCENE-5644 : add test case
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1593023 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1593023 ]

          LUCENE-5644: add test case

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1593023 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1593023 ] LUCENE-5644 : add test case
          Hide
          shaie Shai Erera added a comment - - edited

          But, 1) I think it's uncommon for apps to do this (they typically use
          a "blind" thread pool) so it's fragile

          I don't know how uncommon it is ... I'm thinking crawlers, where each thread crawls a separate data source. Of course there are crawlers that operate on a pool of resources too. So basically if an app has a thread-affinity logic, with this patch it cannot take advantage of it at all.

          Another usecase where I thought about thread-affinity is if you want to e.g. create an index in a special manner, for instance a pre-sorted index. So say that you aggregate a bunch of documents, then pre-sort them and add them one at a time, with the intention to have a pre-sorted segment. With this patch it's impossible right?

          I agree that we need something simple in Lucene, and if thread-affinity is hard to write/understand/maintain, I'm all for removing it in favor of this simpler model. But I also think this should be pluggable, even if it's uber-expert, package-private plugin. So maybe we can factor out a simple interface/abstract class DWThreadPool and have a concrete final LIFOThreadPool implementation? This will allow apps to specialize if they want?

          I reviewed the patch and I think it's good. But I want to scrutinize it more.

          Show
          shaie Shai Erera added a comment - - edited But, 1) I think it's uncommon for apps to do this (they typically use a "blind" thread pool) so it's fragile I don't know how uncommon it is ... I'm thinking crawlers, where each thread crawls a separate data source. Of course there are crawlers that operate on a pool of resources too. So basically if an app has a thread-affinity logic, with this patch it cannot take advantage of it at all. Another usecase where I thought about thread-affinity is if you want to e.g. create an index in a special manner, for instance a pre-sorted index. So say that you aggregate a bunch of documents, then pre-sort them and add them one at a time, with the intention to have a pre-sorted segment. With this patch it's impossible right? I agree that we need something simple in Lucene, and if thread-affinity is hard to write/understand/maintain, I'm all for removing it in favor of this simpler model. But I also think this should be pluggable, even if it's uber-expert, package-private plugin. So maybe we can factor out a simple interface/abstract class DWThreadPool and have a concrete final LIFOThreadPool implementation? This will allow apps to specialize if they want? I reviewed the patch and I think it's good. But I want to scrutinize it more.
          Hide
          mikemccand Michael McCandless added a comment -

          I suspect on abort I need to wake up any indexing threads waiting on the pool...

          Show
          mikemccand Michael McCandless added a comment - I suspect on abort I need to wake up any indexing threads waiting on the pool...
          Hide
          mikemccand Michael McCandless added a comment -

          After struggling for some time trying to fix the thread hazards
          between flush & getAndLock, I decided to simply remove
          ThreadAffinityDocumentsWriterThreadPool.

          I think it's too hairy/complex. The original motivation for this was
          for when apps use a different thread to index different content
          sources; this way we send similar docs to the same thread state and
          should get better RAM efficiency.

          But, 1) I think it's uncommon for apps to do this (they typically use
          a "blind" thread pool) so it's fragile, 2) it's not clear really how
          much perf gain this is (likely minor), and 3) it's too hard to get this working
          correctly (this issue).

          So instead I made DocumentsWriterPerThreadPool final, and I impl'd a
          simple LIFO queue, and fixed the two places (in DW) that obtain a
          thread state for indexing to use a new DWPTP.release API that puts the
          state back into the free list & notifies for any waiting threads to
          wake up.

          Note that other places that lock a thread state (e.g. flush/abort)
          don't use this free list; I think this is OK?

          Lucene tests pass, but this is a big change, and we need to scrutinize
          / test it more.

          It seems to behave properly when I index Geonames w/ N threads, i.e. I
          get N segments flushed by getReader, but I'll test more ... I'll also
          make a standalone test case that validates this.

          This is just an initial patch quickly coded up ... javadocs are wrong, etc.

          Show
          mikemccand Michael McCandless added a comment - After struggling for some time trying to fix the thread hazards between flush & getAndLock, I decided to simply remove ThreadAffinityDocumentsWriterThreadPool. I think it's too hairy/complex. The original motivation for this was for when apps use a different thread to index different content sources; this way we send similar docs to the same thread state and should get better RAM efficiency. But, 1) I think it's uncommon for apps to do this (they typically use a "blind" thread pool) so it's fragile, 2) it's not clear really how much perf gain this is (likely minor), and 3) it's too hard to get this working correctly (this issue). So instead I made DocumentsWriterPerThreadPool final, and I impl'd a simple LIFO queue, and fixed the two places (in DW) that obtain a thread state for indexing to use a new DWPTP.release API that puts the state back into the free list & notifies for any waiting threads to wake up. Note that other places that lock a thread state (e.g. flush/abort) don't use this free list; I think this is OK? Lucene tests pass, but this is a big change, and we need to scrutinize / test it more. It seems to behave properly when I index Geonames w/ N threads, i.e. I get N segments flushed by getReader, but I'll test more ... I'll also make a standalone test case that validates this. This is just an initial patch quickly coded up ... javadocs are wrong, etc.
          Hide
          mikemccand Michael McCandless added a comment -

          I'm reopening this, something is wrong with the fix ... when I index Geonames with 2 threads I sometimes see strange sized segments on flush, e.g.:

          DWPT 0 [Tue May 06 10:45:11 EDT 2014; main]: flush postings as segment _26 numDocs=1
          DWPT 0 [Tue May 06 10:45:11 EDT 2014; Thread-0]: flush postings as segment _25 numDocs=15872
          
          Show
          mikemccand Michael McCandless added a comment - I'm reopening this, something is wrong with the fix ... when I index Geonames with 2 threads I sometimes see strange sized segments on flush, e.g.: DWPT 0 [Tue May 06 10:45:11 EDT 2014; main]: flush postings as segment _26 numDocs=1 DWPT 0 [Tue May 06 10:45:11 EDT 2014; Thread-0]: flush postings as segment _25 numDocs=15872
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1592623 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8'
          [ https://svn.apache.org/r1592623 ]

          LUCENE-5644: clear IW's thread state bindings on flush

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1592623 from Michael McCandless in branch 'dev/branches/lucene_solr_4_8' [ https://svn.apache.org/r1592623 ] LUCENE-5644 : clear IW's thread state bindings on flush
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1592621 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1592621 ]

          LUCENE-5644: clear IW's thread state bindings on flush

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1592621 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1592621 ] LUCENE-5644 : clear IW's thread state bindings on flush
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1592620 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1592620 ]

          LUCENE-5644: clear IW's thread state bindings on flush

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1592620 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1592620 ] LUCENE-5644 : clear IW's thread state bindings on flush
          Hide
          mikemccand Michael McCandless added a comment -

          Ahh, good catch Simon; I'll add that & commit shortly.

          Show
          mikemccand Michael McCandless added a comment - Ahh, good catch Simon; I'll add that & commit shortly.
          Hide
          simonw Simon Willnauer added a comment -

          I think in general this looks good to me. There is a small chance that the thread affinity will not work corretly if we reset while some other thread is already calling "minThreadState.lock()" we need to put the thread / state binding back into the map if the locked state is not initialized like:

          minThreadState.lock();
          if (minThreadState.isInitialized() == false) {
            threadBindings.put(requestingThread, minThreadState); // make sure we get the same state next time 
          }
          return minThreadState;
          
          Show
          simonw Simon Willnauer added a comment - I think in general this looks good to me. There is a small chance that the thread affinity will not work corretly if we reset while some other thread is already calling "minThreadState.lock()" we need to put the thread / state binding back into the map if the locked state is not initialized like: minThreadState.lock(); if (minThreadState.isInitialized() == false ) { threadBindings.put(requestingThread, minThreadState); // make sure we get the same state next time } return minThreadState;
          Hide
          mikemccand Michael McCandless added a comment -

          Patch w/ test & fix. I also noticed we were not recording a thread
          binding when we had an uncontended minThreadState, causing an
          unnecessary call to minContentedThreadState for each doc.

          Show
          mikemccand Michael McCandless added a comment - Patch w/ test & fix. I also noticed we were not recording a thread binding when we had an uncontended minThreadState, causing an unnecessary call to minContentedThreadState for each doc.

            People

            • Assignee:
              mikemccand Michael McCandless
              Reporter:
              mikemccand Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development