Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0, 6.6
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      OfflineSorter is a heavy operation and is really an embarrassingly concurrent problem at heart, and if you have enough hardware concurrency (e.g. fast SSDs, multiple CPU cores) it can be a big speedup.

      E.g., after reading a partition from the input, one thread can sort and write it, while another thread reads the next partition, etc. Merging partitions can also be done in the background. Some things still cannot be concurrent, e.g. the initial read from the input must be a single thread, as well as the final merge and writing to the final output.

      I think I found a fairly non-invasive way to add optional concurrency to this class, by adding an optional ExecutorService to OfflineSorter's ctor (similar to IndexSearcher) and using futures to represent each partition as we sort, and creating Callable classes for sorting and merging partitions.

      1. LUCENE-7792.patch
        33 kB
        Michael McCandless
      2. LUCENE-7792.patch
        33 kB
        Michael McCandless
      3. LUCENE-7792.patch
        31 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          mikemccand Michael McCandless added a comment -

          Patch, I think it's ready.

          Show
          mikemccand Michael McCandless added a comment - Patch, I think it's ready.
          Hide
          dweiss Dawid Weiss added a comment -

          Looks good looking at the patch alone. It could be probably made more elegant if null executor was substituted with a "same thread" executor – then no if/else checks would be necessary, you'd simply pass the callable to an executor always (and it'd execute the job immediately on submission).

          Some day we could think of changing reThrow to return a dummy RuntimeException so that:

          +          IOUtils.reThrow(ee.getCause());
          +
          +          // dead code but javac disagrees:
          +          result = null;
          

          could be changed to this:

          +          throw IOUtils.reThrow(ee.getCause());
          

          reThrow would never return any value anyway, but it'd shut up the compiler and make cleaner code (I think).

          Show
          dweiss Dawid Weiss added a comment - Looks good looking at the patch alone. It could be probably made more elegant if null executor was substituted with a "same thread" executor – then no if/else checks would be necessary, you'd simply pass the callable to an executor always (and it'd execute the job immediately on submission). Some day we could think of changing reThrow to return a dummy RuntimeException so that: + IOUtils.reThrow(ee.getCause()); + + // dead code but javac disagrees: + result = null ; could be changed to this: + throw IOUtils.reThrow(ee.getCause()); reThrow would never return any value anyway, but it'd shut up the compiler and make cleaner code (I think).
          Hide
          dsmiley David Smiley added a comment -

          It could be probably made more elegant if null executor was substituted with a "same thread" executor – then no if/else checks would be necessary, you'd simply pass the callable to an executor always (and it'd execute the job immediately on submission).

          I was about to mention the same thing Search for "Executor" in Solr's SimpleFacets.java which uses this technique.

          Show
          dsmiley David Smiley added a comment - It could be probably made more elegant if null executor was substituted with a "same thread" executor – then no if/else checks would be necessary, you'd simply pass the callable to an executor always (and it'd execute the job immediately on submission). I was about to mention the same thing Search for "Executor" in Solr's SimpleFacets.java which uses this technique.
          Hide
          mikemccand Michael McCandless added a comment -

          Some day we could think of changing reThrow to return a dummy RuntimeException so that:

          +1! I get tired of typing that javac dig

          It could be probably made more elegant if null executor was substituted with a "same thread" executor

          Search for "Executor" in Solr's SimpleFacets.java which uses this technique.

          I agree this would be cleaner, but I really need a "same thread" (direct) ExecutorService here, I think? Solr is using Executor there.

          I found this: http://stackoverflow.com/questions/6581188/is-there-an-executorservice-that-uses-the-current-thread/6583868 but that looked quite a bit more complex than the if/then.

          Show
          mikemccand Michael McCandless added a comment - Some day we could think of changing reThrow to return a dummy RuntimeException so that: +1! I get tired of typing that javac dig It could be probably made more elegant if null executor was substituted with a "same thread" executor Search for "Executor" in Solr's SimpleFacets.java which uses this technique. I agree this would be cleaner, but I really need a "same thread" (direct) ExecutorService here, I think? Solr is using Executor there. I found this: http://stackoverflow.com/questions/6581188/is-there-an-executorservice-that-uses-the-current-thread/6583868 but that looked quite a bit more complex than the if/then.
          Hide
          dweiss Dawid Weiss added a comment -

          I'll file a separate issue for reThrow.

          As for executor service – we can do this as a separate issue. There are multiple ways to implement this:

          • immediate (executorservice computes the result on the same thread in submit/execute; returns value-ready future),
          • lazy (executorservice returns a future that lazily computes the result on the get-calling thread).

          There is some bookkeeping if you want to be super exact (associated with termination status), but otherwise option 1 is fairly simple:

              ExecutorService service = new AbstractExecutorService() {
                private volatile boolean shutdown;
          
                @Override
                public void execute(Runnable command) {
                  checkShutdown();
                  command.run();
                }
          
                @Override
                public List<Runnable> shutdownNow() {
                  shutdown();
                  return Collections.emptyList();
                }
          
                @Override
                public void shutdown() {
                  this.shutdown = true;
                }
                
                @Override
                public boolean isTerminated() {
                  // Simplified: we don't check for any threads hanging in execute (we could
                  // introduce an atomic counter, but there seems to be no point).
                  return shutdown == true;
                }
                
                @Override
                public boolean isShutdown() {
                  return shutdown == true;
                }
          
                @Override
                public boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException {
                  // See comment in isTerminated();
                  return true;
                }
          
                private void checkShutdown() {
                  if (shutdown) {
                    throw new RejectedExecutionException("Executor is shut down.");
                  }
                }      
              };
          

          and is pretty much what you had in if/else.

          Show
          dweiss Dawid Weiss added a comment - I'll file a separate issue for reThrow. As for executor service – we can do this as a separate issue. There are multiple ways to implement this: immediate (executorservice computes the result on the same thread in submit/execute; returns value-ready future), lazy (executorservice returns a future that lazily computes the result on the get-calling thread). There is some bookkeeping if you want to be super exact (associated with termination status), but otherwise option 1 is fairly simple: ExecutorService service = new AbstractExecutorService() { private volatile boolean shutdown; @Override public void execute( Runnable command) { checkShutdown(); command.run(); } @Override public List< Runnable > shutdownNow() { shutdown(); return Collections.emptyList(); } @Override public void shutdown() { this .shutdown = true ; } @Override public boolean isTerminated() { // Simplified: we don't check for any threads hanging in execute (we could // introduce an atomic counter, but there seems to be no point). return shutdown == true ; } @Override public boolean isShutdown() { return shutdown == true ; } @Override public boolean awaitTermination( long timeout, TimeUnit unit) throws InterruptedException { // See comment in isTerminated(); return true ; } private void checkShutdown() { if (shutdown) { throw new RejectedExecutionException( "Executor is shut down." ); } } }; and is pretty much what you had in if/else.
          Hide
          mikemccand Michael McCandless added a comment -

          Thanks Dawid Weiss; I'll try your impl. above.

          Show
          mikemccand Michael McCandless added a comment - Thanks Dawid Weiss ; I'll try your impl. above.
          Hide
          mikemccand Michael McCandless added a comment -

          New patch; that ExecutorService impl works great! Thanks Dawid Weiss.

          Show
          mikemccand Michael McCandless added a comment - New patch; that ExecutorService impl works great! Thanks Dawid Weiss .
          Hide
          dweiss Dawid Weiss added a comment -

          Looks good to me. I'd be picky with naming and change two things:

           +  private ExecutorService randomExecutorService() {
          

          to randomExecutorServiceOrNull – hey, you're the one appending Ms or Nanos suffix for clarity!

          The second thing is to me SameThreadExecutorService is more understandable than DirectExecutorService, but maybe it's because I'm not a native speaker and need things to be more in your face.

          Up to you on both.

          Show
          dweiss Dawid Weiss added a comment - Looks good to me. I'd be picky with naming and change two things: + private ExecutorService randomExecutorService() { to randomExecutorServiceOrNull – hey, you're the one appending Ms or Nanos suffix for clarity! The second thing is to me SameThreadExecutorService is more understandable than DirectExecutorService , but maybe it's because I'm not a native speaker and need things to be more in your face. Up to you on both.
          Hide
          mikemccand Michael McCandless added a comment -

          OK I folded in those two suggestions Dawid Weiss; I think it's ready.

          Show
          mikemccand Michael McCandless added a comment - OK I folded in those two suggestions Dawid Weiss ; I think it's ready.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit db92a9efc2eeea9a8002a8b8b05828f2411d2a7b in lucene-solr's branch refs/heads/master from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=db92a9e ]

          LUCENE-7792: add optional concurrency to OfflineSorter

          Show
          jira-bot ASF subversion and git services added a comment - Commit db92a9efc2eeea9a8002a8b8b05828f2411d2a7b in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=db92a9e ] LUCENE-7792 : add optional concurrency to OfflineSorter
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit bb8dd7a9f98dd172a66fe6686402500164d69162 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bb8dd7a ]

          LUCENE-7792: add optional concurrency to OfflineSorter

          Show
          jira-bot ASF subversion and git services added a comment - Commit bb8dd7a9f98dd172a66fe6686402500164d69162 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bb8dd7a ] LUCENE-7792 : add optional concurrency to OfflineSorter
          Hide
          mikemccand Michael McCandless added a comment -

          Thanks Dawid Weiss.

          Show
          mikemccand Michael McCandless added a comment - Thanks Dawid Weiss .
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 25f1dd2f9b2a984cc59202fe56f7a8860f514657 in lucene-solr's branch refs/heads/master from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=25f1dd2 ]

          LUCENE-7792: add try/finally to make sure semaphore is released on exceptions

          Show
          jira-bot ASF subversion and git services added a comment - Commit 25f1dd2f9b2a984cc59202fe56f7a8860f514657 in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=25f1dd2 ] LUCENE-7792 : add try/finally to make sure semaphore is released on exceptions
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit e83829478e891eecd88a15067e29995f1b706cff in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e838294 ]

          LUCENE-7792: add try/finally to make sure semaphore is released on exceptions

          Show
          jira-bot ASF subversion and git services added a comment - Commit e83829478e891eecd88a15067e29995f1b706cff in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e838294 ] LUCENE-7792 : add try/finally to make sure semaphore is released on exceptions

            People

            • Assignee:
              mikemccand Michael McCandless
              Reporter:
              mikemccand Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development