Solr / SOLR-6204

FreeBSD does not break out of ServerSocketChannel.accept()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      This may be why tests behave so erratically on FreeBSD (Lucene Jenkins). Here's the story.

      I looked at Solr logs and saw this:

        2> 1012153 T10 oejut.QueuedThreadPool.doStop WARN 4 threads could not be stopped
      

      just before failures related to "socket/port already bound" in SSLMigrationTest. Jetty's QueuedThreadPool waits for pool threads to finish, then terminates them (and waits again). This wait time is configurable, but the configuration is broken in Solr's JettySolrRunner:

        private void init(String solrHome, String context, int port, boolean stopAtShutdown) {
      ...
            if (threadPool != null) {
              threadPool.setMaxThreads(10000);
              threadPool.setMaxIdleTimeMs(5000);
              threadPool.setMaxStopTimeMs(30000);
            }
      

      The threadPool variable here is always null because it is assigned only after Jetty starts, and the configuration block executes before that. The threadPool != null condition is never true, so the code that configures those timeouts is dead.
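
A minimal sketch of the ordering problem, using hypothetical stand-ins (InitOrderSketch, Pool and the default of 100 ms are assumptions made for illustration, not Jetty or Solr code). It only shows why a null check placed before the assignment silently skips the configuration; it is not the actual fix that was committed.

```java
public class InitOrderSketch {
  // Hypothetical stand-in for a thread pool with a configurable stop timeout.
  static final class Pool {
    int maxStopTimeMs = 100; // made-up default, for illustration only
  }

  Pool threadPool; // assigned only when the "server" starts

  // Mirrors the ordering described above: configure first, assign later.
  int buggyInit() {
    threadPool = null;
    if (threadPool != null) {           // never true at this point
      threadPool.maxStopTimeMs = 30000; // dead code
    }
    threadPool = new Pool();            // "server start" creates the pool too late
    return threadPool.maxStopTimeMs;
  }

  // One possible ordering that works: have the pool before configuring it.
  int fixedInit() {
    threadPool = new Pool();
    threadPool.maxStopTimeMs = 30000;
    return threadPool.maxStopTimeMs;
  }

  public static void main(String[] args) {
    System.out.println("buggy stop timeout: " + new InitOrderSketch().buggyInit());
    System.out.println("fixed stop timeout: " + new InitOrderSketch().fixedInit());
  }
}
```

In the buggy ordering the pool keeps its default stop timeout, which matches the symptom of pool threads not being given the configured time to stop.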

      That's not a biggie, I fixed it. The problem remains, however: even with a long wait time, threads blocked in the accept() call are not interrupted. I wrote a small test class:

      import java.net.InetSocketAddress;
      import java.nio.channels.ServerSocketChannel;
      
      public class Foo {
        public static void main(String[] args) throws Exception {
          final ServerSocketChannel ssc = ServerSocketChannel.open();
          ssc.configureBlocking(true);
          ssc.socket().setReuseAddress(true);
          ssc.socket().bind(new InetSocketAddress(0), 20);
          System.out.println("Port: " + ssc.socket().getLocalPort());
      
          Thread t = new Thread() {
            @Override
            public void run() {
              try {
                System.out.println("Thread accept();");
                ssc.accept().close();
                System.out.println("Done?");
              } catch (Exception e) {
                System.out.println("Thread ex: " + e);
              }
            }
          };
          t.start();
      
          Thread.sleep(2000);
          t.interrupt();
          Thread.sleep(1000);
          System.out.println(t.getState());
        }
      }
      

      If you run it on Windows, for example, here's the expected result:

      Port: 666
      Thread accept();
      Thread ex: java.nio.channels.ClosedByInterruptException
      TERMINATED
      

      Makes sense. On FreeBSD though, the result is:

      Port: 32596
      Thread accept();
      RUNNABLE
      

      Interestingly, the thread IS terminated after ctrl-c is pressed...

      I think this is a showstopper since it violates the contract of accept(), which states:

      ClosedByInterruptException - If another thread interrupts the current thread while the accept operation is in progress, thereby closing the channel and setting the current thread's interrupt status
      

          Activity

          Dawid Weiss added a comment -

          It's kind of funny. Even if you close everything:

          ssc.socket().close();
          ssc.close();
          

          jps still shows that zombie thread inside accept. It's clearly a Chuck Norris type of method.

          "Thread-0" prio=5 tid=0x0000000801174000 nid=0x8010a8800 runnable [0x00007ffffe5e7000]
             java.lang.Thread.State: RUNNABLE
                  at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
                  at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
                  - locked <0x000000087515fbd8> (a java.lang.Object)
                  at Foo$1.run(Foo.java:34)
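
For comparison, here is a hedged sketch of a probe for the close path. The Javadoc of ServerSocketChannel.accept() documents AsynchronousCloseException for the case where another thread closes the channel while accept is in progress, so on a conforming platform the blocked thread should leave accept() shortly after close(). The class name CloseProbe, the method probe and the sleep and timeout values are assumptions made for this example.

```java
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.util.concurrent.atomic.AtomicReference;

public class CloseProbe {
  // Returns the simple name of the exception thrown by accept() after the
  // channel is closed from another thread, or "STILL_BLOCKED" if the thread
  // did not leave accept() within waitMs milliseconds.
  static String probe(long waitMs) throws Exception {
    final ServerSocketChannel ssc = ServerSocketChannel.open();
    ssc.configureBlocking(true);
    ssc.socket().bind(new InetSocketAddress(0), 20);

    final AtomicReference<String> outcome =
        new AtomicReference<String>("STILL_BLOCKED");
    Thread t = new Thread() {
      @Override
      public void run() {
        try {
          ssc.accept().close();
          outcome.set("ACCEPTED"); // only if some client actually connected
        } catch (Exception e) {
          outcome.set(e.getClass().getSimpleName());
        }
      }
    };
    t.setDaemon(true); // do not keep the JVM alive if accept() never returns
    t.start();

    Thread.sleep(500); // give the thread time to block in accept()
    ssc.close();       // close from the main thread
    t.join(waitMs);
    return outcome.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("Outcome: " + probe(2000));
  }
}
```

A result of STILL_BLOCKED would correspond to the zombie thread in the dump above.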
          
          Dawid Weiss added a comment -

          Seems like this isn't even implemented in the FreeBSD port. Accept tries to get the blocked thread's ID and then signal it, as in:

          class NativeThread {
          
              // Returns an opaque token representing the native thread underlying the
              // invoking Java thread.  On systems that do not require signalling, this
              // method always returns -1.
              //
              static native long current();
          
              // Signals the given native thread so as to release it from a blocking I/O
              // operation.  On systems that do not require signalling, this method has
              // no effect.
              //
              static native void signal(long nt);
          

          but from what I see in the port, it's compiled to the default no-op:

          JNIEXPORT jlong JNICALL
          Java_sun_nio_ch_NativeThread_current(JNIEnv *env, jclass cl)
          {
          #ifdef __linux__
              return (long)pthread_self();
          #else
              return -1;
          #endif
          }
          
          JNIEXPORT void JNICALL
          Java_sun_nio_ch_NativeThread_signal(JNIEnv *env, jclass cl, jlong thread)
          {
          #ifdef __linux__
              if (pthread_kill((pthread_t)thread, INTERRUPT_SIGNAL))
                  JNU_ThrowIOExceptionWithLastError(env, "Thread signal failed");
          #endif
          }
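
The consequence of the two functions above can be modelled in a few lines of Java. This is a toy model only: SignalModel, the flag signallingCompiledIn and the token value 42 are assumptions made for illustration and are not JDK code. It encodes just what the quoted source shows, namely that without the platform-specific branch current() returns -1 and signal() does nothing.

```java
public class SignalModel {
  // Models NativeThread.current(): a real token only when the
  // platform-specific branch was compiled in, otherwise -1.
  static long current(boolean signallingCompiledIn) {
    return signallingCompiledIn ? 42L : -1L; // 42 stands in for pthread_self()
  }

  // Models NativeThread.signal(): returns true if a wake-up signal
  // would actually be delivered to the blocked thread.
  static boolean signal(boolean signallingCompiledIn, long nt) {
    return signallingCompiledIn && nt != -1L;
  }

  // True if interrupting or closing could release a thread blocked in accept().
  static boolean interruptReleasesAccept(boolean signallingCompiledIn) {
    long token = current(signallingCompiledIn);
    return signal(signallingCompiledIn, token);
  }

  public static void main(String[] args) {
    System.out.println("with signalling:    " + interruptReleasesAccept(true));
    System.out.println("without signalling: " + interruptReleasesAccept(false));
  }
}
```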
          
          Dawid Weiss added a comment -

          I was asked by FreeBSD developer Jung-uk Kim to try this patch:
          https://svn.redports.org/jkim/java/openjdk7/files/patch-src-solaris-native-sun-nio-ch-NativeThread.c

          but I have no experience with compiling or deploying FreeBSD ports. Uwe Schindler, can we do it on a live VM instance without breaking too much (is there a compile/stage/deploy separation)?

          Uwe Schindler added a comment -

          Hi Dawid,

          I can patch the ports directory and run "make install" afterwards, so this is no issue at all. It will replace the default java7 on the machine. If it does not work, I can revert (I will do a tar cvf on the port's build dir beforehand and run make install on the old version). For a quick test: is there a way to reproduce this from the Ant command line?

          If this patch proves to fix the issue, it should be included in the port by the maintainer, so we get it automatically after upgrading.

          You will have to wait until tomorrow, because I am on a business trip in Switzerland...

          Uwe Schindler added a comment -

          The question on the patch: It patches the "solaris" part... Why?

          Dawid Weiss added a comment -

          No problem, Uwe. Thanks.

          > The question on the patch: It patches the "solaris" part... Why?

          Because this particular file is part of BSD's compilation tree: it takes sources from OpenJDK's Solaris variant. I didn't dig deeply into this. I would also like to confirm what's happening to the "dying" threads on Jenkins (the socket hang is one issue, the killed threads are another), so you don't have to update immediately. I'm waiting for another job to hang so that I can diagnose the debug logs.

          Dawid Weiss added a comment -

          Fixed in FreeBSD's ports. Uwe installed it on jenkins.

          Uwe Schindler added a comment -

          Yeah, seems to work! It is already part of FreeBSD's ports infrastructure. If you are affected by this problem: portsnap fetch; portsnap update; make deinstall; make install


            People

            • Assignee:
              Dawid Weiss
              Reporter:
              Dawid Weiss