  1. Apache Drill
  2. DRILL-2917

Drillbit process fails to restart with address-already-in-use error due to unclean shutdown

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.9.0, 1.1.0
    • Fix Version/s: Future
    • Component/s: None
    • Labels: None

    Description

      On a 4-node cluster, some Drillbits fail to come up, complaining that the address is already in use.

      A previous Drillbit process, if any, was not listed as running by `jps`, although the Web UI continued to show all Drillbits as up.

      # jps
      <No Drillbit Process>
      
      # /opt/mapr/drill/drill-0.9.0/bin/drillbit.sh stop
      no drillbit to stop because no pid file /opt/mapr/drill/drill-0.9.0/drillbit.pid
      
      # /opt/mapr/drill/drill-0.9.0/bin/drillbit.sh restart
      no drillbit to stop because no pid file /opt/mapr/drill/drill-0.9.0/drillbit.pid
      starting drillbit, logging to /opt/mapr/drill/drill-0.9.0/logs/drillbit.out
      
      # jps
      <No Drillbit Process>
      
      # /opt/mapr/drill/drill-0.9.0/bin/drillbit.sh restart
      no drillbit to stop because kill -0 of pid 22290 failed with status 1
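
The last message above suggests that the PID file named a process that no longer existed. A minimal sketch of that kind of check follows. It is an illustration only, not the actual logic of drillbit.sh, and the function names `pid_is_alive` and `pid_file_state` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: classify a PID file as missing, stale, or pointing at a live process.
# The function names are hypothetical and are not taken from drillbit.sh.

# Succeeds if signal 0 can be delivered to the PID. Signal 0 sends nothing;
# it only runs the existence and permission checks. A process owned by
# another user can therefore look dead to a non-root caller.
pid_is_alive() {
  kill -0 "$1" 2>/dev/null
}

# Prints "missing", "stale", or "running" for the given PID file path.
pid_file_state() {
  local pid_file=$1 pid
  if [ ! -f "$pid_file" ]; then
    echo "missing"
    return
  fi
  pid=$(cat "$pid_file")
  if pid_is_alive "$pid"; then
    echo "running"
  else
    echo "stale"
  fi
}
```

After sourcing the sketch, `pid_file_state /opt/mapr/drill/drill-0.9.0/drillbit.pid` would report the state of the PID file named in the output above. A "missing" or "stale" result does not prove that no Drillbit JVM is running, which is the situation this report describes.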
      

      Contents of drillbit.out:

      Exception in thread "main" org.apache.drill.exec.exception.DrillbitStartupException: Failure during initial startup of Drillbit.
              at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:87)
              at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:66)
              at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:166)
      Caused by: org.apache.drill.exec.exception.DrillbitStartupException: Could not bind Drillbit
              at org.apache.drill.exec.rpc.BasicServer.bind(BasicServer.java:158)
              at org.apache.drill.exec.service.ServiceEngine.start(ServiceEngine.java:65)
              at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:241)
              at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:84)
              ... 2 more
      Caused by: java.net.BindException: Address already in use
              at sun.nio.ch.Net.bind0(Native Method)
              at sun.nio.ch.Net.bind(Net.java:444)
              at sun.nio.ch.Net.bind(Net.java:436)
              ...
              ...
      

      It turns out that the Drillbit had failed to shut down correctly and an internal process was still running.

      # ps -ef |grep drill
      mapr      2807     1  0 Apr25 ?        00:00:00 bash /opt/mapr/drill/drill-0.9.0/bin/drillbit.sh internal_start drillbit
      mapr      2862  2807  0 Apr25 ?        00:18:54 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/bin/java -Dlog.path=/opt/mapr/drill/drill-0.9.0/log/drillbit.log -Xms1G -Xmx16G -XX:MaxDirectMemorySize=48G -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=1G -ea -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dzookeeper.sasl.client=false -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -cp /opt/mapr/drill/drill-0.9.0/conf:/opt/mapr/drill/drill-0.9.0/jars/*:/opt/mapr/drill/drill-0.9.0/jars/ext/*:/opt/mapr/drill/drill-0.9.0/jars/3rdparty/*:/opt/mapr/drill/drill-0.9.0/jars/classb/* org.apache.drill.exec.server.Drillbit
      

      Killing this leftover process allowed the Drillbits to come up on all nodes.
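
The manual workaround can be sketched as a small script that searches the process table for leftover Drillbit JVMs instead of relying on `jps` or the PID file. This is a sketch under assumptions: the function names are hypothetical, and the match is on the main class `org.apache.drill.exec.server.Drillbit` as it appears last on the command line in the `ps` output above.

```shell
#!/usr/bin/env bash
# Sketch: find and stop Drillbit JVMs that drillbit.sh no longer tracks.
# Function names are hypothetical; the main class is taken from the ps output.
DRILLBIT_MAIN='org.apache.drill.exec.server.Drillbit'

# Reads "PID ARGS..." lines on stdin and prints the PID of every line whose
# last field is the Drillbit main class. The bash wrapper line ends in
# "drillbit", so it is not matched.
filter_drillbit_pids() {
  awk -v main="$DRILLBIT_MAIN" '$NF == main { print $1 }'
}

# Lists the PIDs of Drillbit JVMs on this host.
list_drillbit_pids() {
  ps -e -o pid= -o args= | filter_drillbit_pids
}

# Sends SIGTERM to each leftover Drillbit JVM. Escalating to SIGKILL is left
# as a manual step for a process that ignores SIGTERM.
stop_stale_drillbits() {
  local pid
  for pid in $(list_drillbit_pids); do
    echo "sending SIGTERM to leftover Drillbit pid $pid"
    kill "$pid"
  done
}
```

Running `list_drillbit_pids` first shows what would be signalled. `stop_stale_drillbits` should only be run once it is confirmed that the listed JVMs are orphans and not a healthy Drillbit.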

          People

            Assignee: Unassigned
            Reporter: Abhishek Girish (agirish)