Solr / SOLR-7693

"bin/solr start -e cloud" will not work if lsof is not installed - script exits as soon as 1st node is started

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 5.2.1
    • Fix Version/s: 5.3, 6.0
    • Component/s: Build
    • Labels:
      None
    • Environment:

      Boot2Docker, Docker container with Oracle Linux, JDK 8, Solr 5.2.1

      Description

      If bin/solr is used on a system that does not have lsof available, it takes a code path when starting Solr nodes that causes bin/solr to exit as soon as the first Solr node is launched.

      The workaround is either to install lsof, or, after the "-e cloud" command exits, to manually start each of the additional nodes and then create the collection:

      bin/solr start -cloud -s example/cloud/node2/solr -p XXXX -z localhost:9983
      bin/solr start -cloud -s example/cloud/node3/solr -p YYYY -z localhost:9983
      ...
      bin/solr create -c gettingstarted -replicationFactor N -shards M -d data_driven_schema_configs
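The manual workaround can also be written as a small script. This is a minimal sketch, not part of bin/solr: the helpers start_extra_nodes and create_example_collection and the SOLR_CMD override are hypothetical. It assumes it is run from the Solr install directory and that the node home directories already exist (the "-e cloud" run clones node2 before starting node1, as the transcript below shows).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the manual SolrCloud workaround on systems without lsof.
# Set SOLR_CMD=echo to print the commands instead of running them.
SOLR_CMD="${SOLR_CMD:-bin/solr}"
ZK_HOST="localhost:9983"   # embedded ZooKeeper started alongside node1

# Start node2, node3, ... on the given ports (node1 is already running).
start_extra_nodes() {
  local node=2 port
  for port in "$@"; do
    "$SOLR_CMD" start -cloud -s "example/cloud/node${node}/solr" -p "$port" -z "$ZK_HOST" || return 1
    node=$((node + 1))
  done
}

# Create the example collection once all nodes are up.
create_example_collection() {
  local shards=$1 replicas=$2
  "$SOLR_CMD" create -c gettingstarted -replicationFactor "$replicas" -shards "$shards" \
    -d data_driven_schema_configs
}
```

For example, `start_extra_nodes 7574 && create_example_collection 2 2` would start node2 on port 7574 and create a two-shard, two-replica collection; the port and counts are assumptions standing in for the XXXX, N and M placeholders above.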
      

      Original bug report...

      Extract from the command prompt when starting up SolrCloud:

      -------------------------------------------------------------------------------------------
      [appuser@mysolrsandbox ~]$ cd $HOME/softwares/solr-5.2.1
      [appuser@mysolrsandbox solr-5.2.1]$ bin/solr start -e cloud -noprompt -m 1g

      Welcome to the SolrCloud example!

      Starting up 2 Solr nodes for your example SolrCloud cluster.
      Creating Solr home directory /home/appuser/softwares/solr-5.2.1/example/cloud/node1/solr
      Cloning Solr home directory /home/appuser/softwares/solr-5.2.1/example/cloud/node1 into /home/appuser/softwares/solr-5.2.1/example/cloud/node2

      Starting up SolrCloud node1 on port 8983 using command:

      solr start -cloud -s example/cloud/node1/solr -p 8983 -m 1g

      Started Solr server on port 8983 (pid=102). Happy searching!
      [appuser@mysolrsandbox solr-5.2.1]$

      ------------------------------------------------------------------------------------------------

      The second node is not starting up.

      Possible issue:
      File: $SOLR_HOME/bin/solr
      Line number: 1431
      The "exit;" command is causing the shell script to exit.

      Lines 1428-1432:
      else
        SOLR_PID=`ps auxww | grep start\.jar | grep -w $SOLR_PORT | grep -v grep | awk '{print $2}' | sort -r`
        echo -e "\nStarted Solr server on port $SOLR_PORT (pid=$SOLR_PID). Happy searching!\n"
        exit;
      fi

      Workaround:
      Comment out line 1431 in the shell script.
      Lines 1428-1432:
      else
        SOLR_PID=`ps auxww | grep start\.jar | grep -w $SOLR_PORT | grep -v grep | awk '{print $2}' | sort -r`
        echo -e "\nStarted Solr server on port $SOLR_PORT (pid=$SOLR_PID). Happy searching!\n"
        #exit;
      fi

      Attachments
      1. SOLR-7693.patch (1 kB, Timothy Potter)
      2. SOLR-7693.patch (1.0 kB, Hoss Man)

        Activity

        Hoss Man added a comment -

        If you are only seeing this in your Docker container, that sounds like a shell discrepancy.

        What exactly is the output of /usr/bin/env bash -version on these containers?

        Hoss Man added a comment -

        Hmm... my initial impression was that this was an issue of exit in a subshell process, but looking closer, this could be a bug in some conditional logic depending on whether lsof is available on the system?

        My bash is rusty, but IIUC it looks like the launch_solr function will "exit" the script completely in the "no lsof found" case, but simply return in the "lsof found" case.

        Raghavan: can you please confirm the output of hash lsof; echo $? on the machine where you see this problem?

        Hoss Man added a comment -

        My bash is rusty, but IIUC it looks like the launch_solr function will "exit" the script completely in the "no lsof found" case, but simply return in the "lsof found" case.

        Nope, I'm blind. The launch_solr function calls exit regardless of whether lsof exists... I'm back to my initial hypothesis that something wonky with bash explains why you're seeing this.

        Raghavan Janakiraman added a comment -

        [appuser@mysolrsandbox ~]$ /usr/bin/env bash -version
        GNU bash, version 4.2.46(1)-release (x86_64-redhat-linux-gnu)
        Copyright (C) 2011 Free Software Foundation, Inc.
        License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

        This is free software; you are free to change and redistribute it.
        There is NO WARRANTY, to the extent permitted by law.

        --------------------------------------------------------------------------------------------------------------------

        [appuser@mysolrsandbox ~]$ hash lsof; echo $?
        bash: hash: lsof: not found
        1

        -----------------------------------------------------------------------------------------------------------------

        Raghavan Janakiraman added a comment -

        The way I understand the launch_solr function:

        1. If Solr is to be launched in the background
        1.1. Launch the Java process with nohup
        1.2. If the OS has lsof (list of open files) available
        1.2.1. Open a subshell
        1.2.2. Poll the list of open files for the port on which the Java process was launched, until the port becomes active or 30 seconds pass, whichever comes first
        1.2.3. Display the launch message
        1.2.4. Exit the subshell
        1.3. Else // if the OS does not have lsof
        1.3.1. Use grep to get the process ID of the launched Java process
        1.3.2. Display the launch message
        1.3.3. Exit // (Actually this should be a return, not an exit, because the else clause does not run in a subshell)
        1.4. End If
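Step 1.2.2 amounts to a polling loop like the one below. It is a hypothetical sketch, not the bin/solr source: the function name, the one-second poll interval and the default 30-second timeout are assumptions. The lsof filters are standard options: -iTCP:port selects the port and -sTCP:LISTEN keeps only listening sockets.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of step 1.2.2: poll lsof until something listens on the
# port, or give up after the timeout (in seconds). Not the actual bin/solr code.
wait_for_listen() {
  local port=$1 timeout=${2:-30} waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # -P/-n skip port and host name lookups; lsof exits non-zero when nothing matches
    if lsof -PniTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```

When lsof is missing, the lsof call fails on every iteration, so a loop like this can only time out; that is why the script needs a separate branch for systems without lsof.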

        Hoss Man added a comment -

        Ah... yes, the part I was overlooking is that a subshell is used in the lsof case.

        Updated the summary & description to clarify the root cause (systems without lsof) and list the workaround.
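The root cause identified here, an exit that runs inside a subshell in the lsof branch but in the main shell in the other branch, can be reproduced in isolation. The functions below are hypothetical stand-ins for the two branches of launch_solr (plus a variant using return), and start_cluster is a hypothetical stand-in for the "-e cloud" loop that starts one node after another.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; none of these functions exist in bin/solr.

# Like the lsof branch: "exit" runs inside ( ... ), so only the subshell ends.
launch_with_subshell() {
  ( echo "Started node $1"; exit )
}

# Like the no-lsof branch: "exit" runs in the current shell and ends the
# whole script that called the function.
launch_without_subshell() {
  echo "Started node $1"
  exit
}

# Like the fix: "return" leaves only the function.
launch_with_return() {
  echo "Started node $1"
  return
}

# Start two nodes in a row with the given launcher, then create the collection.
start_cluster() {
  local launcher=$1 node
  for node in 1 2; do
    "$launcher" "$node"
  done
  echo "Creating collection"
}
```

Run directly from a script, `start_cluster launch_without_subshell` prints only the first line and then ends the whole script, which matches what the reporter saw; the subshell and return variants both go on to the next node.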

        Hoss Man added a comment -

        Untested patch (not really interested in deleting lsof from my machine at the moment).

        Hoss Man added a comment -

        Tim: please review & commit (unless I've made some horrible mistake).

        Timothy Potter added a comment -

        Thanks for fixing this, Hoss. I spun up an instance in EC2 and uninstalled lsof. I had to add a 10-second wait to the block that handles the case where lsof is not installed; otherwise the script moved on to creating the collection too quickly (before the nodes were up). I also added a note to let users know they should install lsof for this script.
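The fix described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the committed patch: the function name, the SOLR_START_WAIT variable and the wording of the install hint are hypothetical. The 10-second default mirrors the wait described above, and the key change is returning from the branch instead of calling exit.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the no-lsof branch after the fix; names are
# illustrative and not copied from the SOLR-7693 patch.
SOLR_START_WAIT="${SOLR_START_WAIT:-10}"   # seconds to wait before reporting

report_started_without_lsof() {
  local port=$1 pid
  # Install hint for the user (illustrative wording)
  echo "lsof not found: install it so bin/solr can check that Solr is listening on port $port" >&2
  # Without lsof the port cannot be polled, so give the JVM time to start
  sleep "$SOLR_START_WAIT"
  pid=$(ps auxww | grep 'start\.jar' | grep -w "$port" | grep -v grep | awk '{print $2}' | sort -r)
  echo -e "\nStarted Solr server on port $port (pid=$pid). Happy searching!\n"
  return 0   # return, not exit: the caller can go on to start the next node
}
```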

        Upayavira added a comment -

        Throwaway comment/thought: reimplementing the lsof behaviour we depend upon in Java wouldn't take much effort, and if Java isn't present, we've got bigger problems!

        ASF subversion and git services added a comment -

        Commit 1686113 from Timothy Potter in branch 'dev/trunk'
        [ https://svn.apache.org/r1686113 ]

        SOLR-7693: Fix the bin/solr -e cloud example to work if lsof is not installed

        ASF subversion and git services added a comment -

        Commit 1686114 from Timothy Potter in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1686114 ]

        SOLR-7693: Fix the bin/solr -e cloud example to work if lsof is not installed

        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release


          People

          • Assignee: Timothy Potter
          • Reporter: Raghavan Janakiraman
          • Votes: 0
          • Watchers: 5
