  1. Solr
  2. SOLR-8522

ImplicitSnitch to support IPv4 fragment tags

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 5.4
    • Fix Version/s: 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      This is a description from Noble Paul's comment on SOLR-8146

      IPv4 fragment tags

      Let's assume a Solr node's IPv4 address is 192.93.255.255.

      This is about enhancing the current ImplicitSnitch to support IP-based tags like:

      • hostfrag_1 = 255
      • hostfrag_2 = 255
      • hostfrag_3 = 93
      • hostfrag_4 = 192

      Note that IPv6 support will be implemented in a separate ticket.

      Host name fragment tags

      Let's assume a Solr node's host name is serv1.dc1.country1.apache.org.

      This is about enhancing the current ImplicitSnitch to support tags like:

      • hostfrag_1 = org
      • hostfrag_2 = apache
      • hostfrag_3 = country1
      • hostfrag_4 = dc1
      • hostfrag_5 = serv1
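The numbering described above, for both the IPv4 address and the host name, can be sketched as follows. This is only an illustration of the tag layout in this description, not the code from the patch; the class name FragmentTags and the method fragments are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FragmentTags {
    // Splits a dotted IPv4 address or host name on "." and numbers the
    // pieces starting from the rightmost fragment (fragment 1).
    // Hypothetical helper for illustration; not part of ImplicitSnitch.
    static Map<String, String> fragments(String prefix, String dotted) {
        String[] parts = dotted.split("\\.");
        Map<String, String> tags = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            tags.put(prefix + (i + 1), parts[parts.length - 1 - i]);
        }
        return tags;
    }

    public static void main(String[] args) {
        // prints {hostfrag_1=255, hostfrag_2=255, hostfrag_3=93, hostfrag_4=192}
        System.out.println(fragments("hostfrag_", "192.93.255.255"));
        // prints {hostfrag_1=org, hostfrag_2=apache, hostfrag_3=country1, hostfrag_4=dc1, hostfrag_5=serv1}
        System.out.println(fragments("hostfrag_", "serv1.dc1.country1.apache.org"));
    }
}
```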

      Attachments

      1. SOLR-8522.patch (17 kB, Arcadius Ahouansou)
      2. SOLR-8522.patch (17 kB, Arcadius Ahouansou)
      3. SOLR-8522.patch (14 kB, Arcadius Ahouansou)
      4. SOLR-8522.patch (17 kB, Arcadius Ahouansou)
      5. SOLR-8522.patch (14 kB, Arcadius Ahouansou)

        Issue Links

          Activity

          arcadius Arcadius Ahouansou added a comment - - edited

          Hello Noble Paul.
          This is the initial patch for IPv4 tag support.
          Please, let me know of any comment or suggestion you may have.

          Thank you very much.

          noble.paul Noble Paul added a comment -
          • The Solr node string may not always be an IP address. It could be something like host:port, so the IP address needs a lookup.
          • Let's start from least significant to most significant: for 192.93.255.255, ip_1=255 through ip_4=192.
          • Do not blindly add a tag. Add it only if it is requested.
          arcadius Arcadius Ahouansou added a comment - - edited

          Thank you very much Noble Paul for taking the time to look into this.

          The Solr node string may not always be an IP address. It could be something like host:port, so the IP address needs a lookup.

          You are right about this; I was not aware of it.
          It turns out that a user could start Solr with -Dhost=someHostName.
          Doing the lookup as suggested is quite simple. However, one host could have multiple public and private IPs. We could pick the first public one or something...

          This led me to start contemplating the idea of a more generic snitch that will deal with host names as well as IPs like
          192.168.1.2 -> host_1=2, host_2=1, host_3=168, host_4=192
          and
          serv1.dc1.london.uk.apache.org -> host_1=org, host_2=apache, host_3=uk, host_4=london, host_5=dc1, host_6=serv1

          Any comment about this?

          Let's start from least significant to most significant

          Yes, makes sense

          Do not blindly add a tag. Add it only if it is requested.

          The current implementation adds only the tags that are requested.
          The ones that are not requested are not added to the response.
          This is tested in

          • testGetTagsWithEmptyIPv4RequestedTag() where no tag is requested -> none returned, and
          • testGetTagsWithIPv4RequestedTags_ip_2_ip_4() where only 2 tags are requested leading to only 2 out of 4 being returned
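The "only requested tags" behaviour described above could look roughly like the sketch below. It is not the patch's code: the class RequestedIpTags, the method tagsFor and the ip_N pattern are assumptions made for illustration, using the ip_1 to ip_4 naming from the review comment.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RequestedIpTags {
    // Assumed tag naming: ip_1 (least significant octet) to ip_4.
    private static final Pattern IP_TAG = Pattern.compile("ip_([1-4])");

    // Returns values only for the tags that were asked for; nothing is
    // added for tags that are absent from the requested set.
    static Map<String, String> tagsFor(String ipv4, Set<String> requested) {
        String[] octets = ipv4.split("\\.");
        Map<String, String> result = new LinkedHashMap<>();
        for (String tag : requested) {
            Matcher m = IP_TAG.matcher(tag);
            if (m.matches() && octets.length == 4) {
                int n = Integer.parseInt(m.group(1));
                result.put(tag, octets[4 - n]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> two = new TreeSet<>(Arrays.asList("ip_2", "ip_4"));
        // prints {ip_2=255, ip_4=192}
        System.out.println(tagsFor("192.93.255.255", two));
        // prints {}
        System.out.println(tagsFor("192.93.255.255", Collections.<String>emptySet()));
    }
}
```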

          Please let me know about the idea of a more generic snitch that could handle host names as well.

          Many thanks

          noble.paul Noble Paul added a comment -

          It's OK if you add tags for host as well; it's just that it doesn't match the description and title.

          arcadius Arcadius Ahouansou added a comment - - edited

          Hello Noble Paul
          I have made all suggested changes.
          I have also added support for host names.
          For now, the snitch name is in the format hostfrag_N where N is a number.
          I am open to new suggestions... especially about the name of the snitch.

          noble.paul Noble Paul added a comment -

          The test cases have node strings given in the wrong format

          http://127.0.0.1:54869/oubz/collection1j is wrong
          127.0.0.1:54869_oubzj is right

          arcadius Arcadius Ahouansou added a comment - - edited

          Hello again Noble Paul
          Thank you very much for taking the time to have a look.

          I have changed the node format from http://host:port/context/collection ... to host:port_context
          To further simplify things, I have also removed all tests related to SOLR-8523. Tests will be added later to SOLR-8523.

          I need one clarification:

          In the original ImplicitSnitch.java, we have:

          Pattern hostAndPortPattern = Pattern.compile("(?:https?://)?([^:]+):(\\d+)");

          Is that regex accurate given that node names do not contain any http or https in the format specified above?

          Thank you very much
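As a quick check of the question above: because the scheme prefix in that pattern is optional, it should accept both the URL form and the host:port_context node-name form. The sketch below applies the quoted regex to the two strings from the earlier review comment. The class NodeNamePattern and the use of find() are assumptions; how ImplicitSnitch itself applies the pattern is not shown in this thread.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NodeNamePattern {
    // Same regex as quoted from ImplicitSnitch.java in the comment above.
    static final Pattern HOST_AND_PORT =
        Pattern.compile("(?:https?://)?([^:]+):(\\d+)");

    // Returns {host, port} for the first match, or null when nothing matches.
    // Using find() (not matches()) is an assumption of this sketch.
    static String[] hostAndPort(String node) {
        Matcher m = HOST_AND_PORT.matcher(node);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        // prints [127.0.0.1, 54869]
        System.out.println(Arrays.toString(hostAndPort("127.0.0.1:54869_oubzj")));
        // prints [127.0.0.1, 54869]
        System.out.println(Arrays.toString(hostAndPort("http://127.0.0.1:54869/oubz/collection1j")));
    }
}
```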
          noble.paul Noble Paul added a comment -

          I don't think this is complete

          Imagine I provide serv1.dc1.london.uk.apache.org as the host name. I should still be able to use ip_1, because eventually serv1.dc1.london.uk.apache.org should resolve to an IP address.

          arcadius Arcadius Ahouansou added a comment -

          Imagine I provide serv1.dc1.london.uk.apache.org as the host name. I should still be able to use ip_1, because eventually serv1.dc1.london.uk.apache.org should resolve to an IP address.

          Yes, serv1.dc1.london.uk.apache.org will eventually be resolved into an IP, but this will happen outside of Solr's scope, and Solr will only know of serv1.dc1.london.uk.apache.org, IMHO.

          noble.paul Noble Paul added a comment -

          Why? Why can't a user use ip_1 even if they use host names?

          arcadius Arcadius Ahouansou added a comment -

          Hello again Noble Paul
          From my understanding, a server could have multiple host names and multiple public IPs.

          And by simply starting solr without the -Dhost switch, Solr will grab one of the assigned IPs and use it.
          This sometimes is not desirable as it leads to issues ... and that is when -Dhost is used to force Solr to use a provided host name or IP.

          I downloaded solr 5.4.1 and I ran

          solr-5.4.1/bin$ ./solr start -e cloud -a "-Dhost=linux01"

          Then, the output of
          curl "http://localhost:8983/solr/admin/collections?action=clusterstatus&wt=json&indent=on"

          is:

          {
             "responseHeader":{
                "status":0,
                "QTime":2
             },
             "cluster":{
                "collections":{
                   "gettingstarted":{
                      "replicationFactor":"2",
                      "shards":{
                         "shard1":{
                            "range":"80000000-ffffffff",
                            "state":"active",
                            "replicas":{
                               "core_node1":{
                                  "core":"gettingstarted_shard1_replica1",
                                  "base_url":"http://linux01:8983/solr",
                                  "node_name":"linux01:8983_solr",
                                  "state":"active"
                               },
                               "core_node4":{
                                  "core":"gettingstarted_shard1_replica2",
                                  "base_url":"http://linux01:7574/solr",
                                  "node_name":"linux01:7574_solr",
                                  "state":"active",
                                  "leader":"true"
                               }
                            }
                         },
                         "shard2":{
                            "range":"0-7fffffff",
                            "state":"active",
                            "replicas":{
                               "core_node2":{
                                  "core":"gettingstarted_shard2_replica1",
                                  "base_url":"http://linux01:8983/solr",
                                  "node_name":"linux01:8983_solr",
                                  "state":"active"
                               },
                               "core_node3":{
                                  "core":"gettingstarted_shard2_replica2",
                                  "base_url":"http://linux01:7574/solr",
                                  "node_name":"linux01:7574_solr",
                                  "state":"active",
                                  "leader":"true"
                               }
                            }
                         }
                      },
                      "router":{
                         "name":"compositeId"
                      },
                      "maxShardsPerNode":"2",
                      "autoAddReplicas":"false",
                      "znodeVersion":8,
                      "configName":"gettingstarted"
                   }
                },
                "live_nodes":[
                   "linux01:7574_solr",
                   "linux01:8983_solr"
                ]
             }
          }
          

          Please note the line with
          "base_url":"http://linux01:8983/solr"

          So, any client (curl or Apache HttpClient) will be using a URL of the form http://linux01:8983/solr
          which, as you said, will be resolved into an IP, but this will happen outside of Solr IMHO.

          So, in my humble opinion (and correct me if I am wrong), Solr and the cluster assignment rules will know only about the host name "linux01"?

          Thank you very much Noble Paul for your time.

          arcadius Arcadius Ahouansou added a comment -

          Hi Noble Paul

          Let's say the host serv1.dc1.london.uk.apache.org resolves into 3 public IPs.
          Which one should I use then?

          Please read my other comment below

          noble.paul Noble Paul added a comment -

          Solr will only use linux01, but that is not to say that the IP cannot be used. It should be the user's preference to choose ip_1.

          My suggestion is:

          Both host_1 and ip_1 should be valid and must work. If the node string is a host name, resolve the IP address and get the ip_1 value.

          arcadius Arcadius Ahouansou added a comment -

          Resolving hostname->IP is simple enough.

          However, when a host name resolves into multiple IPs, which one to choose?

          noble.paul Noble Paul added a comment - - edited

          Just do:

          InetAddress inetAddr = InetAddress.getByName("hostname");
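A minimal sketch of how that lookup could feed the ip_N tags follows. InetAddress.getByName returns a single address even when a name maps to several, which sidesteps the "which one to choose" question by taking the one the resolver lists first. The class IpTagsFromHost, the method ipTags, and the choice to return an empty map for unresolvable or non-IPv4 hosts are assumptions of this sketch, not the committed implementation.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;

class IpTagsFromHost {
    // Resolves a host name (or parses a literal IPv4 string) and numbers the
    // octets from least significant (ip_1) to most significant (ip_4).
    // Returns an empty map if the host cannot be resolved or is not IPv4;
    // that fallback is an assumption made for this illustration.
    static Map<String, String> ipTags(String host) {
        Map<String, String> tags = new LinkedHashMap<>();
        try {
            InetAddress addr = InetAddress.getByName(host);
            if (addr instanceof Inet4Address) {
                String[] octets = addr.getHostAddress().split("\\.");
                for (int i = 0; i < octets.length; i++) {
                    tags.put("ip_" + (i + 1), octets[octets.length - 1 - i]);
                }
            }
        } catch (UnknownHostException e) {
            // Leave the map empty when the name does not resolve.
        }
        return tags;
    }

    public static void main(String[] args) {
        // A literal address is parsed without a DNS lookup.
        // prints {ip_1=1, ip_2=0, ip_3=0, ip_4=127}
        System.out.println(ipTags("127.0.0.1"));
    }
}
```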

          arcadius Arcadius Ahouansou added a comment -

          Hello Noble Paul
          I have updated the patch to look up IP addresses.
          Thanks.

          noble.paul Noble Paul added a comment -

          Sorry for the delay.
          Could you please update the patch to the latest trunk? I shall then commit this.

          arcadius Arcadius Ahouansou added a comment - - edited

          Patch updated to latest git master branch

          jira-bot ASF subversion and git services added a comment -

          Commit cf964326309feb7a5a41a3e4f22cad073807a097 in lucene-solr's branch refs/heads/master from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cf96432 ]

          SOLR-8522: Make it possible to use ip fragments in replica placement rules , such as ip_1, ip_2 etc

          arcadius Arcadius Ahouansou added a comment -

          Thank you very much Noble Paul for your valuable help on this issue.

          jira-bot ASF subversion and git services added a comment -

          Commit cf964326309feb7a5a41a3e4f22cad073807a097 in lucene-solr's branch refs/heads/apiv2 from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cf96432 ]

          SOLR-8522: Make it possible to use ip fragments in replica placement rules , such as ip_1, ip_2 etc

          k317h Keith Laban added a comment -

          I'm getting test failures due to some of the tests introduced in this ticket. I opened an issue at SOLR-9183

          arcadius Arcadius Ahouansou added a comment -

          Hi Keith Laban
          I commented on SOLR-9183


            People

            • Assignee: Noble Paul
            • Reporter: Arcadius Ahouansou
            • Votes: 0
            • Watchers: 5
