Uploaded image for project: 'Accumulo'
  1. Accumulo
  2. ACCUMULO-4331

Make port configuration and allocation consistent across services

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: None
    • Labels:
      None

      Description

      There was some discussion in ACCUMULO-4328 about ports, so I decided to track down how the client ports are configured and allocated. Issues raised in the discussion were:

      1. The port search feature was not well understood
      2. Ephemeral port allocation makes it hard to lock servers down (e.g. iptables)

      Looking through the code I found the following properties allocate a port number based on conf.getPort(). This returns the port number based on the property and supports either a single value or zero. Then, in the server component (monitor, tracer, gc, etc) this value is used when creating a ServerSocket. If the port is already in use, the process will fail.

      monitor.port.log4j
      trace.port.client
      gc.port.client
      monitor.port.client
      

      The following properties use TServerUtils.startServer which uses the value in the property to start the TServer. If the value is zero, then it picks a random port between 1024 and 65535. If tserver.port.search is enabled, then it will try a thousand times to bind to a random port.

      tserver.port.client
      master.port.client
      master.replication.coordinator.port
      replication.receipt.service.port
      

      I'm proposing that we deprecate the tserver.port.search property and the value zero in the property value for the properties above. Instead, I think we should allow the user to specify a single value or a range (M-N).

        Issue Links

          Activity

          Hide
          billie.rinaldi Billie Rinaldi added a comment -

          I'd prefer not deprecating the 0 option. Accumulo on Slider uses that.

          Show
          billie.rinaldi Billie Rinaldi added a comment - I'd prefer not deprecating the 0 option. Accumulo on Slider uses that.
          Hide
          dlmarion Dave Marion added a comment -

          It couldn't be removed until 2.0. Does that change things?

          Show
          dlmarion Dave Marion added a comment - It couldn't be removed until 2.0. Does that change things?
          Hide
          billie.rinaldi Billie Rinaldi added a comment -

          Not really; it's easier to set it to 0 when you don't care what the port is, and that's consistent with other projects (e.g. HBase). What's the argument for not allowing 0?

          Show
          billie.rinaldi Billie Rinaldi added a comment - Not really; it's easier to set it to 0 when you don't care what the port is, and that's consistent with other projects (e.g. HBase). What's the argument for not allowing 0?
          Hide
          elserj Josh Elser added a comment -

          I'd prefer not deprecating the 0 option. Accumulo on Slider uses that.

          Yeah, It doesn't make sense to me to deprecate using random ports, it's just not something that Accumulo out of the box would recommend (because of what we know about production environments of security-conscious people). I think that adequate documentation about when '0' makes sense to use would be better than removing entirely.

          Instead, I think we should allow the user to specify a single value or a range (M-N).

          I think this would be a nice thing to add.

          Show
          elserj Josh Elser added a comment - I'd prefer not deprecating the 0 option. Accumulo on Slider uses that. Yeah, It doesn't make sense to me to deprecate using random ports, it's just not something that Accumulo out of the box would recommend (because of what we know about production environments of security-conscious people). I think that adequate documentation about when '0' makes sense to use would be better than removing entirely. Instead, I think we should allow the user to specify a single value or a range (M-N). I think this would be a nice thing to add.
          Hide
          dlmarion Dave Marion added a comment -

          It makes it hard to lock ports down on a server (iptables)

          Show
          dlmarion Dave Marion added a comment - It makes it hard to lock ports down on a server (iptables)
          Hide
          mjwall Michael Wall added a comment -

          It would be really useful if we are going to run multiple tservers on a node to be able to set static ports. So if I want to run 3 on a node, I can configure them to always run on 9997, 9998 and 9999 or whatever. That is useful from a system monitoring perspective. Also useful because in the tserver page in the monitor, when a tserver dies and comes back up on the same port, it would be removed from the list of dead tservers. I find it useful to follow restarts of a tserver in one log file as well, although I think Dave Marion has that working already. Would love to see the port as part of the log4j filename as well, so you know automatically which log file to open instead of having to grep for something in all the log files.

          Show
          mjwall Michael Wall added a comment - It would be really useful if we are going to run multiple tservers on a node to be able to set static ports. So if I want to run 3 on a node, I can configure them to always run on 9997, 9998 and 9999 or whatever. That is useful from a system monitoring perspective. Also useful because in the tserver page in the monitor, when a tserver dies and comes back up on the same port, it would be removed from the list of dead tservers. I find it useful to follow restarts of a tserver in one log file as well, although I think Dave Marion has that working already. Would love to see the port as part of the log4j filename as well, so you know automatically which log file to open instead of having to grep for something in all the log files.
          Hide
          elserj Josh Elser added a comment -

          so you know automatically which log file to open instead of having to grep for something in all the log files.

          I know that Keith/Dave have been talking on the PR for ACCUMULO-4328, but maybe something could be added (like our -Dproc=tserver) now that could help correlate instances to log file. I feel like trying to include port will just be tricky (because we want logging set up before we're going to bind the server to a port).

          Show
          elserj Josh Elser added a comment - so you know automatically which log file to open instead of having to grep for something in all the log files. I know that Keith/Dave have been talking on the PR for ACCUMULO-4328 , but maybe something could be added (like our -Dproc=tserver ) now that could help correlate instances to log file. I feel like trying to include port will just be tricky (because we want logging set up before we're going to bind the server to a port).
          Hide
          mjwall Michael Wall added a comment -

          Looking for the code now that sets up the log4j file, but could you do -Dproc=tserver -Dport=9997 and have affect the log file name?

          Show
          mjwall Michael Wall added a comment - Looking for the code now that sets up the log4j file, but could you do -Dproc=tserver -Dport=9997 and have affect the log file name?
          Hide
          mikewalch Mike Walch added a comment -

          It would be great if the 0 option was kept but users could specify a single value or a range (M-N). The 0 option is just shorthand for the range 1024-65535.

          Show
          mikewalch Mike Walch added a comment - It would be great if the 0 option was kept but users could specify a single value or a range (M-N). The 0 option is just shorthand for the range 1024-65535.
          Hide
          kturner Keith Turner added a comment -

          That's an interesting observation. Who would have thought that 0=1024-65553

          Show
          kturner Keith Turner added a comment - That's an interesting observation. Who would have thought that 0=1024-65553
          Hide
          dlmarion Dave Marion added a comment -

          The instance is in the log file name in the PR.

          Show
          dlmarion Dave Marion added a comment - The instance is in the log file name in the PR.
          Hide
          dlmarion Dave Marion added a comment -

          And note that 0 is totally random, not a 1 up attempt until it succeeds

          Show
          dlmarion Dave Marion added a comment - And note that 0 is totally random, not a 1 up attempt until it succeeds
          Hide
          dlmarion Dave Marion added a comment -

          I don't know that we can do what you are suggesting in the manner in which you are suggesting it. You would have to move the port allocation code out of Java and into the shell scripts. Another approach would be to push the port into the Log4J MDC and then add the MDC key to the logging pattern. I had taken this approach, but using the PID, in the PR against 1.6.x

          Show
          dlmarion Dave Marion added a comment - I don't know that we can do what you are suggesting in the manner in which you are suggesting it. You would have to move the port allocation code out of Java and into the shell scripts. Another approach would be to push the port into the Log4J MDC and then add the MDC key to the logging pattern. I had taken this approach, but using the PID, in the PR against 1.6.x
          Hide
          dlmarion Dave Marion added a comment -

          If we set tserver.port.client to "9997,9998,9999" or "9997-9999" then any one of the three tservers in your example would allocate any of the three ports in the range specified. Do we want to make the tablet server code aware that there could be multiple instances on a host, that it is the Nth instance, and it must allocate the Nth port in the range?

          Show
          dlmarion Dave Marion added a comment - If we set tserver.port.client to "9997,9998,9999" or "9997-9999" then any one of the three tservers in your example would allocate any of the three ports in the range specified. Do we want to make the tablet server code aware that there could be multiple instances on a host, that it is the Nth instance, and it must allocate the Nth port in the range?
          Hide
          mjwall Michael Wall added a comment - - edited

          Yeah, that is what I am suggesting I guess. Adding the ability to pass in the ports when start-server.sh is called. If 1 or more are there, iterate over the port, set -Dport, start the server on that port and use the port in the log name.

          Show
          mjwall Michael Wall added a comment - - edited Yeah, that is what I am suggesting I guess. Adding the ability to pass in the ports when start-server.sh is called. If 1 or more are there, iterate over the port, set -Dport, start the server on that port and use the port in the log name.
          Hide
          mikewalch Mike Walch added a comment -

          Agree. random is not the same as 1 up

          Show
          mikewalch Mike Walch added a comment - Agree. random is not the same as 1 up
          Hide
          mikewalch Mike Walch added a comment -

          I don't think tserver.port.search works correctly. If it's not deprecated it should be fixed. I set it to true and and set tserver.client.port to 9997. The first port that I was given was 9997 but the next port was 32843. I think a random port is being chosen rather than a 1 up. I am using 1.7.1 btw.

          Show
          mikewalch Mike Walch added a comment - I don't think tserver.port.search works correctly. If it's not deprecated it should be fixed. I set it to true and and set tserver.client.port to 9997. The first port that I was given was 9997 but the next port was 32843. I think a random port is being chosen rather than a 1 up. I am using 1.7.1 btw.
          Hide
          dlmarion Dave Marion added a comment -

          So, you don't think we should deprecate tserver.port.search in favor of a port range? We should fix it instead?

          Show
          dlmarion Dave Marion added a comment - So, you don't think we should deprecate tserver.port.search in favor of a port range? We should fix it instead?
          Hide
          dlmarion Dave Marion added a comment -

          In my discussions with Christopher Tubbs regarding this, he feels the same way. However, I think that is a different JIRA as I think that is much more work than what I am proposing here. You would have to move the port handling code out of Java and into the shell scripts.

          Show
          dlmarion Dave Marion added a comment - In my discussions with Christopher Tubbs regarding this, he feels the same way. However, I think that is a different JIRA as I think that is much more work than what I am proposing here. You would have to move the port handling code out of Java and into the shell scripts.
          Hide
          elserj Josh Elser added a comment -

          So, you don't think we should deprecate tserver.port.search in favor of a port range? We should fix it instead?

          I think making portsearch bound to a range would be much more useful. I think it's a rather obtuse configuration property now – I'd be surprised if anyone actually dug into how it works and is relying on it (even more so after Mike found that it doesn't even work right).

          Show
          elserj Josh Elser added a comment - So, you don't think we should deprecate tserver.port.search in favor of a port range? We should fix it instead? I think making portsearch bound to a range would be much more useful. I think it's a rather obtuse configuration property now – I'd be surprised if anyone actually dug into how it works and is relying on it (even more so after Mike found that it doesn't even work right).
          Hide
          dlmarion Dave Marion added a comment - - edited

          So, based on the comments here, it seems the consensus is to

          1. Keep zero as a valid port option, which means any ephemeral port
          2. Keep tserver.port.search, but maybe change it such that it performs a 1-up attempt instead of random
          3. Allow the user to specify a range of ports to be used (M-N). [new feature]
          4. If user specifies a range of ports, and tserver.port.search is disabled, any port in that range is still valid.
          5. If user specifies a range of ports, and tserver.port.search is enabled, we must try all the ports in the range before searching

          Show
          dlmarion Dave Marion added a comment - - edited So, based on the comments here, it seems the consensus is to 1. Keep zero as a valid port option, which means any ephemeral port 2. Keep tserver.port.search, but maybe change it such that it performs a 1-up attempt instead of random 3. Allow the user to specify a range of ports to be used (M-N). [new feature] 4. If user specifies a range of ports, and tserver.port.search is disabled, any port in that range is still valid. 5. If user specifies a range of ports, and tserver.port.search is enabled, we must try all the ports in the range before searching
          Hide
          kturner Keith Turner added a comment -

          Allow the user to specify a range of ports to be used (M-N). [new feature]

          How are you thinking of doing this? With a tserver.port.search.max prop?

          Show
          kturner Keith Turner added a comment - Allow the user to specify a range of ports to be used (M-N). [new feature] How are you thinking of doing this? With a tserver.port.search.max prop?
          Hide
          dlmarion Dave Marion added a comment -

          I was not thinking of doing anything with port search, other than making it use a 1up algorithm instead of random. I was thinking of allowing a single port (N or zero) or a range or ports (N-M) to be allowed in the port properties mentioned in the description. For example:

          <property>
            <name>tserver.port.client</name>
            <value>9997</value>
          </property>
          

          or

          <property>
            <name>tserver.port.client</name>
            <value>0</value>
          </property>
          

          or

          <property>
            <name>tserver.port.client</name>
            <value>9997-9999</value>
          </property>
          
          Show
          dlmarion Dave Marion added a comment - I was not thinking of doing anything with port search, other than making it use a 1up algorithm instead of random. I was thinking of allowing a single port (N or zero) or a range or ports (N-M) to be allowed in the port properties mentioned in the description. For example: <property> <name>tserver.port.client</name> <value>9997</value> </property> or <property> <name>tserver.port.client</name> <value>0</value> </property> or <property> <name>tserver.port.client</name> <value>9997-9999</value> </property>
          Hide
          elserj Josh Elser added a comment -

          nit-picky, but I'd encourage use of standard "math" notation: [9997,9999], [9997,10000). I think there's some stuff in guava or commons-* that would handle that parsing for us.

          Show
          elserj Josh Elser added a comment - nit-picky, but I'd encourage use of standard "math" notation: [9997,9999] , [9997,10000). I think there's some stuff in guava or commons-* that would handle that parsing for us.
          Hide
          kturner Keith Turner added a comment -

          I like the idea of using that notation. I looked around and did not find any thing in Guava to parse it. Did find some simeple code to do it on stackoverflow. Could use this code and a Guava Range.

          http://stackoverflow.com/questions/33552639/parsing-interval-notation-to-guava-range

          Show
          kturner Keith Turner added a comment - I like the idea of using that notation. I looked around and did not find any thing in Guava to parse it. Did find some simeple code to do it on stackoverflow. Could use this code and a Guava Range. http://stackoverflow.com/questions/33552639/parsing-interval-notation-to-guava-range
          Hide
          dlmarion Dave Marion added a comment -

          So, would this be for the range syntax, or the syntax in all cases? [9997,9997], [0,0], [9997,9999)

          Show
          dlmarion Dave Marion added a comment - So, would this be for the range syntax, or the syntax in all cases? [9997,9997] , [0,0] , [9997,9999)
          Hide
          elserj Josh Elser added a comment -

          So, would this be for the range syntax, or the syntax in all cases? [9997,9997], [0,0], [9997,9999)

          We should still support single numbers, so I think [0,0] could just be 0. We could try to parse it as an integer first, and if that fails, try to parse it as a range. Internally, you can obviously parse/pass the number/range however you'd like.

          Show
          elserj Josh Elser added a comment - So, would this be for the range syntax, or the syntax in all cases? [9997,9997], [0,0], [9997,9999) We should still support single numbers, so I think [0,0] could just be 0 . We could try to parse it as an integer first, and if that fails, try to parse it as a range. Internally, you can obviously parse/pass the number/range however you'd like.
          Hide
          dlmarion Dave Marion added a comment -

          Thinking about the syntax for the range, I think simpler is better for the user. I think if we document M-N as inclusive, then it should not be an issue. I think most people think of a list of numbers or a range as inclusive by default.

          Show
          dlmarion Dave Marion added a comment - Thinking about the syntax for the range, I think simpler is better for the user. I think if we document M-N as inclusive, then it should not be an issue. I think most people think of a list of numbers or a range as inclusive by default.

            People

            • Assignee:
              dlmarion Dave Marion
              Reporter:
              dlmarion Dave Marion
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 5h
                5h

                  Development