Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-3140 Support multiple network interfaces
  3. HDFS-3147

The Namenode should be able to filter DN interfaces given to clients

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels:
      None

      Description

      Not all DN interfaces that the NN exposes to clients should be used: an interface may not be routable by the client, or a user may want to restrict off-cluster clients from using cluster-private interfaces. The user should therefore be able to ensure that clients are given only a subset of the addresses reported by workers. This can be accomplished by having masters filter the set of interfaces provided to clients, and/or by having clients filter the interfaces they are given. The former is preferable because the configuration resides in a single place (the master instead of the clients) and because client configuration is less portable (the configuration from an off-cluster client might end up being used on-cluster if passed as part of a job).

      To specify which interfaces clients receive, the master is configured with a table of rules that map a source address range (of the incoming connection) to a list of address ranges used to filter interfaces. An interface is given to the client only if it matches one of the address ranges listed for the source address the connection came in on. A rule has the form: Range -> list<Range>, where each range is specified in CIDR notation. If a source address matches multiple entries in the table, only the first matching rule is applied. If the table is empty or no rule matches, all interfaces are given to the client.
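
The rule semantics described above (first matching source range wins; an empty table or no match returns everything) can be sketched roughly as follows. This is only an illustration in Python using the standard `ipaddress` module, not the proposed Namenode implementation; the function names `parse_rules` and `filter_interfaces` and the in-memory table layout are assumptions made for the example, since the issue does not define a configuration syntax.

```python
import ipaddress


def parse_rules(raw_rules):
    """Convert (source CIDR, [allowed CIDRs]) string pairs into network objects.

    The list-of-pairs layout is a hypothetical stand-in for the rule table;
    the issue only specifies the form Range -> list<Range>.
    """
    return [
        (ipaddress.ip_network(src), [ipaddress.ip_network(r) for r in allowed])
        for src, allowed in raw_rules
    ]


def filter_interfaces(rules, client_addr, dn_addrs):
    """Return the DN addresses that a client at client_addr may be given."""
    client = ipaddress.ip_address(client_addr)
    for source_range, allowed_ranges in rules:
        if client in source_range:
            # Only the first rule whose source range matches is applied.
            return [
                addr for addr in dn_addrs
                if any(ipaddress.ip_address(addr) in r for r in allowed_ranges)
            ]
    # Empty table or no matching rule: all interfaces are given to the client.
    return list(dn_addrs)


rules = parse_rules([
    ("10.0.0.0/8", ["10.0.0.0/8"]),       # on-cluster clients get private addresses
    ("0.0.0.0/0", ["192.168.1.0/24"]),    # everyone else gets the public range
])
dn_addrs = ["10.1.2.3", "192.168.1.5"]
on_cluster = filter_interfaces(rules, "10.9.9.9", dn_addrs)
off_cluster = filter_interfaces(rules, "172.16.0.1", dn_addrs)
```

Note that under this reading a rule that matches the source address but none of the DN addresses yields an empty list; whether the Namenode should fall back to all interfaces in that case is not stated in the description.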

        Activity

        Eli Collins created issue -
        Eli Collins made changes -
        Field Original Value New Value
        Description
          Original: HDFS-3146 exposes multiple interfaces to the client. However, not all interfaces exposed to clients should be used, eg because not all addresses given to clients may be routable by the client, or a user may want to restrict off-cluster clients from using cluster-private interfaces. Therefore the user should be able to configure clients to use a subset of the addresses they are given. This can be accomplished by a new configuration option ({{dfs.client.available.interfaces}}) that takes a list of interfaces to use, interfaces that don't match the configuration are ignored. Acceptable configuration values are the same as the {{dfs.datanode.available.interfaces}} parameter. In addition, we could also add an option where clients automatically check if they can connect to each interface that's given them, and filter those out by default.
          New: HDFS-3146 exposes multiple Datanode interfaces to the client. However, not all interfaces exposed to clients should be used, eg because not all addresses given to clients may be routable by the client, or a user may want to restrict off-cluster clients from using cluster-private interfaces. Therefore the user should be able to configure clients to use a subset of the addresses they are given. This can be accomplished by a new configuration option ({{dfs.client.available.interfaces}}) that takes a list of interfaces to use, interfaces that don't match the configuration are ignored. Acceptable configuration values are the same as the {{dfs.datanode.available.interfaces}} parameter. In addition, we could also add an option where clients automatically check if they can connect to each interface that's given them, and filter those out by default.
        Component/s hdfs client [ 12312928 ]
        Component/s data-node [ 12312927 ]
        Eli Collins made changes -
        Link This issue depends on HADOOP-8210 [ HADOOP-8210 ]
        Eli Collins made changes -
        Summary The client should be able to specify which network interfaces to use The Namenode should be able to filter DN interfaces given to clients
        Target Version/s 0.23.3 [ 12320052 ] 2.0.0 [ 12320353 ]
        Description
          Original: HDFS-3146 exposes multiple Datanode interfaces to the client. However, not all interfaces exposed to clients should be used, eg because not all addresses given to clients may be routable by the client, or a user may want to restrict off-cluster clients from using cluster-private interfaces. Therefore the user should be able to configure clients to use a subset of the addresses they are given. This can be accomplished by a new configuration option ({{dfs.client.available.interfaces}}) that takes a list of interfaces to use, interfaces that don't match the configuration are ignored. Acceptable configuration values are the same as the {{dfs.datanode.available.interfaces}} parameter. In addition, we could also add an option where clients automatically check if they can connect to each interface that's given them, and filter those out by default.
          New: Not all DN interfaces that the NN exposes to clients should be used: an interface may not be routable by the client, or a user may want to restrict off-cluster clients from using cluster-private interfaces. The user should therefore be able to ensure that clients are given only a subset of the addresses reported by workers. This can be accomplished by having masters filter the set of interfaces provided to clients, and/or by having clients filter the interfaces they are given. The former is preferable because the configuration resides in a single place (the master instead of the clients) and because client configuration is less portable (the configuration from an off-cluster client might end up being used on-cluster if passed as part of a job). To specify which interfaces clients receive, the master is configured with a table of rules that map a source address range (of the incoming connection) to a list of address ranges used to filter interfaces. An interface is given to the client only if it matches one of the address ranges listed for the source address the connection came in on. A rule has the form: Range -> list<Range>, where each range is specified in CIDR notation. If a source address matches multiple entries in the table, only the first matching rule is applied. If the table is empty or no rule matches, all interfaces are given to the client.
        Eli Collins made changes -
        Target Version/s 2.0.0 [ 12320353 ] Multiple interfaces (HDFS-3140) [ 12320556 ]
        Eli Collins made changes -
        Assignee Eli Collins [ eli2 ] Eli Collins [ eli ]

          People

          • Assignee:
            Eli Collins
            Reporter:
            Eli Collins
          • Votes:
            0
            Watchers:
            4
