  1. Hadoop HDFS
  2. HDFS-17302

RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 3.5.0
    • rbf
    • Reviewed

    Description

      Current shortcomings

      HDFS-14090 provides StaticRouterRpcFairnessPolicyController, which lets the router assign a fixed number of handlers to each nameservice (ns). This isolates nameservices from one another: an ns under heavy load does not affect the router's access to an ns under normal load. However, StaticRouterRpcFairnessPolicyController still falls short in several ways:

      1. Configuration is inconvenient and error-prone: To use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total and how many nameservices it currently serves. I then have to calculate how many handlers to allocate to each ns so that the sum does not exceed the router's total, while also choosing an allocation that performs well. If the configured sum exceeds the router's handler count by even one handler, the router fails to start; I then have to investigate why it failed and rework the allocation for every ns. In addition, whenever I change the total number of handlers on the router, I have to re-allocate handlers to each ns, which adds to the operation and maintenance burden.

      2. Adding a nameservice is not supported: If a new ns is added to the cluster while the router is running and a mount point is added for it, the ns cannot be accessed through the router because no handlers have been allocated to it. We must reconfigure the handler counts and refresh the configuration before the router can access the new ns. Reconfiguring the handler counts brings us back to shortcoming 1: configuration is inconvenient and error-prone.

      3. Wasted handlers: The main purpose of RouterRpcFairnessPolicyController is to let the router access nameservices under normal load without being affected by nameservices under heavy load. However, not every ns is heavily loaded, and a heavily loaded ns is not heavily loaded 24 hours a day; the load may be high only during certain periods, such as 0 to 8 o'clock, and normal the rest of the time. Assume there are 2 ns, each allocated half of the handlers. Assume that ns1 has many requests from 0 to 14 o'clock and almost none from 14 to 24 o'clock, while ns2 has many requests from 12 to 24 o'clock and almost none from 0 to 12 o'clock. Between 0 and 12 o'clock and between 14 and 24 o'clock, only one ns is busy while the other has almost no requests, so half of the handlers are wasted during those periods.

      4. Only isolation, no sharing: StaticRouterRpcFairnessPolicyController supports isolation but not sharing. I think isolation is only a means of improving the router's performance when accessing normally loaded nameservices, not an end in itself. It is unlikely that every ns in the cluster is heavily loaded; in most scenarios only a few are, and the rest carry normal load. We need to isolate the handlers of heavily loaded nameservices from those of normally loaded ones so that the former do not degrade the latter. However, nameservices with similar load (all normal, or all heavy) do not need to be isolated from each other and can share the router's handlers. This performs better than assigning a fixed number of handlers to each ns, because each of these ns can draw on all of the shared handlers instead of a fixed slice.
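
      The constraint behind shortcoming 1 and the idle capacity behind shortcoming 3 can be sketched in a few lines of Java. This is only an illustration of the arithmetic, not the Hadoop implementation: the class and method names are hypothetical, and the idle-handler figure assumes the non-busy ns is receiving essentially no requests.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of static per-ns handler allocation; not Hadoop code. */
class StaticAllocationSketch {

  /** Rejects an allocation whose per-ns handler counts exceed the router total. */
  static void validate(int totalHandlers, Map<String, Integer> handlersPerNs) {
    int sum = 0;
    for (int count : handlersPerNs.values()) {
      sum += count;
    }
    if (sum > totalHandlers) {
      throw new IllegalArgumentException(
          "Configured " + sum + " handlers, but the router only has " + totalHandlers);
    }
  }

  /** Handlers left unused while only busyNs is receiving requests. */
  static int idleHandlers(int totalHandlers, Map<String, Integer> handlersPerNs,
      String busyNs) {
    return totalHandlers - handlersPerNs.getOrDefault(busyNs, 0);
  }

  public static void main(String[] args) {
    Map<String, Integer> alloc = new LinkedHashMap<>();
    alloc.put("ns1", 5);
    alloc.put("ns2", 5);
    validate(10, alloc); // accepted: 5 + 5 does not exceed 10
    System.out.println("idle while only ns1 is busy: " + idleHandlers(10, alloc, "ns1"));

    alloc.put("ns2", 6); // one handler too many
    try {
      validate(10, alloc);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

      With 10 handlers split 5/5, half of the handlers sit idle whenever only one ns is busy, and raising a single ns to 6 is enough for the allocation to be rejected.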

      New features

      Given these usability and performance deficiencies in StaticRouterRpcFairnessPolicyController, I propose a new RouterRpcFairnessPolicyController, ProportionRouterRpcFairnessPolicyController (a better name is welcome), to address the shortcomings above.

      1. More user-friendly configuration: Handlers are allocated to each ns proportionally. For example, if we give ns1 a handler proportion of 0.2, ns1 can use 20% of the router's total handlers. With this approach we do not need to know in advance how many handlers the router has.

      2. Sharing and isolation: Sharing is as important as isolation. The sum of the handlers allocated to all ns is allowed to exceed the router's total number of handlers. For example, with 10 handlers and 3 ns, we can allocate 5 handlers (0.5) to each of ns1, ns2, and ns3. This feature is very important. Consider the following scenarios:

      • Only one ns is busy during any given period: Assume that ns1 has more requests from 0 to 8 o'clock, ns2 from 8 to 16 o'clock, and ns3 from 16 to 24 o'clock. In any period, the busy ns uses at most half of the handlers, and the other two ns share the remaining half. Isolation is still satisfied, and compared with StaticRouterRpcFairnessPolicyController we can use more handlers for both busy and normal ns: with StaticRouterRpcFairnessPolicyController each ns gets 3 handlers [ns1:3 ns2:3 ns3:3], whereas now each ns can use 5.
      • Only ns1 is busy: Assume that only ns1 is ever busy and that requests to ns2 and ns3 are few and fast, because their downstream namenodes are under no pressure. We can give ns1 5 handlers (0.5) and give ns2 and ns3 10 handlers (1.0) each. Because requests to ns2 and ns3 are few and short-lived, they have no major impact on ns1's performance, and because ns1 is limited to at most half of the handlers, isolation is still maintained.

      3. Transparent extension: Adding a new ns does not require refreshing the configuration. If no handlers are explicitly assigned to an ns, it is given a default proportion of the handlers.

      4. Fully compatible: The new RouterRpcFairnessPolicyController can reproduce the behavior of StaticRouterRpcFairnessPolicyController. If we want isolation only, without sharing, we can allocate 0.4 to ns1, 0.3 to ns2, and 0.3 to ns3, so that the proportions sum to 1. This is also more convenient than using the original StaticRouterRpcFairnessPolicyController, because we do not need to know how many handlers the router has in total.
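
      The proportional allocation described in features 1 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the attached patches: the class and method names are hypothetical, the default proportion of 0.1 for an unconfigured ns is an assumed value, and so is the rule that every ns gets at least one handler.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of proportional handler allocation; not Hadoop code. */
class ProportionAllocationSketch {

  /** Assumed default share for an ns that has no explicit proportion. */
  static final double DEFAULT_PROPORTION = 0.1;

  /** Converts a proportion in (0, 1] into a handler count, at least 1 (assumed floor). */
  static int permitsFor(int totalHandlers, double proportion) {
    if (proportion <= 0.0 || proportion > 1.0) {
      throw new IllegalArgumentException("proportion must be in (0, 1]: " + proportion);
    }
    return Math.max(1, (int) (totalHandlers * proportion));
  }

  /** Computes per-ns handler counts; the sum may exceed totalHandlers (sharing). */
  static Map<String, Integer> allocate(int totalHandlers, List<String> nameservices,
      Map<String, Double> proportions) {
    Map<String, Integer> permits = new LinkedHashMap<>();
    for (String ns : nameservices) {
      double p = proportions.getOrDefault(ns, DEFAULT_PROPORTION);
      permits.put(ns, permitsFor(totalHandlers, p));
    }
    return permits;
  }

  public static void main(String[] args) {
    Map<String, Double> proportions = new HashMap<>();
    proportions.put("ns1", 0.5);
    proportions.put("ns2", 0.5);
    proportions.put("ns3", 0.5);
    // ns4 has no configured proportion and falls back to the assumed default.
    List<String> nameservices = Arrays.asList("ns1", "ns2", "ns3", "ns4");
    System.out.println(allocate(10, nameservices, proportions));
  }
}
```

      Because each count is derived from the router's total, changing the total number of handlers rescales every ns without touching the per-ns settings, and proportions that sum to 1 (for example 0.4, 0.3, 0.3) give the isolation-only behavior.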

      Therefore, the new RouterRpcFairnessPolicyController is more flexible, has better performance, and is more suitable for actual production environments.
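
      To illustrate why isolation survives oversubscription in the "only ns1 is busy" scenario, here is a small sketch that caps each ns with its own java.util.concurrent.Semaphore. The class and method names are hypothetical, and the sketch does not model the router's actual handler threads, which would still bound total concurrency at 10 in that example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of per-ns permit isolation; not Hadoop code. */
class PermitIsolationSketch {

  private final Map<String, Semaphore> permits = new HashMap<>();

  /** Registers an ns with the maximum number of handlers it may occupy at once. */
  void register(String ns, int handlerPermits) {
    permits.put(ns, new Semaphore(handlerPermits));
  }

  /** Tries to take one permit for the ns without blocking. */
  boolean acquire(String ns) {
    Semaphore s = permits.get(ns);
    return s != null && s.tryAcquire();
  }

  /** Returns one permit for the ns after the request completes. */
  void release(String ns) {
    Semaphore s = permits.get(ns);
    if (s != null) {
      s.release();
    }
  }

  public static void main(String[] args) {
    PermitIsolationSketch controller = new PermitIsolationSketch();
    controller.register("ns1", 5);  // busy ns: at most half of 10 handlers
    controller.register("ns2", 10); // lightly loaded ns: may use any handler

    int granted = 0;
    for (int i = 0; i < 8; i++) {   // 8 concurrent requests arrive for ns1
      if (controller.acquire("ns1")) {
        granted++;
      }
    }
    System.out.println("ns1 requests admitted: " + granted);
    System.out.println("ns2 still admitted: " + controller.acquire("ns2"));
  }
}
```

      Requests to ns1 beyond its cap are refused while ns2 is still admitted, which is the isolation property; the cap on ns1 is what leaves handlers available for ns2 and ns3.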

      Attachments

        1. HDFS-17302.001.patch
          6 kB
          Jian Zhang
        2. HDFS-17302.002.patch
          17 kB
          Jian Zhang
        3. HDFS-17302.003.patch
          14 kB
          Jian Zhang

            People

              keepromise Jian Zhang
              keepromise Jian Zhang
              Votes: 0
              Watchers: 2