Hadoop Common / HADOOP-17346

Fair call queue is defeated by abusive service principals


Details

    Description

      daryn reported that the fair call queue (FCQ) prioritizes based on the full Kerberos principal (i.e. "user/host@realm") rather than the short name (i.e. "user"). This keeps service principals such as the DataNodes (DNs) and NodeManagers (NMs) from being de-prioritized, because service principals are expected to be well behaved. Notably, the DNs contribute a significant but important load, so the intent is to avoid de-prioritizing all DNs merely because their combined load is high relative to that of users.

      This has the unfortunate side effect of allowing misbehaving and non-critical service principals to abuse the FCQ. The gstorm/* principals are a prime example: each server spams open calls as fast as possible, yet none of the gstorm servers can be de-prioritized, because each principal accounts for only a fraction of the total load from all principals.

      The secondary and more devastating problem is that other abusive, non-service principals cannot be effectively de-prioritized. The combined gstorm load prevents other principals from surpassing the priority thresholds, so they stay in the highest-priority queues, which allows abusive principals to overflow the entire call queue for extended periods of time. Notably, it prevents the FCQ from moderating the heavy create load from p_gup @ DB, which causes significant performance degradation.
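To make the dilution effect concrete, here is a simplified Java sketch of share-based prioritization. It is not Hadoop's DecayRpcScheduler: it ignores decay and backoff, and the thresholds (12.5%, 25%, 50%), principal names, and call counts are illustrative assumptions. It shows an abusive user landing in the lowest-priority level when it competes only with ten light users, but staying in the highest-priority level once twenty gstorm/* principals, each below every threshold, inflate the total.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified model of share-based call prioritization (no decay). */
public class ShareBasedPriority {
    // Illustrative thresholds (an assumption, not necessarily Hadoop's defaults):
    // share >= 12.5% -> level 1, >= 25% -> level 2, >= 50% -> level 3.
    // Level 0 is the highest-priority queue.
    static final double[] THRESHOLDS = {0.125, 0.25, 0.5};

    /** Maps an identity's share of all calls to a priority level (0 = highest). */
    static int levelFor(double share) {
        int level = 0;
        for (double t : THRESHOLDS) {
            if (share >= t) {
                level++;
            }
        }
        return level;
    }

    /** Computes each identity's level from its call count relative to the total. */
    static Map<String, Integer> levels(Map<String, Long> callCounts) {
        long total = 0;
        for (long c : callCounts.values()) {
            total += c;
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : callCounts.entrySet()) {
            result.put(e.getKey(), levelFor(e.getValue().doubleValue() / total));
        }
        return result;
    }

    /** Builds a hypothetical workload, optionally including 20 gstorm/* principals. */
    static Map<String, Long> workload(boolean withGstorm) {
        Map<String, Long> calls = new LinkedHashMap<>();
        calls.put("abuser@REALM", 10_000L);            // one abusive user
        for (int i = 0; i < 10; i++) {
            calls.put("user" + i + "@REALM", 1_000L);   // ten well-behaved users
        }
        if (withGstorm) {
            for (int i = 0; i < 20; i++) {
                calls.put("gstorm/host" + i + "@REALM", 4_000L);
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // Without gstorm: abuser has 10,000 / 20,000 = 50% of calls -> level 3.
        System.out.println(levels(workload(false)).get("abuser@REALM"));
        // With gstorm: abuser has 10,000 / 100,000 = 10% -> level 0, and each
        // gstorm principal has only 4% -> level 0 as well.
        System.out.println(levels(workload(true)).get("abuser@REALM"));
    }
}
```

In this model, grouping the twenty gstorm/* principals under the single short name `gstorm` would give that identity 80% of the calls, placing it in the lowest-priority level.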

      Prioritization should be based on the short name, with configurable exemptions for service principals such as the DNs and NMs.
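The proposed rule can be sketched as an identity function that collapses a principal to its short name unless that short name is on an exemption list. This is a hypothetical illustration, not the attached patch: the class name, the exempt names `dn` and `nm`, and the naive split on `/` and `@` are all assumptions. A real implementation would more likely rely on Hadoop's configured auth_to_local mapping (for example via `UserGroupInformation.getShortUserName()`) than on string parsing.

```java
import java.util.Set;

/**
 * Hypothetical sketch: account calls against the principal's short name,
 * except for exempt service users, which keep their full principal.
 */
public class ShortNameIdentity {
    private final Set<String> exemptShortNames;

    public ShortNameIdentity(Set<String> exemptShortNames) {
        this.exemptShortNames = Set.copyOf(exemptShortNames);
    }

    /** Naively extracts "user" from "user", "user@REALM" or "user/host@REALM". */
    static String shortName(String principal) {
        int end = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) {
            end = Math.min(end, slash);
        }
        if (at >= 0) {
            end = Math.min(end, at);
        }
        return principal.substring(0, end);
    }

    /** Returns the identity the scheduler would account this caller's calls against. */
    public String identityFor(String principal) {
        String shortName = shortName(principal);
        return exemptShortNames.contains(shortName) ? principal : shortName;
    }

    public static void main(String[] args) {
        ShortNameIdentity ids = new ShortNameIdentity(Set.of("dn", "nm"));
        // Not exempt: all gstorm/* hosts collapse into one identity.
        System.out.println(ids.identityFor("gstorm/host1.example.com@EXAMPLE.COM")); // gstorm
        // Exempt: each DN keeps its own identity, as today.
        System.out.println(ids.identityFor("dn/host1.example.com@EXAMPLE.COM")); // dn/host1.example.com@EXAMPLE.COM
    }
}
```

Keeping the full principal for exempt services preserves the current behavior for the DNs and NMs, while every other principal, including gstorm/*, is accounted for as a single identity.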

      daryn suggested a solution, which we have applied to our clusters.

      Attachments

        1. HADOOP-17346.branch-3.3.001.patch
          17 kB
          Ahmed Hussein
        2. HADOOP-17346.branch-3.2.001.patch
          16 kB
          Ahmed Hussein
        3. HADOOP-17346-branch-3.1.001.patch
          17 kB
          Ahmed Hussein


            People

              Assignee: Ahmed Hussein (ahussein)
              Reporter: Ahmed Hussein (ahussein)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m