Traffic Server / TS-4897

Unbounded growth of the number of memory maps for traffic_server under SSL termination load when ssl_ticket_enabled=0

Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 7.1.0
    • Component/s: TLS
    • Labels: None

    Description

      The number of [anon] memory regions mapped into the traffic_server process grows without bound until the kernel's limits are reached and the process is terminated.

      This happens when ATS is used to terminate SSL and ssl_ticket_enabled=0 in ssl_multicert.config.

      We've experienced this issue on our staging and production hosts and were able to reproduce it with the above configuration under high-volume HTTPS load. We did not see it with 5.2.x; the reason becomes clear at the end of this description.

      While generating HTTPS traffic with siege or ab, the growth can be observed with:
      watch "pmap $(pidof traffic_server) | wc -l"
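      The same count can also be sampled from code. The following is only a minimal sketch, assuming a Linux host where /proc/<pid>/maps lists one mapping per line; count_mappings is a hypothetical helper name and is not part of ATS. Because pmap adds a header and a total line, the number will differ slightly from the pmap | wc -l figure.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of ATS): count the mappings listed in a
   /proc/<pid>/maps-style file, which has one mapping per line.
   Returns -1 if the file cannot be opened. */
long count_mappings(const char *maps_path)
{
    assert(maps_path != NULL);

    FILE *fp = fopen(maps_path, "r");
    if (fp == NULL)
        return -1;

    long lines = 0;
    int c;
    int last = '\n';
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != '\n')
        lines++; /* count a final line that lacks a trailing newline */

    fclose(fp);
    return lines;
}
```

      Calling count_mappings with the maps path of the traffic_server PID at regular intervals should show the same upward trend as the watch command.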

      git bisect pointed us to: <TS-3883: Fix madvise>

      It turns out that a no-op ats_madvise hides the symptoms of the issue.

      Digging deeper, we realized that the ssl_ticket_enabled option is relevant: after enabling the ssl.session_cache tag, we saw that with tickets enabled ATS does not manage its own SSL session cache; the OpenSSL library does it instead. In that case, the code path in ATS that performs the problematic allocation is rarely executed, since OpenSSL takes care of the session tokens.

      But why does this happen? Because MADV_DONTDUMP is passed to posix_madvise, even though MADV_DONTDUMP is not a valid advice value for posix_madvise; posix_madvise is not a drop-in replacement for madvise.

      Looking at <bits/mman.h>:

          /* Advice to `madvise'.  */
          #ifdef __USE_BSD
          # define MADV_NORMAL      0     /* No further special treatment.  */
          # define MADV_RANDOM      1     /* Expect random page references.  */
          # define MADV_SEQUENTIAL  2     /* Expect sequential page references.  */
          # define MADV_WILLNEED    3     /* Will need these pages.  */
          # define MADV_DONTNEED    4     /* Don't need these pages.  */
          # define MADV_REMOVE      9     /* Remove these pages and resources.  */
          # define MADV_DONTFORK    10    /* Do not inherit across fork.  */
          # define MADV_DOFORK      11    /* Do inherit across fork.  */
          # define MADV_MERGEABLE   12    /* KSM may merge identical pages.  */
          # define MADV_UNMERGEABLE 13    /* KSM may not merge identical pages.  */
          # define MADV_DONTDUMP    16    /* Explicity exclude from the core dump,
                                             overrides the coredump filter bits.  */
          # define MADV_DODUMP      17    /* Clear the MADV_DONTDUMP flag.  */
          # define MADV_HWPOISON    100   /* Poison a page for testing.  */
          #endif
      

      However posix_madvise takes:

          # define POSIX_MADV_NORMAL      0 /* No further special treatment.  */
          # define POSIX_MADV_RANDOM      1 /* Expect random page references.  */
          # define POSIX_MADV_SEQUENTIAL  2 /* Expect sequential page references.  */
          # define POSIX_MADV_WILLNEED    3 /* Will need these pages.  */
          # define POSIX_MADV_DONTNEED    4 /* Don't need these pages.  */
      

      Note that posix_madvise and madvise can both be present on the same system, but they do not have the same capabilities. That is why the MADV_DONTDUMP behavior ("Explicity exclude from the core dump, overrides the coredump filter bits") is not achievable through posix_madvise.
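      One way to keep the two interfaces apart is sketched below. This is an illustration under stated assumptions and not the ATS patch: is_posix_advice and exclude_from_core_dump are hypothetical names, and the sketch assumes a Linux/glibc system where <sys/mman.h> exposes both madvise and posix_madvise. Only the five POSIX_MADV_* values are treated as valid for posix_madvise, and MADV_DONTDUMP is sent to madvise only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* ask glibc to expose madvise and MADV_* */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: returns 1 if advice is one of the five values
   listed for posix_madvise, 0 otherwise. */
int is_posix_advice(int advice)
{
    switch (advice) {
    case POSIX_MADV_NORMAL:
    case POSIX_MADV_RANDOM:
    case POSIX_MADV_SEQUENTIAL:
    case POSIX_MADV_WILLNEED:
    case POSIX_MADV_DONTNEED:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: ask the kernel to leave [addr, addr + len) out of
   core dumps. addr is expected to be page-aligned, as madvise requires.
   Returns 0 on success or an errno value; ENOTSUP when the headers do
   not define MADV_DONTDUMP. */
int exclude_from_core_dump(void *addr, size_t len)
{
    assert(addr != NULL);
#if defined(MADV_DONTDUMP)
    /* Linux-specific advice goes through madvise, never posix_madvise. */
    return madvise(addr, len, MADV_DONTDUMP) == 0 ? 0 : errno;
#else
    (void)len;
    return ENOTSUP;
#endif
}
```

      A caller that receives an arbitrary advice value could check is_posix_advice before handing it to posix_madvise and reject or reroute anything else, rather than passing a madvise-only value through unchanged.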

      Will post a PR momentarily.

            People

              Assignee: Phil Sorber (psudaemon)
              Reporter: Can Selcik (cselcik)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 40m