Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.2
    • Fix Version/s: 3.1.3
    • Component/s: Core, Logging, Performance
    • Labels:
      None
    • Environment:

      Any

      Description

      The current Diags.h D/EClosure mechanism is obsolete. ATS requires gcc >= 4.1 for all compilation environments, and it includes variadic argument macro support with ##VA_ARGS that deletes the final comma if no arguments are provided.
      Removing the added layer should also improve performance when high volume debugging is turned on.

      1. ats.benchmark
        3 kB
        Uri Shachar
      2. diags_perf_and_remove_debugon.patch
        24 kB
        Uri Shachar
      3. diags_remove_debugon.patch
        24 kB
        Uri Shachar
      4. remove_prefix_arg_v3.patch
        21 kB
        Uri Shachar
      5. remove_prefix_arg_v2.patch
        20 kB
        Uri Shachar
      6. remove_prefix_arg.patch
        20 kB
        Uri Shachar
      7. diags_cleanup.patch
        11 kB
        Uri Shachar

        Issue Links

          Activity

          Hide
          Uri Shachar added a comment -

          closed.

          Show
          Uri Shachar added a comment - closed.
          Hide
          Leif Hedstrom added a comment -

          Hah! Thanks. Should we close this for now, or you have more goodies coming this way ?

          Show
          Leif Hedstrom added a comment - Hah! Thanks. Should we close this for now, or you have more goodies coming this way ?
          Hide
          Uri Shachar added a comment -

          The very least we could do is open a bug for it (and by we I mean I)
          TS-1121

          Show
          Uri Shachar added a comment - The very least we could do is open a bug for it (and by we I mean I) TS-1121
          Hide
          Leif Hedstrom added a comment -

          Uri, should we try to fix --disable-diags too? And with we, I mean you .

          I committed everything else, thanks for the fixes.

          Show
          Leif Hedstrom added a comment - Uri, should we try to fix --disable-diags too? And with we, I mean you . I committed everything else, thanks for the fixes.
          Hide
          Leif Hedstrom added a comment -

          Did you patch fix --disable-diags ? If not, I can do that.

          I tested the patch, looks good. I redid my benchmark a bit, to make sure I measure things properly. Here's the result from current trunk:

          163184902 fetches on 163211 conns, 390 max parallel, 1.631849E+08 bytes, in 900 seconds
          1 mean bytes/fetch
          181316.5 fetches/sec, 1.813165E+05 bytes/sec
          msecs/connect: 0.283 mean, 6.224 max, 0.045 min
          msecs/first-response: 1.520 mean, 500.103 max, 0.077 min
          

          This is with your latest patch:

          http_load  -parallel 130 -seconds 900 -keep_alive 1000 /tmp/TINY
          166093081 fetches on 166107 conns, 390 max parallel, 1.660930E+08 bytes, in 900 seconds
          1 mean bytes/fetch
          184547.8 fetches/sec, 1.845478E+05 bytes/sec
          msecs/connect: 0.425 mean, 341.253 max, 0.042 min
          msecs/first-response: 1.460 mean, 290.295 max, 0.075 min
          

          So, looks good to me. Let me check with pre-1102 as well, but I think this looks promising.

          Show
          Leif Hedstrom added a comment - Did you patch fix --disable-diags ? If not, I can do that. I tested the patch, looks good. I redid my benchmark a bit, to make sure I measure things properly. Here's the result from current trunk: 163184902 fetches on 163211 conns, 390 max parallel, 1.631849E+08 bytes, in 900 seconds 1 mean bytes/fetch 181316.5 fetches/sec, 1.813165E+05 bytes/sec msecs/connect: 0.283 mean, 6.224 max, 0.045 min msecs/first-response: 1.520 mean, 500.103 max, 0.077 min This is with your latest patch: http_load -parallel 130 -seconds 900 -keep_alive 1000 /tmp/TINY 166093081 fetches on 166107 conns, 390 max parallel, 1.660930E+08 bytes, in 900 seconds 1 mean bytes/fetch 184547.8 fetches/sec, 1.845478E+05 bytes/sec msecs/connect: 0.425 mean, 341.253 max, 0.042 min msecs/first-response: 1.460 mean, 290.295 max, 0.075 min So, looks good to me. Let me check with pre-1102 as well, but I think this looks promising.
          Hide
          Uri Shachar added a comment -

          First thing first: --disable-diags does not do anything (it sets TS_USE_DIAGS to 0 but Diags.h tests if is defined or not)....

          I haven't been able to reproduce anything close to a 15% degradation, but i have seen 1.7%. Looks like the increased overhead of always calling the multiple argument function outweighs everything else. When I do:
          -#define Diag(tag, fmt, ...) diags>log(tag, DTA(DL_Diag), fmt, ##VA_ARGS)
          +#define Diag(tag, fmt, ...) if (diags->on()) diags->log(tag, DTA(DL_Diag), fmt, ##VA_ARGS)

          We get the slight perf boost I expected when a tag is turned on, and no impact when diags is turned off.
          Attaching a patch (with the DebugOn removal) + the benchmark results that I got.

          Show
          Uri Shachar added a comment - First thing first: --disable-diags does not do anything (it sets TS_USE_DIAGS to 0 but Diags.h tests if is defined or not).... I haven't been able to reproduce anything close to a 15% degradation, but i have seen 1.7%. Looks like the increased overhead of always calling the multiple argument function outweighs everything else. When I do: -#define Diag(tag, fmt, ...) diags>log(tag, DTA(DL_Diag), fmt, ## VA_ARGS ) +#define Diag(tag, fmt, ...) if (diags->on()) diags->log(tag, DTA(DL_Diag), fmt, ## VA_ARGS ) We get the slight perf boost I expected when a tag is turned on, and no impact when diags is turned off. Attaching a patch (with the DebugOn removal) + the benchmark results that I got.
          Hide
          Uri Shachar added a comment -

          15% degradation is definitely unacceptable (I expected a slight improvement) – and "--disable-diags" should show no change at all... Do you test with any tags turned on?
          I'll do some benchmarks on my side and get back to you today.

          Show
          Uri Shachar added a comment - 15% degradation is definitely unacceptable (I expected a slight improvement) – and "--disable-diags" should show no change at all... Do you test with any tags turned on? I'll do some benchmarks on my side and get back to you today.
          Hide
          Leif Hedstrom added a comment -

          Yeah, I'm still seeing the performance problem :/. Doh.

          Show
          Leif Hedstrom added a comment - Yeah, I'm still seeing the performance problem :/. Doh.
          Hide
          Leif Hedstrom added a comment -

          Hmmm, now I'm having a difficult time reproducing this consistently... I'll poke around some more tomorrow, but if you have any ideas, let me know.

          Show
          Leif Hedstrom added a comment - Hmmm, now I'm having a difficult time reproducing this consistently... I'll poke around some more tomorrow, but if you have any ideas, let me know.
          Hide
          Leif Hedstrom added a comment -

          Hi Uri,

          I'm doing some benchmarks (I've tested both with and without --disable-diags), and from what I can tell, the 1102 commits makes the server about 15% slower... I'm surprised that compiling with --disable-diags doesn't work around that, but I'm 100% certain it's slower.

          At commit 9289d739ab779a24aeb947289eef06330319178b (one commit before 1102):

          5214181 fetches on 5402 conns, 390 max parallel, 5.214180E+06 bytes, in 30 seconds
          1 mean bytes/fetch
          173806.0 fetches/sec, 1.738060E+05 bytes/sec
          msecs/connect: 0.253 mean, 4.764 max, 0.050 min
          msecs/first-response: 1.717 mean, 347.950 max, 0.085 min
          

          With the TS-1102 commit, at 5ad005cb4bd16084615a3ade43271208dc6b374e

          4481768 fetches on 4675 conns, 390 max parallel, 4.481770E+06 bytes, in 30 seconds
          1 mean bytes/fetch
          149392.2 fetches/sec, 1.493922E+05 bytes/sec
          msecs/connect: 5.268 mean, 1002.667 max, 0.053 min
          msecs/first-response: 2.242 mean, 219.874 max, 0.091 min
          

          It's not a huge amount, but certainly a "regression". Uri, any thoughts?

          Show
          Leif Hedstrom added a comment - Hi Uri, I'm doing some benchmarks (I've tested both with and without --disable-diags), and from what I can tell, the 1102 commits makes the server about 15% slower... I'm surprised that compiling with --disable-diags doesn't work around that, but I'm 100% certain it's slower. At commit 9289d739ab779a24aeb947289eef06330319178b (one commit before 1102): 5214181 fetches on 5402 conns, 390 max parallel, 5.214180E+06 bytes, in 30 seconds 1 mean bytes/fetch 173806.0 fetches/sec, 1.738060E+05 bytes/sec msecs/connect: 0.253 mean, 4.764 max, 0.050 min msecs/first-response: 1.717 mean, 347.950 max, 0.085 min With the TS-1102 commit, at 5ad005cb4bd16084615a3ade43271208dc6b374e 4481768 fetches on 4675 conns, 390 max parallel, 4.481770E+06 bytes, in 30 seconds 1 mean bytes/fetch 149392.2 fetches/sec, 1.493922E+05 bytes/sec msecs/connect: 5.268 mean, 1002.667 max, 0.053 min msecs/first-response: 2.242 mean, 219.874 max, 0.091 min It's not a huge amount, but certainly a "regression". Uri, any thoughts?
          Hide
          Uri Shachar added a comment -

          Add a final patch to remove the redundant DebugOn macro – It doesn't save anything.

          Show
          Uri Shachar added a comment - Add a final patch to remove the redundant DebugOn macro – It doesn't save anything.
          Hide
          Uri Shachar added a comment -

          Took me ages to set up a solaris dev env, and then I saw your latest commit – it looks a lot better then the bunch of ifdefs I had going.
          I'll add another patch soon to remove the redundant DebugOn command, and then (finally) go back to the txn specific debugging functionality.

          Show
          Uri Shachar added a comment - Took me ages to set up a solaris dev env, and then I saw your latest commit – it looks a lot better then the bunch of ifdefs I had going. I'll add another patch soon to remove the redundant DebugOn command, and then (finally) go back to the txn specific debugging functionality.
          Hide
          Leif Hedstrom added a comment -

          We might need to back this out. Not only is it breaking Solaris builds, it actually segfaults the OSX gcc-llvm compiler (not our fault obviously, but still annoying). Uri, what do you think ?

          Show
          Leif Hedstrom added a comment - We might need to back this out. Not only is it breaking Solaris builds, it actually segfaults the OSX gcc-llvm compiler (not our fault obviously, but still annoying). Uri, what do you think ?
          Hide
          Leif Hedstrom added a comment -

          Yeah, all other build-bots are happy. And yes, icc is generally "gcc 4.x" compatible, and I think llvm tries to be as well. Solaris is well, Solaris, so no big surprise there ...

          I have a Solaris build box I can test it on as well, but, this is why we have the buildbots in the first place. I'll probably spend some time next to get both icc and llvm to compile again, and then we should aim to get buildbots for those as well.

          Thanks for taking the time to work on this, we really appreciate it, and of course, we welcome all new (and old committers to work with the project.

          Show
          Leif Hedstrom added a comment - Yeah, all other build-bots are happy. And yes, icc is generally "gcc 4.x" compatible, and I think llvm tries to be as well. Solaris is well, Solaris, so no big surprise there ... I have a Solaris build box I can test it on as well, but, this is why we have the buildbots in the first place. I'll probably spend some time next to get both icc and llvm to compile again, and then we should aim to get buildbots for those as well. Thanks for taking the time to work on this, we really appreciate it, and of course, we welcome all new (and old committers to work with the project.
          Hide
          Uri Shachar added a comment -

          clang/llvm claim to support this out of the box.
          Intel icc as well, though it might issue a warning about not being strict C99 compatible (depending on compiler version).

          Solaris has #_VA_ARGS_ which does exactly the same thing (Missing the additional # just to be different ), and might also require a "#pragma error_messages (off, E_ARGUEMENT_MISMATCH)".

          I'll generate a patch for solaris compiler, but I don't have access to a solaris machine to test it.
          Could you verify that the other build-bots pass ok before I get to work on this?

          Show
          Uri Shachar added a comment - clang/llvm claim to support this out of the box. Intel icc as well, though it might issue a warning about not being strict C99 compatible (depending on compiler version). Solaris has #_ VA_ARGS _ which does exactly the same thing (Missing the additional # just to be different ), and might also require a "#pragma error_messages (off, E_ARGUEMENT_MISMATCH)". I'll generate a patch for solaris compiler, but I don't have access to a solaris machine to test it. Could you verify that the other build-bots pass ok before I get to work on this?
          Hide
          Leif Hedstrom added a comment -

          Yeah, sorry. What it really means is that we don't support e.g gcc 3.x and earlier. We still want to support as many "modern" compilers as possible, eg Intel icc and clang/llvm.

          If you could, fixing this would be great.

          Show
          Leif Hedstrom added a comment - Yeah, sorry. What it really means is that we don't support e.g gcc 3.x and earlier. We still want to support as many "modern" compilers as possible, eg Intel icc and clang/llvm. If you could, fixing this would be great.
          Hide
          Uri Shachar added a comment -

          Could it be that we aren't compiling with gcc (Seems like it: -errof isn't a gcc flag)?
          The patch assumes gcc usage (which is what the build wiki page specifies), I can probably get it working on solaris as well with some ifdefs - but it won't look as good...

          Show
          Uri Shachar added a comment - Could it be that we aren't compiling with gcc (Seems like it: -errof isn't a gcc flag)? The patch assumes gcc usage (which is what the build wiki page specifies), I can probably get it working on solaris as well with some ifdefs - but it won't look as good...
          Hide
          Leif Hedstrom added a comment -

          Getting this on Solaris build-bot:

          CC -DHAVE_CONFIG_H -I. -I../../lib/ts   -I../../iocore/eventsystem -I../../iocore/net -I../../iocore/aio -I../../iocore/hostdb -I../../iocore/cache -I../../iocore/cluster -I../../iocore/utils -I../../iocore/dns  -I../../lib  -I../../lib/records  -I../../proxy  -I../../proxy/hdrs  -I../../mgmt  -I../../mgmt/preparse  -I../../mgmt/utils -I/export/home/buildbot/tcl/include/amd64 -I/export/home/buildbot/pcre/include/amd64 -D_LARGEFILE64_SOURCE=1 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_REENTRANT -Dsolaris -I/usr/local/ssl/include -I/opt/omni/include/amd64 -I/opt/expat-2.0.1/include -I/export/home/buildbot/pcre/include  -g -mt -m64 -D__WORDSIZE=64 -O3 -library=stlport4 -erroff -c -o Connection.o Connection.cc
          "P_NetAccept.h", line 75: Error: Trailing comma in a parameter list.
          "Connection.cc", line 250: Error: Trailing comma in a parameter list.
          "Connection.cc", line 370: Error: Trailing comma in a parameter list.
          3 Error(s) detected.
          *** Error code 2
          make: Fatal error: Command failed for target `Connection.o'
          Current working directory /export/home/buildbot/slave5/tserver-solaris-trunk/build/iocore/net
          

          I'll look at it tomorrow (but input welcome .

          Show
          Leif Hedstrom added a comment - Getting this on Solaris build-bot: CC -DHAVE_CONFIG_H -I. -I../../lib/ts -I../../iocore/eventsystem -I../../iocore/net -I../../iocore/aio -I../../iocore/hostdb -I../../iocore/cache -I../../iocore/cluster -I../../iocore/utils -I../../iocore/dns -I../../lib -I../../lib/records -I../../proxy -I../../proxy/hdrs -I../../mgmt -I../../mgmt/preparse -I../../mgmt/utils -I/export/home/buildbot/tcl/include/amd64 -I/export/home/buildbot/pcre/include/amd64 -D_LARGEFILE64_SOURCE=1 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_REENTRANT -Dsolaris -I/usr/local/ssl/include -I/opt/omni/include/amd64 -I/opt/expat-2.0.1/include -I/export/home/buildbot/pcre/include -g -mt -m64 -D__WORDSIZE=64 -O3 -library=stlport4 -erroff -c -o Connection.o Connection.cc "P_NetAccept.h" , line 75: Error: Trailing comma in a parameter list. "Connection.cc" , line 250: Error: Trailing comma in a parameter list. "Connection.cc" , line 370: Error: Trailing comma in a parameter list. 3 Error(s) detected. *** Error code 2 make: Fatal error: Command failed for target `Connection.o' Current working directory /export/home/buildbot/slave5/tserver-solaris-trunk/build/iocore/net I'll look at it tomorrow (but input welcome .
          Hide
          Leif Hedstrom added a comment -

          Thanks Uri!

          Show
          Leif Hedstrom added a comment - Thanks Uri!
          Hide
          Uri Shachar added a comment -

          Apologies for the confusion... regenerated patch with git. Hope everything is in order this time (First time playing with git).

          Show
          Uri Shachar added a comment - Apologies for the confusion... regenerated patch with git. Hope everything is in order this time (First time playing with git).
          Hide
          Uri Shachar added a comment -

          Sigh... you are right – SVN updates stopped more than a week ago...
          The commit for TS-1084 breaks my patch.

          Show
          Uri Shachar added a comment - Sigh... you are right – SVN updates stopped more than a week ago... The commit for TS-1084 breaks my patch.
          Hide
          Leif Hedstrom added a comment -

          I think the old SVN repo is "read-only" since the migration happened. The git repo is at

          http://git-wip-us.apache.org/repos/asf/trafficserver.git

          (or https:// if you want a read-write repo, the http:// is read-only).

          I'll look at the rejections in a bit, but if you have time, prepping one that patches against git trunk (master) would be great.

          Show
          Leif Hedstrom added a comment - I think the old SVN repo is "read-only" since the migration happened. The git repo is at http://git-wip-us.apache.org/repos/asf/trafficserver.git (or https:// if you want a read-write repo, the http:// is read-only). I'll look at the rejections in a bit, but if you have time, prepping one that patches against git trunk (master) would be great.
          Hide
          Uri Shachar added a comment -

          That could be the problem – I used svn – Isn't the svn repo synced/updated from the git repo?

          Show
          Uri Shachar added a comment - That could be the problem – I used svn – Isn't the svn repo synced/updated from the git repo?
          Hide
          Leif Hedstrom added a comment -

          I still get rejected hunks:

          loki (08:12) 363/0 $ patch -p0 < remove_prefix_arg_v2.patch 
          patching file lib/records/RecUtils.cc
          patching file lib/ts/Diags.cc
          patching file lib/ts/Diags.h
          Hunk #4 FAILED at 242.
          Hunk #5 FAILED at 358.
          Hunk #6 succeeded at 437 (offset 8 lines).
          2 out of 6 hunks FAILED -- saving rejects to file lib/ts/Diags.h.rej
          patching file proxy/Error.cc
          patching file proxy/InkAPI.cc
          patching file proxy/DiagsConfig.cc
          patching file mgmt/api/CoreAPI.cc
          patching file mgmt/utils/MgmtUtils.cc
          

          I'll look at it later tonight. Are you sure you're pulling the latest trunk (from git) ?

          Show
          Leif Hedstrom added a comment - I still get rejected hunks: loki (08:12) 363/0 $ patch -p0 < remove_prefix_arg_v2.patch patching file lib/records/RecUtils.cc patching file lib/ts/Diags.cc patching file lib/ts/Diags.h Hunk #4 FAILED at 242. Hunk #5 FAILED at 358. Hunk #6 succeeded at 437 (offset 8 lines). 2 out of 6 hunks FAILED -- saving rejects to file lib/ts/Diags.h.rej patching file proxy/Error.cc patching file proxy/InkAPI.cc patching file proxy/DiagsConfig.cc patching file mgmt/api/CoreAPI.cc patching file mgmt/utils/MgmtUtils.cc I'll look at it later tonight. Are you sure you're pulling the latest trunk (from git) ?
          Hide
          Uri Shachar added a comment -

          New patch created against the current trunk. Not sure why the previous patch was failing on your machine – it applied cleanly + regression passed on my setup.

          Show
          Uri Shachar added a comment - New patch created against the current trunk. Not sure why the previous patch was failing on your machine – it applied cleanly + regression passed on my setup.
          Hide
          Leif Hedstrom added a comment -

          Uri, the remove_prefix_arg.patch fails on current trunk. Any chance you can prepare a patch for trunk?

          Show
          Leif Hedstrom added a comment - Uri, the remove_prefix_arg.patch fails on current trunk. Any chance you can prepare a patch for trunk?
          Hide
          Uri Shachar added a comment -

          Added patch to remove the unused prefix argument (Includes the previous patch).

          Show
          Uri Shachar added a comment - Added patch to remove the unused prefix argument (Includes the previous patch).
          Hide
          Uri Shachar added a comment -

          Base cleanup patch. Will add 2 more patches to remove the DebugOn macro (which was an unoptimized version of Debug anyways) and remove the obsolete 'prefix' argument.

          Show
          Uri Shachar added a comment - Base cleanup patch. Will add 2 more patches to remove the DebugOn macro (which was an unoptimized version of Debug anyways) and remove the obsolete 'prefix' argument.
          Hide
          Uri Shachar added a comment -

          Cleanup before adding the new functionality

          Show
          Uri Shachar added a comment - Cleanup before adding the new functionality

            People

            • Assignee:
              Leif Hedstrom
              Reporter:
              Uri Shachar
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 24h
                24h
                Remaining:
                Remaining Estimate - 24h
                24h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development