Qpid Dispatch / DISPATCH-2059

Support running router under rr during test execution

Details

    • Type: Wish
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.15.0
    • Fix Version/s: Backlog
    • Component/s: Tests
    • Labels: None

    Description

      Dispatch has an environment variable, QPID_DISPATCH_RUNNER, which (according to its comment) is intended for running tests under valgrind. That comment is outdated, because memory checking is now handled in a different way, in RuntimeChecks.cmake. One tool that would make sense to wrap the router with is rr, the record-replay debugger from Mozilla (https://rr-project.org/).
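
      To illustrate the idea, here is a minimal sketch of how a test harness could prepend a runner such as rr to the router command line. The function name router_command and the exact way the variable is split are assumptions for illustration; the real handling in the Dispatch test code may differ.

```python
import os
import shlex


def router_command(config_path, env=None):
    """Build the qdrouterd command line, prefixed by an optional runner.

    Hypothetical helper: it assumes QPID_DISPATCH_RUNNER holds a
    shell-style command prefix such as "rr record".
    """
    env = os.environ if env is None else env
    runner = env.get("QPID_DISPATCH_RUNNER", "")
    # shlex.split("") returns [], so an unset runner adds nothing.
    return shlex.split(runner) + ["qdrouterd", "-c", config_path]


if __name__ == "__main__":
    print(router_command("router.conf", env={"QPID_DISPATCH_RUNNER": "rr record"}))
```

      With QPID_DISPATCH_RUNNER set to "rr record", the sketch produces the same shape of command as typing rr record qdrouterd -c ... in a terminal.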

      I've previously tried rr with (very) limited success in DISPATCH-782.

      aconway considered it while working on DISPATCH-902 and used it on other issues.

      There was an attempt to use rr (https://issues.apache.org/jira/browse/DISPATCH-739?focusedCommentId=15983719&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15983719), but it did not survive in mainline to the present day.

      I have two problems with rr:

      1. The Dispatch system tests send SIGTERM to the subprocess they started, which is rr rather than the router. What is needed is to signal rr's children instead, because killing rr abruptly terminates the recording. When I press ^C on rr record qdrouterd -c ... in a terminal, the signal correctly reaches the child; I am not sure where the difference in the test comes from. Explicitly killing only the children in the system test does the right thing. Sadly, doing that requires hacks, because Python's subprocess module does not offer an easy way to list a process's children. The os module has some ways; psutil is the easiest, but that's a third-party dependency.
      2. The CLion debugger disconnects during replay when qdrouterd gets SIGTERM, even though the router handles that signal and continues running (to clean up).
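
      For the first problem, a minimal standard-library sketch of signalling only the children could look like the following. It is Linux-only (as is rr) and scans /proc for processes whose parent is the given PID. The helper names child_pids and terminate_children are hypothetical, and the sketch assumes the router is a direct child of the rr process.

```python
import os
import signal


def child_pids(parent_pid):
    """Return PIDs whose parent is parent_pid, by scanning /proc (Linux only)."""
    children = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat", errors="replace") as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Format is "pid (comm) state ppid ..."; comm may contain spaces
        # or ')', so split on the last ')' before reading the fields.
        fields = stat.rsplit(")", 1)[1].split()
        if int(fields[1]) == parent_pid:
            children.append(int(entry))
    return children


def terminate_children(parent_pid, sig=signal.SIGTERM):
    """Send sig to each direct child of parent_pid; return the PIDs signalled."""
    signalled = []
    for pid in child_pids(parent_pid):
        try:
            os.kill(pid, sig)
            signalled.append(pid)
        except ProcessLookupError:
            pass  # the child exited before we could signal it
    return signalled
```

      In a system test this would be called with the PID of the rr subprocess, for example terminate_children(process.pid), in place of terminating the subprocess itself. psutil offers the same through Process.children(), at the cost of the extra dependency.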

      One awesome feature of rr is that a recording can be replayed many times, backwards and forwards, and all memory addresses stay the same on every replay. This means one can set watch -l *0x0000000 watchpoints to watch specific memory locations, and use the gdb reverse-cont command. (rr emulates the gdb UI; it is actually a wrapper over gdb, if I understand correctly.)

      Chaos mode

      rr has a --chaos switch that varies thread scheduling in order to reveal more crashes; that could be useful.

          People

            jdanek Jiri Daněk