  1. Mesos
  2. MESOS-5748

Potential segfault in `link` and `send` when linking to a remote process


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0, 0.23.0, 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.0
    • Fix Version/s: 0.28.3, 1.0.0
    • Component/s: libprocess
    • Sprint: Mesosphere Sprint 38
    • Story Points: 2

    Description

      There is a race in the SocketManager between a remote link and the disconnection of the underlying socket.

      We potentially segfault here: https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1512

      `*socket` dereferences the shared pointer underpinning the `Socket*` object. However, the code above this line actually has ownership of the pointer:
      https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1494-L1499

      If the socket dies during the link, `ignore_recv_data` may delete the `Socket` underneath `link`:
      https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1399-L1411


      The same race exists for `send`.
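
The ownership problem described above can be illustrated with a small sketch. This is not the libprocess code: `FakeSocket`, `FakeSocketManager`, `borrow`, and `acquire` are hypothetical names, and a `std::string` stands in for the real socket implementation. It assumes, as the description says, that `Socket` is a handle around a shared pointer. It shows one common way to avoid this kind of race: copy the handle while holding the lock, so the underlying object stays alive even if the disconnect path erases the map entry.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a Socket: a cheap-to-copy handle around a
// reference-counted implementation (here just the peer address).
struct FakeSocket {
  std::shared_ptr<std::string> impl;

  bool valid() const { return impl != nullptr; }

  const std::string& peer() const {
    assert(valid());
    return *impl;
  }
};

// Hypothetical, heavily simplified manager. The map owns one heap-allocated
// handle per file descriptor, similar in spirit to the issue's description.
class FakeSocketManager {
public:
  void add(int fd, const std::string& peer) {
    std::lock_guard<std::mutex> lock(mutex_);
    sockets_[fd].reset(new FakeSocket{std::make_shared<std::string>(peer)});
  }

  // Models the disconnect path: destroys the map-owned handle.
  void close(int fd) {
    std::lock_guard<std::mutex> lock(mutex_);
    sockets_.erase(fd);
  }

  // Hazardous pattern: the returned pointer is owned by the map and dangles
  // as soon as another thread calls close(fd) after the lock is released.
  FakeSocket* borrow(int fd) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = sockets_.find(fd);
    return it == sockets_.end() ? nullptr : it->second.get();
  }

  // Safer pattern: copy the handle under the lock. The copy shares ownership
  // of the implementation, so a later close(fd) cannot free it underneath
  // the caller. Returns an invalid handle if fd is unknown.
  FakeSocket acquire(int fd) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = sockets_.find(fd);
    if (it == sockets_.end()) {
      return FakeSocket{};
    }
    return *it->second;
  }

private:
  std::mutex mutex_;
  std::map<int, std::unique_ptr<FakeSocket>> sockets_;
};
```

A caller in the position of `link` or `send` would call `acquire` and check `valid()` before using the handle. A pointer obtained from `borrow` is only safe to use while no other thread can call `close`.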

      This race was discovered while running a new test in repetition:
      https://reviews.apache.org/r/49175/

      On OSX, I hit the race consistently every 500-800 repetitions:

      3rdparty/libprocess/libprocess-tests --gtest_filter="ProcessRemoteLinkTest.RemoteLink"  --gtest_break_on_failure --gtest_repeat=1000
      


          People

            Assignee: Joseph Wu (kaysoky)
            Reporter: Joseph Wu (kaysoky)
            Shepherd: Benjamin Mahler

              Agile

                Completed Sprint:
                Mesosphere Sprint 38 ended 08/Jul/16