Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-4069

libevent_ssl_socket assertion fails

    Details

    • Target Version/s:
    • Epic Link:
    • Sprint:
      Mesosphere Sprint 25
    • Story Points:
      8

      Description

      Have been seeing the following socket receive error frequently:

      F1204 11:12:47.301839 54104 libevent_ssl_socket.cpp:245] Check failed: length > 0 
      *** Check failure stack trace: ***
          @     0x7f73227fe5a6  google::LogMessage::Fail()
          @     0x7f73227fe4f2  google::LogMessage::SendToLog()
          @     0x7f73227fdef4  google::LogMessage::Flush()
          @     0x7f7322800e08  google::LogMessageFatal::~LogMessageFatal()
          @     0x7f73227b93e2  process::network::LibeventSSLSocketImpl::recv_callback()
          @     0x7f73227b9182  process::network::LibeventSSLSocketImpl::recv_callback()
          @     0x7f731cbc75cc  bufferevent_run_deferred_callbacks_locked
          @     0x7f731cbbdc5d  event_base_loop
          @     0x7f73227d9ded  process::EventLoop::run()
          @     0x7f73227a3101  _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
          @     0x7f73227a305b  std::_Bind_simple<>::operator()()
          @     0x7f73227a2ff4  std::thread::_Impl<>::_M_run()
          @     0x7f731e0d1a40  (unknown)
          @     0x7f731de0a182  start_thread
          @     0x7f731db3730d  (unknown)
          @              (nil)  (unknown)
      
      

      In this case this was a HTTP get over SSL. The url being:

      https://dseasb33srnrn.cloudfront.net:443/registry-v2/docker/registry/v2/blobs/sha256/44/44be94a95984bb47dc3a193f59bf8c04d5e877160b745b119278f38753a6f58f/data?Expires=1449259252&Signature=Q4CQdr1LbxsiYyVebmetrx~lqDgQfHVkGxpbMM3PoISn6r07DXIzBX6~tl1iZx9uXdfr~5awH8Kxwh-y8b0dTV3mLTZAVlneZlHbhBAX9qbYMd180-QvUvrFezwOlSmX4B3idvo-zK0CarUu3Ev1hbJz5y3olwe2ZC~RXHEwzkQ_&Key-Pair-Id=APKAJECH5M7VWIS5YZ6Q

      Steps to reproduce:

      1. Run master
      2. Run slave from your build directory as as:

       GLOG_v=1;SSL_ENABLED=1;SSL_KEY_FILE=<path_to_key>;SSL_CERT_FILE=<path_to_cert>;sudo -E ./bin/mesos-slave.sh \
            --master=127.0.0.1:5050 \                                                  
            --executor_registration_timeout=5mins \                                    
            --containerizers=mesos  \                                                  
            --isolation=filesystem/linux \                                             
            --image_providers=DOCKER  \                                                
            --docker_puller_timeout=600 \                                              
            --launcher_dir=$MESOS_BUILD_DIR/src/.libs \                                
            --switch_user="false" \                                                    
            --docker_puller="registry"          
      

      3. Run mesos-execute from your build directory as :

                                                              
          ./src/mesos-execute \                                                        
          --master=127.0.0.1:5050 \                                                    
          --command="uname -a"  \                                                      
          --name=test \                                                                
          --docker_image=ubuntu 
      

        Activity

        Hide
        jvanremoortere Joris Van Remoortere added a comment -
        commit 0461aaba695a33ef8e92f3818e217dd23926cb50
        Author: Jojy Varghese <jojy@mesosphere.io>
        Date:   Tue Dec 22 15:09:46 2015 -0500
        
            Libevent SSL: Added check for buffer length before swapping request.
            
            recv_callback could be called from libevent's receive callback and
            Socket::recv for the same buffer event and different requests. There
            is a check for buffer length at Socket::recv but not at libevent's
            receive callback. This could lead to the incoming request for
            Socket::recv being swapped out even though the buffer length is zero.
            This change adds a check for buffer length before swapping out the
            receive request object.
            
            Review: https://reviews.apache.org/r/41026/
        
        Show
        jvanremoortere Joris Van Remoortere added a comment - commit 0461aaba695a33ef8e92f3818e217dd23926cb50 Author: Jojy Varghese <jojy@mesosphere.io> Date: Tue Dec 22 15:09:46 2015 -0500 Libevent SSL: Added check for buffer length before swapping request. recv_callback could be called from libevent's receive callback and Socket::recv for the same buffer event and different requests. There is a check for buffer length at Socket::recv but not at libevent's receive callback. This could lead to the incoming request for Socket::recv being swapped out even though the buffer length is zero. This change adds a check for buffer length before swapping out the receive request object. Review: https: //reviews.apache.org/r/41026/
        Show
        jojy Jojy Varghese added a comment - https://reviews.apache.org/r/41026/
        Hide
        jojy Jojy Varghese added a comment -

        Adding tcpdump analysis:

        Tcp dump file: https://drive.google.com/file/d/0B-aoVvwDtYZNWGdWbVFZdUN3R0k/view?usp=sharing

        • Trace shows that the slave socket at 192.168.87.237:39287 sent a RST back for a long running streaming download (see captured frame #16959) most likely due to the assertion described in the issue.
        • The frames received does not show a 0 length received.

        Do we understand all the circumstance in which bufferevent_read will return a 0?

        Show
        jojy Jojy Varghese added a comment - Adding tcpdump analysis: Tcp dump file: https://drive.google.com/file/d/0B-aoVvwDtYZNWGdWbVFZdUN3R0k/view?usp=sharing Trace shows that the slave socket at 192.168.87.237:39287 sent a RST back for a long running streaming download (see captured frame #16959) most likely due to the assertion described in the issue. The frames received does not show a 0 length received. Do we understand all the circumstance in which bufferevent_read will return a 0?

          People

          • Assignee:
            jojy Jojy Varghese
            Reporter:
            jojy Jojy Varghese
            Shepherd:
            Joris Van Remoortere
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development

                Agile