Xerces-C++ / XERCESC-2250

Curl NetAccessor mishandles larger data with NetAcc_InternalError


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.1.4, 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5
    • Fix Version/s: 3.3.0
    • Component/s: NetAccessors

    Description

      The Curl NetAccessor has a buffer overflow bug.

      It can easily be replicated if the curl NetAccessor is turned on during configure:

      ./configure --with-icu --with-curl

      and then invoking the NetAccessorTest executable with one of the larger files. I get:

      ~/xerces-c $ ./tests/NetAccessorTest file://$(pwd)/doc/program-dom.xml
      Exception during test:
          internal error in NetAccessor

      The problem is in CurlURLInputStream::writeCallback, which returns a value less than the number of bytes the callback is expected to consume whenever cnt != totalConsume. According to https://curl.se/libcurl/c/CURLOPT_WRITEFUNCTION.html, CURL_WRITEFUNC_ERROR will then be returned. CURL_WRITEFUNC_ERROR is not handled in the switch statement in CurlURLInputStream::readMore, hence XMLExcepts::NetAcc_InternalError is thrown.

       

      I can see that the logic error in CurlURLInputStream::writeCallback comes down to the assumption that libcurl will call this callback just once before Xerces is able to clear the buffer in CurlURLInputStream::readBytes. Perhaps this is because the curl docs linked above are not clear that up to CURL_MAX_WRITE_SIZE bytes could be provided on each of multiple invocations of the callback. Xerces assumes that libcurl will provide at most CURL_MAX_WRITE_SIZE bytes in total for each call to curl_multi_info_read. On close inspection of the curl code at https://github.com/curl/curl/blob/160f0233590d0a02422594104ae805e1aa08d3db/lib/cw-out.c#L218 this is not correct: there is a loop that will call the callback multiple times, and each invocation can expect up to CURL_MAX_WRITE_SIZE bytes to be consumed. Xerces, however, can only handle CURL_MAX_WRITE_SIZE bytes in total across multiple invocations of the callback, due to the buffer definition:

      XMLByte CurlURLInputStream::fBuffer[CURL_MAX_WRITE_SIZE];

       

      Regarding solutions, one option would be the following: if the number of bytes to consume in CurlURLInputStream::writeCallback would exceed the remaining space in CurlURLInputStream::fBuffer, we could return CURL_WRITEFUNC_PAUSE from the callback to defer consuming the bytes, see https://curl.se/libcurl/c/CURLOPT_WRITEFUNCTION.html.

      However, pausing the network transfer when we could instead buffer the data seems an unnecessary performance hit, so it is probably best not to pursue this.

       

      I think the better solution would be to replace the fixed-size buffer CurlURLInputStream::fBuffer with a dynamically sized buffer. Looking at the Xerces code base, I don't see any kind of queue container, and the STL does not seem to be used. I was thinking of using the ValueVectorOf container with template type B, say, where B is a buffer struct with an XMLByte[CURL_MAX_WRITE_SIZE] member. A fixed container size of one would be equivalent to the current implementation; the fix would be to let it grow when needed. In practice a maximum size of 2 or 3 would probably suffice. Thoughts?

      People

        scantor Scott Cantor
        wsfulton William S Fulton