  1. Apache Avro
  2. AVRO-556

Poor performance for Reader::readBytes can be easily improved

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.2
    • Fix Version/s: 1.3.3
    • Component/s: c++
    • Labels:
      None
    • Environment:

      Linux

      Description

      The default implementation of Reader::readBytes in 1.3.2 reads bytes into the result vector one byte at a time. For large byte arrays (~500k or so), this is horrendously slow.
      The code can easily be changed to:

      void readBytes(std::vector<uint8_t> &val) {
          int64_t size = readSize();
          val.resize(size);
          if (size > 0) {
              // One bulk copy; &val[0] is only valid when the vector is non-empty.
              in_.readBytes(&val[0], size);
          }
      }
      

      This copies all the bytes in a single call.
      (Note: this function appears to have been changed in trunk, but it still copies byte by byte, so the optimization still applies.)

      In my testing, which serialized and deserialized a message containing a 500k byte field 1000 times, execution time dropped from 30+ sec to 0.2 sec with this optimization.

      The same optimization can easily be applied to readFixed(uint8_t *val...) as well.
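
      To make the difference concrete, here is a minimal, self-contained sketch of the two approaches. It is not Avro's actual code: MemoryInput is a hypothetical stand-in for the reader's underlying input (in_ above), and readBytesSlow, readBytesBulk and readFixedBulk are illustrative names. The size is passed in directly rather than decoded with readSize().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory input, standing in for the reader's in_ member.
class MemoryInput {
public:
    explicit MemoryInput(const std::vector<uint8_t> &data) : data_(data), pos_(0) {}

    // Return the next byte and advance the read position.
    uint8_t readByte() {
        if (pos_ >= data_.size()) throw std::runtime_error("read past end of input");
        return data_[pos_++];
    }

    // Copy n bytes into dst with a single memcpy.
    void readBytes(uint8_t *dst, size_t n) {
        if (n == 0) return;
        assert(dst != nullptr);
        if (n > data_.size() - pos_) throw std::runtime_error("read past end of input");
        std::memcpy(dst, &data_[pos_], n);
        pos_ += n;
    }

private:
    std::vector<uint8_t> data_;
    size_t pos_;
};

// Byte-at-a-time pattern of the kind the report describes: one call per byte.
void readBytesSlow(MemoryInput &in, size_t size, std::vector<uint8_t> &val) {
    val.clear();
    for (size_t i = 0; i < size; ++i) {
        val.push_back(in.readByte());
    }
}

// Bulk pattern: size the vector once, then fill it with a single call.
void readBytesBulk(MemoryInput &in, size_t size, std::vector<uint8_t> &val) {
    val.resize(size);
    if (size > 0) {
        in.readBytes(&val[0], size);
    }
}

// Same idea for a fixed-size field: the caller supplies the buffer and its length.
void readFixedBulk(MemoryInput &in, uint8_t *val, size_t size) {
    in.readBytes(val, size);
}
```

      Both readBytes variants produce the same vector; the bulk versions just replace one call per byte with a single copy, which is where a speedup on large fields would be expected to come from.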

        Attachments

        1. AVRO-556.patch
          0.9 kB
          Scott Banachowski

            People

            • Assignee:
              Unassigned
            • Reporter:
              Dave Wright (wrightd)
            • Votes:
              0
            • Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved: