Apache Avro / AVRO-556

Poor performance for Reader::readBytes can be easily improved


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.2
    • Fix Version/s: 1.3.3
    • Component/s: c++
    • Labels: None
    • Environment: Linux

    Description

      The default implementation of Reader::readBytes in 1.3.2 reads bytes into the result vector one byte at a time. For large byte arrays (around 500 KB), this is horrendously slow.
      The code can easily be changed to:

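      void readBytes(std::vector<uint8_t> &val) {
          int64_t size = readSize();
          val.resize(static_cast<size_t>(size));
          // Copy the whole payload in one call. Skip the call for an empty
          // array, since &val[0] is undefined on an empty vector.
          if (size > 0) {
              in_.readBytes(&val[0], static_cast<size_t>(size));
          }
      }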
      

      This copies all the bytes in a single call.
      (Note: this function appears to have been changed in trunk, but it still copies byte by byte, so the optimization would still apply.)
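To make the difference concrete, here is a minimal, self-contained sketch. `MemoryInput` is a hypothetical stand-in for the stream behind the reader's `in_` member (it is not Avro's input API); it offers a per-byte `readByte` and a bulk `readBytes`. The hypothetical functions `readBytesSlow` and `readBytesFast` mirror the byte-at-a-time loop and the proposed single-call version. To keep the example short, the length is passed in directly instead of being decoded with `readSize()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical in-memory stream standing in for the Reader's in_ member.
// It is NOT Avro's input API; it only models per-byte vs. bulk reads.
class MemoryInput {
public:
    explicit MemoryInput(std::vector<uint8_t> data) : data_(std::move(data)), pos_(0) {}

    // One call per byte: the access pattern of readBytes in 1.3.2.
    void readByte(uint8_t &b) {
        if (pos_ >= data_.size()) throw std::runtime_error("end of input");
        b = data_[pos_++];
    }

    // One call for a whole block: what the proposed fix relies on.
    void readBytes(uint8_t *dst, size_t n) {
        if (n > data_.size() - pos_) throw std::runtime_error("end of input");
        if (n > 0) std::memcpy(dst, data_.data() + pos_, n);
        pos_ += n;
    }

private:
    std::vector<uint8_t> data_;
    size_t pos_;
};

// Byte-at-a-time copy, modeled on the 1.3.2 loop.
void readBytesSlow(MemoryInput &in, std::vector<uint8_t> &val, size_t size) {
    val.clear();
    val.reserve(size);
    for (size_t i = 0; i < size; ++i) {
        uint8_t b;
        in.readByte(b);
        val.push_back(b);
    }
}

// Proposed version: size the vector once, then fill it with one bulk read.
// The empty case is skipped because &val[0] is undefined on an empty vector.
void readBytesFast(MemoryInput &in, std::vector<uint8_t> &val, size_t size) {
    val.resize(size);
    if (size > 0) in.readBytes(&val[0], size);
}
```

Both functions produce the same bytes. The slow version makes one stream call per byte, so a 500 KB field costs roughly half a million calls, whereas the fast version makes a single call.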

      In my testing (serializing and deserializing a message containing a 500 KB byte field 1000 times), execution time dropped from more than 30 seconds to 0.2 seconds with this optimization.

      The same optimization can easily be applied to readFixed(uint8_t *val...) as well.
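For a fixed value the size is already known and the caller supplies the destination buffer, so the change reduces to a single bulk read. The sketch below assumes a pointer-plus-length form, `readFixedBulk(source, val, size)`, and adds a `std::array` overload for illustration; `readFixedBulk` and the `BulkSource` stream are hypothetical names, not taken from the Avro sources.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical source stream with a bulk read; not part of Avro.
class BulkSource {
public:
    explicit BulkSource(std::vector<uint8_t> data) : data_(std::move(data)), pos_(0) {}

    void readBytes(uint8_t *dst, size_t n) {
        if (n > data_.size() - pos_) throw std::runtime_error("end of input");
        if (n > 0) std::memcpy(dst, data_.data() + pos_, n);
        pos_ += n;
    }

    size_t remaining() const { return data_.size() - pos_; }

private:
    std::vector<uint8_t> data_;
    size_t pos_;
};

// Hypothetical readFixed variant: the size comes from the schema, so the
// whole value is copied into the caller's buffer in one call instead of
// looping over a per-byte read.
void readFixedBulk(BulkSource &in, uint8_t *val, size_t size) {
    in.readBytes(val, size);
}

// Convenience overload for a compile-time-sized buffer (illustrative only).
template <size_t N>
void readFixedBulk(BulkSource &in, std::array<uint8_t, N> &val) {
    readFixedBulk(in, val.data(), N);
}
```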

      Attachments

        1. AVRO-556.patch (0.9 kB, Scott Banachowski)


          People

            Assignee: Unassigned
            Reporter: Dave Wright (wrightd)
            Votes: 0
            Watchers: 1
