THRIFT-5231: Improve Haskell parsing performance


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.13.0
    • Fix Version/s: None
    • Component/s: Haskell - Library
    • Labels: None

      Description

      We are using Thrift to (de)serialize some Kafka messages and noticed that even at low throughput (1,000 messages/second) a lot of CPU is used.


      I did a small benchmark parsing just a single T_BINARY value. If I use `readVal` for that, it takes ~3 ms per iteration; if I instead run the attoparsec parser directly, it takes only ~300 ns. That is a difference of four orders of magnitude! Some difference is reasonable, since `readVal` involves some IO and shuffling bytestrings around, but the gap looks huge.
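      For reference, the direct parse being timed looks roughly like the following (a minimal sketch, not the attached Main.hs; the `beWord32`/`binaryVal` helpers and the criterion setup are my own simplification of a length-prefixed binary value):

      ```haskell
      module Main where

      import           Criterion.Main             (bench, defaultMain, whnf)
      import           Data.Attoparsec.ByteString (Parser, anyWord8, parseOnly)
      import qualified Data.Attoparsec.ByteString as A
      import           Data.Bits                  (shiftL, (.|.))
      import qualified Data.ByteString            as BS
      import           Data.Word                  (Word32)

      -- Big-endian 32-bit length prefix, decoded byte by byte.
      beWord32 :: Parser Word32
      beWord32 = do
        a <- anyWord8
        b <- anyWord8
        c <- anyWord8
        d <- anyWord8
        pure $ (fromIntegral a `shiftL` 24)
           .|. (fromIntegral b `shiftL` 16)
           .|. (fromIntegral c `shiftL` 8)
           .|.  fromIntegral d

      -- A length-prefixed binary value: i32 size followed by the payload.
      binaryVal :: Parser BS.ByteString
      binaryVal = beWord32 >>= A.take . fromIntegral

      main :: IO ()
      main = do
        -- 1024-byte payload; 1024 == 0x00000400, hence the [0,0,4,0] prefix.
        let payload = BS.replicate 1024 0x2a
            encoded = BS.pack [0, 0, 4, 0] `BS.append` payload
        defaultMain
          [ bench "direct attoparsec parse" $ whnf (parseOnly binaryVal) encoded ]
      ```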


      I strongly suspect the implementation of `runParser` is not optimal. Basically it runs the parser on a single byte and, until the parse succeeds, appends one more byte and retries. This means that for a value of 1024 bytes we go through roughly 1023 of these one-byte steps, which seems rather inefficient. The pattern is sketched below.
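      As I understand it, the feeding loop has roughly this shape (a sketch of the pattern, not the library's actual code; `readByte` stands in for a one-byte transport read):

      ```haskell
      import qualified Data.Attoparsec.ByteString as A
      import qualified Data.ByteString            as BS

      runParserOneByte :: IO BS.ByteString -> A.Parser a -> IO (Either String a)
      runParserOneByte readByte p = readByte >>= go . A.parse p
        where
          go (A.Done _ r)   = pure (Right r)
          go (A.Fail _ _ e) = pure (Left e)
          -- Every Partial result triggers another one-byte read before the
          -- parse is resumed, so a 1024-byte value costs ~1024 reads.
          go (A.Partial k)  = readByte >>= go . k
      ```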


      I am not really sure how best to fix this. In principle, it makes sense to feed bigger chunks to attoparsec and store the leftovers somewhere for the next parse. However, if we store them in the transport or the protocol, we have to implement that for each transport/protocol. Maybe an API change is necessary? One possible direction is sketched below.
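      In this sketch, all names are hypothetical rather than an existing API: `runParserChunked`, `readChunk`, and the `IORef` holding the leftovers are purely illustrative.

      ```haskell
      import qualified Data.Attoparsec.ByteString as A
      import qualified Data.ByteString            as BS
      import           Data.IORef                 (IORef, readIORef, writeIORef)

      -- leftoverRef keeps unconsumed bytes between calls; readChunk reads a
      -- larger block (e.g. 4 KiB) from the transport instead of one byte.
      runParserChunked :: IORef BS.ByteString
                       -> IO BS.ByteString
                       -> A.Parser a
                       -> IO (Either String a)
      runParserChunked leftoverRef readChunk p = do
          buf   <- readIORef leftoverRef
          start <- if BS.null buf then readChunk else pure buf
          go (A.parse p start)
        where
          -- Whatever the parser did not consume is saved for the next value.
          go (A.Done rest r)   = writeIORef leftoverRef rest >> pure (Right r)
          go (A.Fail rest _ e) = writeIORef leftoverRef rest >> pure (Left e)
          go (A.Partial k)     = readChunk >>= go . k
      ```

      With a 4 KiB chunk size, a 1024-byte value would typically be satisfied by a single transport read instead of roughly a thousand.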

        Attachments

        1. Main.hs (1 kB), Philipp Hausmann
        2. parse_benchmark.html (211 kB), Philipp Hausmann


            People

            • Assignee: Unassigned
            • Reporter: Philipp Hausmann (phile314)
            • Votes: 0
            • Watchers: 2
