DAFFODIL-2246

Basic performance test built into Daffodil


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.5.0
    • Fix Version/s: None
    • Component/s: Infrastructure, QA
    • Labels: None

    Description

      We need a performance test: a very simple one that uses built-in, non-restricted DFDL schemas that are small and easily maintained.

      Developers should be able to run it easily as part of regression testing, within the normal edit-compile-test workflow.

      The goal is to catch significant performance regressions earlier.

      This is not a substitute for more serious performance testing on a controlled platform using realistic customer-centric DFDL schemas. That is still needed, and should cover things like multi-threaded "throughput" tests.

      This is a quicker/simpler thing. Single thread.

      Thoughts:

      • Measure performance relative to Tak function calls, aka "Takeon" units. This makes the timings self-relative to the speed of the JVM, so that different people with different-speed systems have a chance of still getting somewhat consistent timings. (See the calibration sketch after this list.)
      • Isolate parsing and unparsing timings.
      • Avoid I/O: read from in-memory buffers and write to in-memory buffers, which should be small enough (maybe 1 MByte) not to introduce memory-allocation/memory-footprint artifacts.
      • Single threaded only.
      • Use message-streaming API calls to parse repeatedly, creating modest-sized infoset objects. (See the parse-timing sketch after this list.)
      • Isolate the basic parse (creating a DFDL Infoset) from InfosetOutputter overhead.
      • Isolate the basic unparse (starting from a DFDL Infoset) from InfosetInputter overhead.
      • Test the performance of schema compilation also (e.g., save the compiled parser to a stream that just discards the data; see the compile-timing sketch after this list).
      • Maintain per-developer history: each developer will have a file (or similar) on their development system that is updated with timings and baselines, so that when running these perf tests, results are compared to the prior results for that same developer on that same machine.
        • This also allows computation of standard deviation and Z-score, which make performance results far easier to analyze, as one can flag performance variations that are out of the norm not in some absolute-timing sense, but relative to the standard deviation of timings for that same test (e.g., flag any run whose Z-score shows it is more than some number of standard deviations slower than that test's own typical performance; see the history sketch after this list).
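
      A minimal sketch of the "Takeon" calibration idea, assuming the classic Takeuchi (Tak) function as the JVM-speed yardstick. The names here (Takeon, nanosPerTakeon, toTakeons, the tak(18, 12, 6) arguments) are hypothetical illustrations, not existing Daffodil API:

      object Takeon {

        // Classic Takeuchi benchmark function: heavy on recursion and
        // integer comparison, so its speed tracks overall JVM speed.
        def tak(x: Int, y: Int, z: Int): Int =
          if (y < x) tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y))
          else z

        // One "Takeon" = the time for a single tak(18, 12, 6) call on this
        // JVM, averaged over several runs after warmup.
        def nanosPerTakeon(runs: Int = 5): Double = {
          (1 to 3).foreach(_ => tak(18, 12, 6)) // warmup so JIT settles
          val times = (1 to runs).map { _ =>
            val t0 = System.nanoTime()
            tak(18, 12, 6)
            System.nanoTime() - t0
          }
          times.sum.toDouble / runs
        }

        // Express a raw timing in Takeons, making results roughly
        // comparable across machines of different speeds.
        def toTakeons(elapsedNanos: Double): Double =
          elapsedNanos / nanosPerTakeon()
      }

      A perf test would then report elapsed-time-per-operation divided by the Takeon unit, rather than raw nanoseconds.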
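
      For the parse-timing loop, a sketch assuming Daffodil's Scala API (org.apache.daffodil.sapi) roughly as it stands in the 2.x series; exact class names may differ between versions, and the schema path and input bytes are placeholders. Parsing repeatedly from a small in-memory byte array into a NullInfosetOutputter keeps both I/O and infoset-conversion overhead out of the measurement:

      import java.io.File
      import org.apache.daffodil.sapi.Daffodil
      import org.apache.daffodil.sapi.infoset.NullInfosetOutputter
      import org.apache.daffodil.sapi.io.InputSourceDataInputStream

      object ParsePerf {
        def main(args: Array[String]): Unit = {
          // Compile a small, built-in, non-restricted test schema.
          val pf = Daffodil.compiler().compileFile(new File("small-test.dfdl.xsd"))
          assert(!pf.isError, pf.getDiagnostics.mkString("\n"))
          val dp = pf.onPath("/")

          // Small in-memory input, well under 1 MByte, so neither I/O nor
          // memory-footprint artifacts enter the timing.
          val data: Array[Byte] = Array.fill(4096)('A'.toByte)

          (1 to 100).foreach { _ => // warmup so JIT compilation settles
            dp.parse(new InputSourceDataInputStream(data), new NullInfosetOutputter())
          }

          val reps = 1000
          val t0 = System.nanoTime()
          (1 to reps).foreach { _ =>
            val res = dp.parse(new InputSourceDataInputStream(data), new NullInfosetOutputter())
            assert(!res.isError)
          }
          println(f"parse: ${(System.nanoTime() - t0).toDouble / reps}%.0f ns/call")
        }
      }

      Dividing the ns/call figure by the Takeon unit from the calibration sketch gives the machine-relative number worth recording. The unparse side would mirror this loop, feeding a pre-built infoset through an InfosetInputter to an in-memory output channel.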
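
      A sketch of timing schema compilation plus saving the compiled parser to a sink that discards every byte, using the same sapi entry points as above (the schema path is again a placeholder):

      import java.io.{File, OutputStream}
      import java.nio.channels.Channels
      import org.apache.daffodil.sapi.Daffodil

      object CompilePerf {
        def main(args: Array[String]): Unit = {
          // An OutputStream that throws every byte away, so serialization
          // cost is measured without any actual I/O.
          val discard: OutputStream = new OutputStream {
            override def write(b: Int): Unit = ()
          }
          val t0 = System.nanoTime()
          val pf = Daffodil.compiler().compileFile(new File("small-test.dfdl.xsd"))
          assert(!pf.isError, pf.getDiagnostics.mkString("\n"))
          val dp = pf.onPath("/")
          dp.save(Channels.newChannel(discard)) // include save cost, no I/O
          println(s"compile+save: ${(System.nanoTime() - t0) / 1e6} ms")
        }
      }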
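
      And a sketch of the per-developer history comparison. The one-timing-per-line file format, the recordAndCheck helper, and the 2.0 Z-score cutoff are all illustrative assumptions; the point is that a regression is flagged relative to each test's own historical variability rather than any absolute threshold:

      import java.io.{File, FileWriter}
      import scala.io.Source

      object PerfHistory {

        // Z-score of a new timing against this test's own history: how many
        // standard deviations slower (positive) or faster (negative) it is
        // than the historical mean.
        def zScore(history: Seq[Double], newTiming: Double): Double = {
          val mean = history.sum / history.size
          val stdDev = math.sqrt(
            history.map(t => (t - mean) * (t - mean)).sum / history.size)
          if (stdDev == 0.0) 0.0 else (newTiming - mean) / stdDev
        }

        // Compare a new result against the history file (one timing per
        // line), warn if it is out of the norm, then append it as a sample.
        def recordAndCheck(historyFile: File, testName: String, timing: Double): Unit = {
          val history: Seq[Double] =
            if (historyFile.exists) {
              val src = Source.fromFile(historyFile)
              try src.getLines().map(_.trim.toDouble).toList
              finally src.close()
            } else Nil
          if (history.size >= 5) { // need a few samples before flagging
            val z = zScore(history, timing)
            if (z > 2.0)
              println(f"WARNING: $testName is $z%.1f std-devs slower than its baseline")
          }
          val w = new FileWriter(historyFile, true) // append the new sample
          try w.write(s"$timing\n") finally w.close()
        }
      }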

      Once we have the framework, we will want to add perf tests that isolate the performance of specific features, so as to focus attention when regressions are seen. E.g., one perf test might use lengthKind 'prefixed' exclusively; another might focus on delimited text data, another on non-byte-sized/non-aligned data.


          People

            Assignee: Unassigned
            Reporter: Mike Beckerle (mbeckerle)
