  1. Apache MADlib
  2. MADLIB-974

Path - performance testing

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.9.1
    • Component/s: Module: Utilities
    • Labels: None

    Description

      Story

      As a developer, I want to do performance testing on the Path algorithm so that I can understand and communicate scale effects to users.

      The proposed matrix for the 1st set of tests is:

      1) overall data size, i.e., number of rows in data sets = 1M, 10M, 100M
      2) number of partitions = 1k, 10k, 100k
      3) number of matches per partition = 1k, 10k, 100k

      The proposed matrix for the 2nd set of tests is:

      4) match "thickness", i.e., number of rows in match = 1, 1k, 10k
      5) number of symbols = 5, 15, 25

      Acceptance

      1) Please plot performance curves. To keep the size of the test matrix reasonable, it is not necessary to run all permutations.
      E.g., when plotting the effect of the number of partitions (#2 above), the data size can be fixed at 10M (say) and the number of matches per partition at 1k (say).
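
The reduced design described in the acceptance note can be sketched as a one-factor-at-a-time enumeration: each factor is swept over its levels while the others stay at a baseline. This is only an illustrative sketch. The factor levels come from the first test matrix above; the baseline of 10M rows and 1k matches per partition follows the "say" values in the note, while the 1k baseline for partitions and all function names are assumptions made here.

```python
# Factor levels taken from the first proposed test matrix.
LEVELS = {
    "rows": [1_000_000, 10_000_000, 100_000_000],
    "partitions": [1_000, 10_000, 100_000],
    "matches_per_partition": [1_000, 10_000, 100_000],
}

# Baseline: rows and matches_per_partition follow the "say" values in the
# acceptance note; the partitions baseline of 1k is an assumption.
BASELINE = {
    "rows": 10_000_000,
    "partitions": 1_000,
    "matches_per_partition": 1_000,
}


def one_factor_at_a_time(levels, baseline):
    """Return configs that vary one factor while the rest stay at baseline."""
    configs = []
    for factor, values in levels.items():
        for value in values:
            config = dict(baseline)
            config[factor] = value
            if config not in configs:  # keep the shared baseline run only once
                configs.append(config)
    return configs


def full_grid_size(levels):
    """Number of runs if every permutation of the levels were tested."""
    size = 1
    for values in levels.values():
        size *= len(values)
    return size


runs = one_factor_at_a_time(LEVELS, BASELINE)
```

With these levels, `runs` holds 7 configurations, compared with `full_grid_size(LEVELS)` = 27 for the full grid. Each sweep shares the baseline run, so one performance curve per factor can be plotted from the results.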

      Other

      1) The attached data set can be used as a baseline for duplication/fabrication.
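
One possible way to scale a small baseline by duplication is to copy every baseline row once per synthetic partition, giving each copy a new partition key. The sketch below is hedged: the columns of the attached CSV are not shown in this issue, so the sample data, the `event_time` and `event` columns, and the `session_id` partition column are all hypothetical stand-ins.

```python
import csv
import io


def replicate_rows(baseline_rows, n_partitions, partition_col="session_id"):
    """Yield one copy of every baseline row per synthetic partition.

    partition_col is a hypothetical column name; it is added to (or
    overwritten in) each copied row, and the input rows are not modified.
    """
    for partition_id in range(n_partitions):
        for row in baseline_rows:
            copy = dict(row)
            copy[partition_col] = partition_id
            yield copy


# Tiny in-memory stand-in for the attached CSV; column names are invented.
SAMPLE_CSV = """event_time,event
2015-04-15 01:03:00,LANDING
2015-04-15 01:04:00,CHECKOUT
"""

baseline = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
scaled = list(replicate_rows(baseline, n_partitions=3))
```

Here the 2 baseline rows become 6 rows across 3 partitions. Because the function is a generator, a larger run could stream rows to a file or a database load instead of building a list in memory.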

      2) Another useful data set is at
      http://csr.lanl.gov/data/auth/

      Attachments

        1. Ecommerce data set for path test 3.csv
          2 kB
          Frank McQuillan
        2. Benchmarking Param Design Doc - PATH.pdf
          99 kB
          Xiaocheng Tang

          People

            xctang Xiaocheng Tang
            fmcquillan Frank McQuillan
