Apache MADlib / MADLIB-1342

Mini-batch preprocessor for images - performance issue

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.17
    • Component/s: Module: Utilities
    • Labels: None

    Description

      Improve performance of mini-batch preprocessor for images. May involve writing a new matrix aggregation function to support multi-dimensional arrays.

      I have a 2-segment Greenplum 5 (GP5) cluster set up:

      • Preprocessing 50k training rows from CIFAR-10 with a NULL buffer size fits into 3 buffers and takes ~1 hour. The summary file reports a buffer size of 24415.
      • Preprocessing 10k training rows from CIFAR-10 fits into 1 buffer and takes ~2 minutes.

      More info:

      If I use `buffer_size=5000` it takes 979 sec.
      If I use `buffer_size=500` it takes 75 sec.

      So I think there is an issue with large buffer sizes: a 10x larger buffer takes about 13x longer.
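
The figures reported above can be put side by side with a few lines of arithmetic. The following is only a rough sketch: it treats "~1 hour" as 3600 s and "~2 minutes" as 120 s, and the helper names `ms_per_row` and `buffers_needed` are made up for illustration and are not part of MADlib. The row count for the two explicit `buffer_size` runs is not stated, so only their ratio is computed.

```python
import math


def ms_per_row(rows, seconds):
    """Average preprocessing time per row, in milliseconds."""
    return 1000.0 * seconds / rows


def buffers_needed(rows, buffer_size):
    """Number of buffers required to hold `rows` rows (last one may be partial)."""
    return math.ceil(rows / buffer_size)


# Runs from the description; durations are approximations of "~1 hour" and "~2 minutes".
runs = [
    {"rows": 50_000, "seconds": 3600},  # NULL buffer size, reported size 24415
    {"rows": 10_000, "seconds": 120},   # fits into a single buffer
]
per_row = [ms_per_row(r["rows"], r["seconds"]) for r in runs]

# Explicit buffer_size runs: only the ratios are meaningful here,
# because the issue does not say how many rows were processed.
buffer_ratio = 5000 / 500
time_ratio = 979 / 75

print("ms per row:", per_row)
print("buffers for 50k rows at size 24415:", buffers_needed(50_000, 24_415))
print("buffer size ratio:", buffer_ratio, "time ratio:", round(time_ratio, 2))
```

If the cost per row were independent of buffer size, the per-row times would be roughly equal and the time ratio would stay near 1. Both comparisons instead show the cost growing with buffer size, which is consistent with the suspicion stated above, although it does not by itself identify the cause.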

          People

            Assignee: Unassigned
            Reporter: Frank McQuillan (fmcquillan)