Ignite / IGNITE-12751

callAsync(jobs, rdc) performance degrades quadratically as jobs.size() grows

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.6
    • Fix Version/s: None
    • Component/s: compute
    • Labels: None
    • Flags: Docs Required, Release Notes Required

    Description

      Please consider the attached reproducer and the linked report.

      {{compute.callAsync(jobs, reducer);
      Result [res=33, tookMs=81, jobs=5] //warm up
      Result [res=99, tookMs=21, jobs=15]
      Result [res=330, tookMs=22, jobs=50]
      Result [res=990, tookMs=57, jobs=150]
      Result [res=3300, tookMs=146, jobs=500]
      Result [res=9900, tookMs=231, jobs=1500]
      Result [res=33000, tookMs=840, jobs=5000]
      Result [res=99000, tookMs=6965, jobs=15000]
      Result [res=330000, tookMs=118394, jobs=50000]}}

      As soon as jobs.size() grows past 5000, performance begins to degrade quadratically.
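One way to check the "quadratic" reading against the numbers above is to estimate the local scaling exponent k between consecutive runs, assuming tookMs grows roughly as jobs^k over each step, so that k = log(t2/t1) / log(n2/n1). The sketch below is not part of the attached reproducer; the class name `ScalingExponent` and the method `exponent` are hypothetical, and the only data used is the (jobs, tookMs) pairs from the report, with the warm-up run left out.

```java
public class ScalingExponent {

    /**
     * Local exponent k such that time ~ jobs^k between two measurements:
     * k = log(ms2 / ms1) / log(jobs2 / jobs1).
     */
    static double exponent(long jobs1, long ms1, long jobs2, long ms2) {
        return Math.log((double) ms2 / ms1) / Math.log((double) jobs2 / jobs1);
    }

    public static void main(String[] args) {
        // (jobs, tookMs) pairs copied from the report; the warm-up run is omitted.
        long[][] runs = {
            {15, 21}, {50, 22}, {150, 57}, {500, 146},
            {1500, 231}, {5000, 840}, {15000, 6965}, {50000, 118394}
        };

        // Print the exponent for each consecutive pair of runs.
        for (int i = 1; i < runs.length; i++) {
            long[] prev = runs[i - 1];
            long[] cur = runs[i];
            double k = exponent(prev[0], prev[1], cur[0], cur[1]);
            System.out.printf("%d -> %d jobs: exponent %.2f%n", prev[0], cur[0], k);
        }
    }
}
```

With the reported timings, the exponent stays near or below 1 up to 5000 jobs, then comes out at about 1.9 for the 5000 to 15000 step and about 2.35 for the 15000 to 50000 step. That is consistent with quadratic or slightly worse growth past 5000 jobs.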

      I don't expect it to be completely linear, but I would assume that it should stay linear-ish until size() reaches at least 100000, given that we see clusters of 100 nodes and it's not unthinkable to expect 1000 jobs to be run on each node. 5000 jobs (which still give OK performance) spread across 100 nodes is just 50 jobs per node, which becomes the limiting factor.

      The linked question also mentions an OOM event, which may be caused by intermediate storage of O(N^2) data.

      Attachments

        1. word-count-reproducer.zip
          3 kB
          Ilya Kasnacheev

            People

              Assignee: Unassigned
              Reporter: Ilya Kasnacheev