Solr / SOLR-14524

Harden MultiThreadedOCPTest

    Details

    • Type: Test
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: master (9.0)
    • Fix Version/s: master (9.0)
    • Component/s: SolrCloud

      Description

      MultiThreadedOCPTest.test() fails occasionally in Jenkins because of the timing of task enqueues to the Collection API queue.

      In testFillWorkQueue(), the test enqueues a large number of tasks (115, more than the 100 Collection API parallel executors) to the Collection API queue for collection COLL_A, waits for a short delay, then enqueues a task for another collection, COLL_B.
      It then verifies that the COLL_B task (which does not require the same lock as the COLL_A tasks) completes before the third COLL_A task.

      The test fails when enqueues are slow enough that the first three COLL_A tasks complete before the COLL_B task is even enqueued.

      In one failed Jenkins run, the COLL_B task was enqueued 1275 ms after the first COLL_A task, leaving plenty of time for a few (and possibly all) COLL_A tasks to complete.

      The fix will be along these lines:

      • Make the “blocking” COLL_A task take longer to execute (it currently takes 1 second) to compensate for slow enqueues.
      • Verify that the COLL_B task (a 1 ms task) finishes before the long-running COLL_A task does. This would be a good indication that, even though the collection queue was filled with tasks waiting for a busy lock, a non-competing task was picked up and executed right away.
      • Delay the enqueue of the COLL_B task until the first COLL_A task is being processed. This would guarantee that COLL_B is enqueued only once at least some COLL_A tasks have started processing at the Overseer. Possibly also verify that the long-running COLL_A task has not yet finished when the COLL_B task is enqueued.
      • It might be possible to give the slow COLL_A task a (very) long duration, making the test less vulnerable to execution delays, and have the test wait only for the COLL_B task to complete rather than for the slow task (so the test doesn't run for too long).
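
The ordering guarantee proposed above can be sketched with plain java.util.concurrent primitives. This is only an illustration under stated assumptions, not the actual test or Overseer code: the class name NonCompetingTaskSketch, the method runScenario(), the task names, and the pool size are all hypothetical, and per-collection locking is modeled with one ReentrantLock per collection. Unlike the real Collection API queue, waiting tasks here simply park a pool thread on the lock, so the pool is sized larger than the number of tasks. The "very long" blocking task is modeled as a task that runs until the test releases it, so the test waits only for COLL_B.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: not Solr code, only a model of the proposed test ordering.
class NonCompetingTaskSketch {

    /** Runs the scenario and returns task names in completion order. */
    public static List<String> runScenario() throws InterruptedException {
        ReentrantLock collALock = new ReentrantLock(); // models the COLL_A lock
        ReentrantLock collBLock = new ReentrantLock(); // models the COLL_B lock
        List<String> completionOrder = new CopyOnWriteArrayList<>();
        CountDownLatch blockerStarted = new CountDownLatch(1);
        CountDownLatch releaseBlocker = new CountDownLatch(1);
        CountDownLatch collBDone = new CountDownLatch(1);
        // Assumption: more threads than tasks, so lock waiters never starve COLL_B.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            // "Blocking" COLL_A task: holds the COLL_A lock until released.
            pool.execute(() -> {
                collALock.lock();
                try {
                    blockerStarted.countDown();
                    releaseBlocker.await();
                    completionOrder.add("COLL_A-blocker");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    collALock.unlock();
                }
            });
            // Enqueue the rest only once the blocking task is really running.
            blockerStarted.await();
            for (int i = 1; i <= 3; i++) {
                String name = "COLL_A-" + i;
                pool.execute(() -> {
                    collALock.lock(); // waits behind the blocking task
                    try {
                        completionOrder.add(name);
                    } finally {
                        collALock.unlock();
                    }
                });
            }
            // Non-competing task: needs only the COLL_B lock.
            pool.execute(() -> {
                collBLock.lock();
                try {
                    completionOrder.add("COLL_B");
                } finally {
                    collBLock.unlock();
                    collBDone.countDown();
                }
            });
            // Wait only for COLL_B; the blocking task has not been released yet.
            if (!collBDone.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("COLL_B did not complete in time");
            }
        } finally {
            releaseBlocker.countDown(); // let the COLL_A tasks drain
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
        return completionOrder;
    }

    public static void main(String[] args) throws InterruptedException {
        // COLL_B is expected first; the order of COLL_A-1..3 is not fixed.
        System.out.println(runScenario());
    }
}
```

Because the blocking task is released only after COLL_B has completed, the outcome no longer depends on how slow the enqueues are, which is the property the bullets above aim for.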

              People

              • Assignee:
                mdrob Mike Drob
                Reporter:
                ilan Ilan Ginzburg
              • Votes:
                1
                Watchers:
                6

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  1h 50m