  1. Lucene - Core
  2. LUCENE-10428

getMinCompetitiveScore method in MaxScoreSumPropagator fails to converge, leaving threads busy in an infinite loop

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • 9.1
    • None
    • New

    Description

      Customers complained about high CPU usage on an Elasticsearch cluster in production. We noticed that a few search requests had been stuck for a long time:

      % curl -s localhost:9200/_cat/tasks?v                               
      indices:data/read/search[phase/query] AmMLzDQ4RrOJievRDeGFZw:569205  AmMLzDQ4RrOJievRDeGFZw:569204  direct    1645195007282 14:36:47  6.2h
      indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:502075  emjWc5bUTG6lgnCGLulq-Q:502074  direct    1645195037259 14:37:17  6.2h
      indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:583270  emjWc5bUTG6lgnCGLulq-Q:583269  direct    1645201316981 16:21:56  4.5h
      

      Flame graphs indicated that most CPU time was going into the getMinCompetitiveScore method of MaxScoreSumPropagator. Live JVM debugging then showed that org.apache.lucene.search.MaxScoreSumPropagator.scoreSumUpperBound was being invoked around 4 million times per second.

      Values of some parameters captured during live debugging:

      minScoreSum = 3.5541441
      minScore + sumOfOtherMaxScores (params[0] scoreSumUpperBound) = 3.554144322872162
      returnObj scoreSumUpperBound = 3.5541444
      Math.ulp(minScoreSum) = 2.3841858E-7
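
The captured values can be cross-checked with a few lines of Java. This is a minimal sketch, not Lucene code: the class name UlpCheck and the helper castToFloat are hypothetical, and it assumes the upper bound can be modeled as a plain cast of the double sum to float, which the report does not confirm. Under that assumption it prints the float bound, the float ulp of minScoreSum, and whether the bound sits exactly one ulp above minScoreSum.

```java
public class UlpCheck {
    // Values copied from the debugging session in this report.
    static final float MIN_SCORE_SUM = 3.5541441f;
    static final double UPPER_BOUND_INPUT = 3.554144322872162;

    /** Assumption: the upper bound is modeled as a plain narrowing cast to float. */
    static float castToFloat(double sum) {
        return (float) sum;
    }

    public static void main(String[] args) {
        float bound = castToFloat(UPPER_BOUND_INPUT);
        float ulp = Math.ulp(MIN_SCORE_SUM); // float overload, returns the float ulp
        System.out.println("bound            = " + bound);
        System.out.println("ulp(minScoreSum) = " + ulp);
        // Float subtraction of two adjacent floats is exact, so == is safe here.
        System.out.println("bound is one ulp above minScoreSum: "
                + (bound - MIN_SCORE_SUM == ulp));
    }
}
```

If the assumption holds, the loop condition in getMinCompetitiveScore stays true whenever the decrement applied to minScore is too small to move the float result of the sum down by one ulp.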
      

      Example code snippet:

      double sumOfOtherMaxScores = 3.554144322872162;
      double minScoreSum = 3.5541441;
      float minScore = (float) (minScoreSum - sumOfOtherMaxScores);
      while (scoreSumUpperBound(minScore + sumOfOtherMaxScores) > minScoreSum) {
          minScore -= Math.ulp(minScoreSum);
          System.out.printf("%.20f, %.100f\n", minScore, Math.ulp(minScoreSum));
      }
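
The snippet above calls scoreSumUpperBound without defining it, so it cannot be run on its own. The following self-contained variant is a sketch under stated assumptions: the class BoundedLoopDemo, the iteration cap, and the float-cast stand-in for scoreSumUpperBound are all hypothetical and are not taken from Lucene. The cap only exists so the demonstration terminates.

```java
public class BoundedLoopDemo {
    /** Hypothetical stand-in for scoreSumUpperBound: a plain cast to float. */
    static float scoreSumUpperBound(double sum) {
        return (float) sum;
    }

    /** Runs the loop from the report for at most maxIters steps and returns the steps used. */
    static int run(int maxIters) {
        double sumOfOtherMaxScores = 3.554144322872162;
        double minScoreSum = 3.5541441; // declared double, as in the report's snippet
        float minScore = (float) (minScoreSum - sumOfOtherMaxScores);
        float start = minScore;
        int iters = 0;
        while (iters < maxIters
                && scoreSumUpperBound(minScore + sumOfOtherMaxScores) > minScoreSum) {
            // Compound assignment computes in double, then narrows back to float.
            minScore -= Math.ulp(minScoreSum);
            iters++;
        }
        System.out.println("iterations=" + iters
                + ", minScore changed=" + (minScore != start)
                + ", step=" + Math.ulp(minScoreSum));
        return iters;
    }

    public static void main(String[] args) {
        run(1000);
    }
}
```

Because minScoreSum is a double here, Math.ulp(minScoreSum) resolves to the double overload and yields a step far smaller than the spacing between float values near minScore, so each subtraction rounds back to the same float. The debugging values in this report show the float ulp (2.3841858E-7), so the production state may differ from this snippet; the sketch only illustrates how a too-small decrement keeps the loop from converging.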
      

      Attachments

        1. Flame_graph.png
          711 kB
          Ankit Jain

            People

              Assignee: Unassigned
              Reporter: Ankit Jain (akjain)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 5.5h