
SOLR-14528: Hybris 1905 and Solr 7.7.2 CPU Performance issue


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 7.7.2
    • Fix Version/s: None
    • Component/s: SolrCLI

    Description

      We are writing from CEMEX Mexico. We use your solution through SAP's HYBRIS E-Commerce; we have been running it for three years and have never had performance problems with it.

      But since the end of March of this year, when we migrated from Hybris version 6.3 to 1905 (which also brings the Solr upgrade from 6.1.0 to 7.7.2), we have found that when Hybris performs Solr tasks such as modifying an index or running a full index, CPU usage climbs until it saturates, causing the server to crash.

      This was reported to SAP, who had us change the following configuration parameters, without achieving any significant improvement:

      (/etc/default/solr.in.sh)

      SOLR_JAVA_MEM="-Xms8g -Xmx8g -XX:ConcGCThreads=2 -XX:ParallelGCThreads=2"
      GC_TUNE="-XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=70 -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250 -XX:+UseLargePages -XX:+AlwaysPreTouch"

      (solrconfig.xml)

      <indexConfig>
          <lockType>${solr.lock.type:native}</lockType>

          <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
              <int name="maxMergeCount">2</int>
              <int name="maxThreadCount">1</int>
          </mergeScheduler>

          <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
              <int name="maxMergeAtOnce">10</int>
              <int name="segmentsPerTier">20</int>
          </mergePolicyFactory>

          <ramBufferSizeMB>600</ramBufferSizeMB>
      </indexConfig>

       These configuration changes made the server crash less often, but they also made indexing times much longer, with sustained high CPU usage. It is important to restate that no changes have been made to our code regarding how the indexing processes run, and this worked well on the older Solr version (6.1). (Tests and performance metrics can be found in the attached document named: SOLR TEST cliente pro SAP TUNNING - 12-05-2020.docx)
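       For reference, the sketch below shows the same <indexConfig> section with the explicit merge limits left out, so that ConcurrentMergeScheduler falls back to its auto-detected defaults; forcing maxThreadCount to 1 and maxMergeCount to 2, as above, can stall indexing threads while merges back up, which would be consistent with the longer indexing times. This snippet is illustrative only and is not a configuration we have tested.

       <!-- solrconfig.xml - illustrative sketch only, not a tested configuration -->
       <indexConfig>
           <lockType>${solr.lock.type:native}</lockType>

           <!-- No maxMergeCount/maxThreadCount: the scheduler picks defaults from the CPU count -->
           <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>

           <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
               <int name="maxMergeAtOnce">10</int>
               <int name="segmentsPerTier">20</int>
           </mergePolicyFactory>

           <ramBufferSizeMB>600</ramBufferSizeMB>
       </indexConfig>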
       

      On the other hand, SAP tells us that they see a significant change in this class, and I quote:

       

      "The methods that take most of the time are related to the Lucene70DocValuesConsumer class. You can find attached a PPT file with screenshots from Dynatrace and a stack trace from Solr.

       

      I inspected the source code of the file (https://github.com/apache/lucene-solr/blob/branch_7_7/lucene/core/src/java/org/apache/lucene/codecs/lucene70/Lucene70DocValuesConsumer.java) to see if it used any flags or configuration parameters that could be configured / tuned, but that is not the case.

       

      This part of the Solr code is very different from the old one (Solr 6.1). I did not have enough time to trace all the method calls to reach a conclusion, but it is definitely doing things differently."
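      For context, Lucene70DocValuesConsumer itself exposes no tuning flags; how much doc-values data it writes is driven by the per-field docValues attribute in the Solr schema, so depending on how the 7.7.2 schema is defined, doc values may be enabled on more fields than they were in 6.1. The sketch below is purely illustrative; the field names are hypothetical and do not come from our schema.

      <!-- managed-schema / schema.xml - illustrative only, hypothetical field names -->
      <!-- docValues kept on for a field that is faceted or sorted on -->
      <field name="exampleFacetField" type="string" indexed="true" stored="false" docValues="true"/>
      <!-- docValues switched off for a field that is only searched, never faceted, sorted, or grouped -->
      <field name="exampleSearchOnlyField" type="string" indexed="true" stored="true" docValues="false"/>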

       

      They also asked us to raise a ticket with you, to see whether you can help us determine what could have changed so much that it causes the CPU consumption problems mentioned above.

       

      As this is the first time we report a problem directly to you, we would appreciate guidance on what information we should send you, or on how to take this problem to a prompt solution.

       

      We remain entirely (and immediately) at your disposal for whatever you need for your analysis.

       

      Regards.

          People

            Assignee: Unassigned
            Reporter: Andrés Gutiérrez (andresgtzc)
