Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-16328

intern() strings in DocCollection to reduce memory footprint

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 9.1
    • None
    • None

    Description

      strings used in a state,json are usually repeated and they are long lived. intern the strings to reduce the no:of objects

      Attachments

        Issue Links

          Activity

            githubbot ASF GitHub Bot logged work - 08/Aug/22 06:49
            • Time Spent:
              10m
               
              noblepaul opened a new pull request, #965:
              URL: https://github.com/apache/solr/pull/965

                 A lot of strings are repeated in `state.json` . interning those strings can dramatically reduce the heap used
                 
                 The heap used by the `DocCollection` Object for varying shard sizes is as follows
                 
                 ```
                 shards | w/o optimization(MB) | w/ optimization
                 512 1.36 1.06
                 1024 2.63 2.029
                 2048 5.21 3.98
                 4096 10.34 7.89
                 
                 ````


            githubbot ASF GitHub Bot logged work - 08/Aug/22 12:23
            githubbot ASF GitHub Bot logged work - 08/Aug/22 12:32
            githubbot ASF GitHub Bot logged work - 08/Aug/22 12:40
            • Time Spent:
              10m
               
              madrob commented on PR #965:
              URL: https://github.com/apache/solr/pull/965#issuecomment-1208076113

                 I suspect that we want to de-duplicate the keys but not the values? Do the values repeat across shards?
                 
                 Maybe we can identify a known set of duplicated strings and load them into a set.


            githubbot ASF GitHub Bot logged work - 08/Aug/22 16:21
            githubbot ASF GitHub Bot logged work - 09/Aug/22 00:14
            • Time Spent:
              10m
               
              noblepaul commented on PR #965:
              URL: https://github.com/apache/solr/pull/965#issuecomment-1208749089

                 
                 > Which version of Java were you testing this with?
                 
                 java 11
                 
                 > I suspect that we want to de-duplicate the keys but not the values? Do the values repeat across shards?
                 
                 `state.json` has most Strings repeated even values. Take a look at the sample below. Other than `base_url` and `range` almost all values can be safely interned
                 
                 ```
                 {"prs1":{
                   "pullReplicas":"0",
                   "replicationFactor":"1",
                   "router":{"name":"compositeId"},
                   "maxShardsPerNode":"2048",
                   "autoAddReplicas":"false",
                   "nrtReplicas":"1",
                   "tlogReplicas":"0",
                   "shards":{
                     "shard1":{
                       "range":"80000000-80ffffff",
                       "state":"active",
                       "replicas":{"core_node3":{
                         "core":"prs1_",
                         "node_name":"10.0.0.49:8986_solr",
                         "base_url":"http://10.0.0.49:8986/solr",
                         "state":"active",
                         "type":"NRT",
                         "force_set_state":"false",
                         "leader":"true"}}}}
                 ```
                 
                 > Was this measured using PRS or "traditional" state?
                 
                 I tested with both and they're not very different
                 
                 


            githubbot ASF GitHub Bot logged work - 18/Aug/22 14:22
            • Time Spent:
              10m
               
              madrob commented on PR #965:
              URL: https://github.com/apache/solr/pull/965#issuecomment-1219556363

                 You are correct, a lot of the values are repeated too, whether they are booleans or enums, a lot of them are basically part of a closed set. Still, I don't think JVM interning is the way to go, and I would rather see us implement a HashMap or CHM interner as described in the linked blog post.


            githubbot ASF GitHub Bot logged work - 24/Aug/22 14:59
            githubbot ASF GitHub Bot logged work - 26/Aug/22 04:43
            githubbot ASF GitHub Bot logged work - 30/Aug/22 13:43
            • Time Spent:
              10m
               
              madrob commented on code in PR #965:
              URL: https://github.com/apache/solr/pull/965#discussion_r958487650


              ##########
              solr/core/src/java/org/apache/solr/core/CoreContainer.java:
              ##########
              @@ -2424,6 +2430,32 @@ public PlacementPluginFactory<? extends PlacementPluginConfig> getPlacementPlugi
                 public void runAsync(Runnable r) {
                   coreContainerAsyncTaskExecutor.submit(r);
                 }
              +
              + public static void setWeakStringInterner() {
              + Interner<String> interner = Interner.newWeakInterner();
              + ClusterState.setStrInternerParser(
              + new Function<>() {

              Review Comment:
                 Let's move this function to `Utils` with the other ObjectBuilders.



              ##########
              solr/core/src/java/org/apache/solr/core/CoreContainer.java:
              ##########
              @@ -383,6 +388,7 @@ public CoreContainer(NodeConfig config, CoresLocator locator, boolean asyncSolrC
                   if (null != this.cfg.getBooleanQueryMaxClauseCount()) {
                     IndexSearcher.setMaxClauseCount(this.cfg.getBooleanQueryMaxClauseCount());
                   }
              + setWeakStringInterner();

              Review Comment:
                 Please add a property to enable/disable this feature. Default on is fine, but ability to easily switch off for cases where it introduces issues would be nice.



              ##########
              versions.props:
              ##########
              @@ -4,7 +4,7 @@ com.carrotsearch:hppc=0.9.1
               com.cybozu.labs:langdetect=1.1-20120112
               com.fasterxml.jackson:jackson-bom=2.13.3
               com.fasterxml.woodstox:woodstox-core=6.2.8
              -com.github.ben-manes.caffeine:caffeine=3.0.5

              Review Comment:
                 Also need to update versions.lock and the license files. `./gradlew helpDeps` has the commands you need.



              ##########
              solr/solrj/src/java/org/apache/solr/common/cloud/ClusterState.java:
              ##########
              @@ -477,4 +484,16 @@ public PerReplicaStates getStates() {
                     return perReplicaStates;
                   }
                 }
              +
              + private static volatile Function<JSONParser, ObjectBuilder> STR_INTERNER_OBJ_BUILDER =
              + STANDARDOBJBUILDER;
              +
              + public static void setStrInternerParser(Function<JSONParser, ObjectBuilder> fun) {
              + if (fun == null) return;
              + STR_INTERNER_OBJ_BUILDER = fun;
              + }
              +
              + public static Function<JSONParser, ObjectBuilder> getStringInterner() {

              Review Comment:
                 This appears unused.



              ##########
              solr/solrj/src/java/org/apache/solr/common/util/Utils.java:
              ##########
              @@ -303,6 +303,11 @@ public static Object fromJSON(byte[] utf8) {
                 }
               
                 public static Object fromJSON(byte[] utf8, int offset, int length) {
              + return fromJSON(utf8, offset, length, STANDARDOBJBUILDER);
              + }
              +
              + public static Object fromJSON(

              Review Comment:
                 This looks like it is only called with the customized object builder from `ClusterState.createFromJSON` which is only called from BackupManager. That feels like we're missing some usages.



            githubbot ASF GitHub Bot logged work - 31/Aug/22 05:22
            • Time Spent:
              10m
               
              noblepaul commented on code in PR #965:
              URL: https://github.com/apache/solr/pull/965#discussion_r959170422


              ##########
              solr/core/src/java/org/apache/solr/core/CoreContainer.java:
              ##########
              @@ -2424,6 +2430,32 @@ public PlacementPluginFactory<? extends PlacementPluginConfig> getPlacementPlugi
                 public void runAsync(Runnable r) {
                   coreContainerAsyncTaskExecutor.submit(r);
                 }
              +
              + public static void setWeakStringInterner() {
              + Interner<String> interner = Interner.newWeakInterner();
              + ClusterState.setStrInternerParser(
              + new Function<>() {

              Review Comment:
                 utils is in the solrj and solrj does not have visibility to caffeine classes



            githubbot ASF GitHub Bot logged work - 31/Aug/22 05:23
            • Time Spent:
              10m
               
              noblepaul commented on code in PR #965:
              URL: https://github.com/apache/solr/pull/965#discussion_r959170545


              ##########
              solr/core/src/java/org/apache/solr/core/CoreContainer.java:
              ##########
              @@ -383,6 +388,7 @@ public CoreContainer(NodeConfig config, CoresLocator locator, boolean asyncSolrC
                   if (null != this.cfg.getBooleanQueryMaxClauseCount()) {
                     IndexSearcher.setMaxClauseCount(this.cfg.getBooleanQueryMaxClauseCount());
                   }
              + setWeakStringInterner();

              Review Comment:
                 that's the plan.



            githubbot ASF GitHub Bot logged work - 31/Aug/22 11:55
            • Time Spent:
              10m
               
              madrob commented on code in PR #965:
              URL: https://github.com/apache/solr/pull/965#discussion_r959491853


              ##########
              solr/core/src/java/org/apache/solr/core/CoreContainer.java:
              ##########
              @@ -2424,6 +2430,34 @@ public PlacementPluginFactory<? extends PlacementPluginConfig> getPlacementPlugi
                 public void runAsync(Runnable r) {
                   coreContainerAsyncTaskExecutor.submit(r);
                 }
              +
              + public static void setWeakStringInterner() {
              + boolean enable = "true".equals(System.getProperty("use.str.intern", "true"));

              Review Comment:
                 This should be prefixed with solr.
                 
                 Maybe instead of true/false it would make sense to do strong/weak/off? No real opinion here, talking through ideas out loud to see how they sound... probably too much complexity and not worth it, but I'll leave that up to you.



            githubbot ASF GitHub Bot logged work - 05/Sep/22 12:59
            githubbot ASF GitHub Bot logged work - 06/Sep/22 05:58
            • Time Spent:
              10m
               
              chatman commented on PR #965:
              URL: https://github.com/apache/solr/pull/965#issuecomment-1237691267

                 Precommit passes for me locally.
                 
                 ```
                 ishan@x1extreme ~/code/fullstorydev-solr/solr (jira/SOLR-16328) $ ../gradlew check -x test
                 Downloading https://services.gradle.org/distributions/gradle-7.2-all.zip
                 ..............10%...............20%...............30%...............40%...............50%...............60%...............70%...............80%...............90%...............100%
                 To honour the JVM settings for this build a single-use Daemon process will be forked. See https://docs.gradle.org/7.2/userguide/gradle_daemon.html#sec:disabling_the_daemon.
                 Daemon will be stopped at the end of the build
                 
                 > Task :localSettings
                 
                 IMPORTANT. This is the first time you ran the build. I wrote some sane defaults (for this machine) to 'gradle.properties', they will be picked up on consecutive gradle invocations (not this one).
                 
                 Run gradlew :helpLocalSettings for more information.
                 
                 > Task :errorProneSkipped
                 WARNING: errorprone disabled (skipped on builds not running inside CI environments, pass -Pvalidation.errorprone=true to enable)
                 
                 > Task :solr:solrj:compileJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:solrj-zookeeper:compileJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:core:compileJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:test-framework:compileJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:analysis-extras:compileJava
                 Note: /home/ishan/code/fullstorydev-solr/solr/modules/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java uses or overrides a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:sql:compileJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:core:analyzeClassesDependencies
                 Dependency analysis found issues.
                 unusedDeclaredArtifacts
                  - org.apache.lucene:lucene-backward-codecs:9.3.0@jar
                 
                 
                 > Task :solr:core:compileTestJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:prometheus-exporter:compileJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:prometheus-exporter:compileTestJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:prometheus-exporter:analyzeTestClassesDependencies
                 Dependency analysis found issues.
                 unusedDeclaredArtifacts
                  - org.apache.lucene:lucene-test-framework:9.3.0@jar
                 
                 
                 > Task :solr:solr-ref-guide:compileTestJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:solr-ref-guide:downloadAntoraCli
                 
                 added 60 packages, changed 1 package, and audited 62 packages in 17s
                 
                 7 packages are looking for funding
                   run `npm fund` for details
                 
                 found 0 vulnerabilities
                 npm notice
                 npm notice New minor version of npm available! 8.11.0 -> 8.19.1
                 npm notice Changelog: <https://github.com/npm/cli/releases/tag/v8.19.1>
                 npm notice Run `npm install -g npm@8.19.1` to update!
                 npm notice
                 
                 > Task :solr:solr-ref-guide:downloadAntoraLunrExtension
                 
                 added 18 packages, changed 1 package, and audited 81 packages in 7s
                 
                 18 packages are looking for funding
                   run `npm fund` for details
                 
                 found 0 vulnerabilities
                 
                 > Task :solr:solr-ref-guide:downloadAntoraSiteGenerator
                 
                 added 153 packages, changed 1 package, and audited 235 packages in 23s
                 
                 31 packages are looking for funding
                   run `npm fund` for details
                 
                 9 high severity vulnerabilities
                 
                 To address issues that do not require attention, run:
                   npm audit fix
                 
                 To address all issues (including breaking changes), run:
                   npm audit fix --force
                 
                 Run `npm audit` for details.
                 
                 > Task :solr:solr-ref-guide:downloadAsciidoctorMathjaxExtension
                 
                 added 8 packages, changed 1 package, and audited 244 packages in 8s
                 
                 31 packages are looking for funding
                   run `npm fund` for details
                 
                 9 high severity vulnerabilities
                 
                 To address issues that do not require attention, run:
                   npm audit fix
                 
                 To address all issues (including breaking changes), run:
                   npm audit fix --force
                 
                 Run `npm audit` for details.
                 
                 > Task :solr:solr-ref-guide:buildLocalSite
                 The generated local ref-guide can be found at:
                         /home/ishan/code/fullstorydev-solr/solr/solr-ref-guide/build/site/index.html
                 
                 > Task :solr:solr-ref-guide:downloadLinkValidator
                 npm WARN deprecated formidable@1.2.6: Please upgrade to latest, formidable@v2 or formidable@v3! Check these notes: https://bit.ly/2ZEqIau
                 npm WARN deprecated superagent@3.8.3: Please upgrade to v7.0.2+ of superagent. We have fixed numerous issues with streams, form-data, attach(), filesystem errors not bubbling up (ENOENT on attach()), and all tests are now passing. See the releases tab for more information at <https://github.com/visionmedia/superagent/releases>.
                 
                 added 49 packages, changed 1 package, and audited 294 packages in 5s
                 
                 37 packages are looking for funding
                   run `npm fund` for details
                 
                 9 high severity vulnerabilities
                 
                 To address issues that do not require attention, run:
                   npm audit fix
                 
                 To address all issues (including breaking changes), run:
                   npm audit fix --force
                 
                 Run `npm audit` for details.
                 
                 > Task :solr:solr-ref-guide:checkSiteLinks
                 {
                   "stats": {
                     "errors": [],
                     "warnings": [],
                     "parsedFiles": 814,
                     "localLinks": 231,
                     "localAnchorLinks": 3053,
                     "parentLinks": 195,
                     "parentAnchorLinks": 1,
                     "remoteLinks": 0,
                     "remoteAnchorLinks": 0
                   }
                 }
                 
                 > Task :solr:solrj:compileTestJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:test-framework:compileTestJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:analysis-extras:compileTestJava
                 Note: /home/ishan/code/fullstorydev-solr/solr/modules/analysis-extras/src/test/org/apache/solr/schema/TestICUCollationFieldUDVAS.java uses or overrides a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:analytics:compileJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:analytics:compileTestJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:clustering:analyzeTestClassesDependencies
                 Dependency analysis found issues.
                 unusedDeclaredArtifacts
                  - org.apache.lucene:lucene-test-framework:9.3.0@jar
                 
                 
                 > Task :solr:modules:extraction:compileJava
                 Note: /home/ishan/code/fullstorydev-solr/solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractingDocumentLoader.java uses or overrides a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:extraction:compileTestJava
                 Note: /home/ishan/code/fullstorydev-solr/solr/modules/extraction/src/test/org/apache/solr/handler/extraction/ExtractingRequestHandlerTest.java uses or overrides a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:gcs-repository:compileTestJava
                 Note: /home/ishan/code/fullstorydev-solr/solr/modules/gcs-repository/src/test/org/apache/solr/gcs/ConcurrentDelegatingStorage.java uses or overrides a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:hadoop-auth:compileJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:hadoop-auth:compileTestJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:hadoop-auth:analyzeTestClassesDependencies
                 Dependency analysis found issues.
                 unusedDeclaredArtifacts
                  - com.github.spotbugs:spotbugs-annotations:4.5.3@jar
                 
                 
                 > Task :solr:modules:hdfs:compileJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:hdfs:compileTestJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:jwt-auth:compileTestJava
                 Note: /home/ishan/code/fullstorydev-solr/solr/modules/jwt-auth/src/test/org/apache/solr/security/jwt/JWTAuthPluginIntegrationTest.java uses or overrides a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:jwt-auth:analyzeTestClassesDependencies
                 Dependency analysis found issues.
                 unusedDeclaredArtifacts
                  - com.fasterxml.jackson.core:jackson-databind:2.13.3@jar
                 
                 
                 > Task :solr:modules:langid:compileJava
                 Note: Some input files use or override a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:ltr:compileTestJava
                 Note: /home/ishan/code/fullstorydev-solr/solr/modules/ltr/src/test/org/apache/solr/ltr/model/TestMultipleAdditiveTreesModel.java uses or overrides a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 > Task :solr:modules:scripting:compileTestJava
                 Note: /home/ishan/code/fullstorydev-solr/solr/modules/scripting/src/test/org/apache/solr/scripting/update/TestBadScriptingUpdateProcessorConfig.java uses or overrides a deprecated API.
                 Note: Recompile with -Xlint:deprecation for details.
                 
                 Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.
                 
                 You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.
                 
                 See https://docs.gradle.org/7.2/userguide/command_line_interface.html#sec:command_line_warnings
                 
                 BUILD SUCCESSFUL in 13m 33s
                 463 actionable tasks: 463 executed
                 ishan@x1extreme ~/code/fullstorydev-solr/solr (jira/SOLR-16328) $ java -version
                 openjdk version "11.0.15" 2022-04-19
                 OpenJDK Runtime Environment 18.9 (build 11.0.15+10)
                 OpenJDK 64-Bit Server VM 18.9 (build 11.0.15+10, mixed mode, sharing)
                 ishan@x1extreme ~/code/fullstorydev-solr/solr (jira/SOLR-16328) $
                 ```


            githubbot ASF GitHub Bot logged work - 06/Sep/22 16:36
            githubbot ASF GitHub Bot logged work - 07/Sep/22 00:49
            • Time Spent:
              10m
               
              noblepaul commented on PR #965:
              URL: https://github.com/apache/solr/pull/965#issuecomment-1238787574

                 unfortunately we don't have a choice. We have to upgrade error-prone as well
                 
                 ```
                  > Could not resolve com.google.errorprone:error_prone_annotations:2.11.0.
                      Required by:
                          project :solr:modules:hdfs > project :solr:core
                          project :solr:modules:hdfs > project :solr:solrj
                          project :solr:modules:hdfs > project :solr:core > project :solr:solrj-zookeeper
                       > Cannot find a version of 'com.google.errorprone:error_prone_annotations' that satisfies the version constraints:
                            Dependency path 'org.apache.solr:hdfs:10.0.0-SNAPSHOT' --> 'com.github.ben-manes.caffeine:caffeine:3.1.1' (apiElements) --> 'com.google.errorprone:error_prone_annotations:2.14.0'
                            Constraint path 'org.apache.solr:hdfs:10.0.0-SNAPSHOT' --> 'org.apache:solr-root:10.0.0-SNAPSHOT' (gcvLocks) --> 'com.google.errorprone:error_prone_annotations:{strictly 2.11.0}' because of the following reason: Locked by versions.lock
                            Constraint path 'org.apache.solr:hdfs:10.0.0-SNAPSHOT' --> 'org.apache.solr:solrj:10.0.0-SNAPSHOT' (apiElements) --> 'com.google.errorprone:error_prone_annotations:2.11.0' because of the following reason: Computed
                            
                 ```


            githubbot ASF GitHub Bot logged work - 07/Sep/22 11:39

            People

              noble.paul Noble Paul
              noble.paul Noble Paul
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 4h 50m
                  4h 50m