Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
strings used in a state,json are usually repeated and they are long lived. intern the strings to reduce the no:of objects
Attachments
Issue Links
- links to
1.
|
Update Caffeine from 3.0.5 to 3.1.1 | Closed | Noble Paul |
|
Activity
ASF GitHub Bot
logged work - 08/Aug/22 06:49
-
- Time Spent:
- 10m
-
noblepaul opened a new pull request, #965:
URL: https://github.com/apache/solr/pull/965
A lot of strings are repeated in `state.json` . interning those strings can dramatically reduce the heap used
The heap used by the `DocCollection` Object for varying shard sizes is as follows
```
shards | w/o optimization(MB) | w/ optimization
512 1.36 1.06
1024 2.63 2.029
2048 5.21 3.98
4096 10.34 7.89
````
ASF GitHub Bot
logged work - 08/Aug/22 12:23
-
- Time Spent:
- 10m
-
madrob commented on PR #965:
URL: https://github.com/apache/solr/pull/965#issuecomment-1208058401
Which version of Java were you testing this with?
ASF GitHub Bot
logged work - 08/Aug/22 12:32
-
- Time Spent:
- 10m
-
madrob commented on PR #965:
URL: https://github.com/apache/solr/pull/965#issuecomment-1208067972
I'm likely -1 on this patch based solely on the venerable Aleksey Shipilev post https://shipilev.net/jvm/anatomy-quarks/10-string-intern/
ASF GitHub Bot
logged work - 08/Aug/22 12:40
-
- Time Spent:
- 10m
-
madrob commented on PR #965:
URL: https://github.com/apache/solr/pull/965#issuecomment-1208076113
I suspect that we want to de-duplicate the keys but not the values? Do the values repeat across shards?
Maybe we can identify a known set of duplicated strings and load them into a set.
ASF GitHub Bot
logged work - 08/Aug/22 16:21
-
- Time Spent:
- 10m
-
madrob commented on PR #965:
URL: https://github.com/apache/solr/pull/965#issuecomment-1208335912
Was this measured using PRS or "traditional" state?
ASF GitHub Bot
logged work - 09/Aug/22 00:14
-
- Time Spent:
- 10m
-
noblepaul commented on PR #965:
URL: https://github.com/apache/solr/pull/965#issuecomment-1208749089
> Which version of Java were you testing this with?
java 11
> I suspect that we want to de-duplicate the keys but not the values? Do the values repeat across shards?
`state.json` has most Strings repeated even values. Take a look at the sample below. Other than `base_url` and `range` almost all values can be safely interned
```
{"prs1":{
"pullReplicas":"0",
"replicationFactor":"1",
"router":{"name":"compositeId"},
"maxShardsPerNode":"2048",
"autoAddReplicas":"false",
"nrtReplicas":"1",
"tlogReplicas":"0",
"shards":{
"shard1":{
"range":"80000000-80ffffff",
"state":"active",
"replicas":{"core_node3":{
"core":"prs1_",
"node_name":"10.0.0.49:8986_solr",
"base_url":"http://10.0.0.49:8986/solr",
"state":"active",
"type":"NRT",
"force_set_state":"false",
"leader":"true"}}}}
```
> Was this measured using PRS or "traditional" state?
I tested with both and they're not very different
ASF GitHub Bot
logged work - 18/Aug/22 14:22
-
- Time Spent:
- 10m
-
madrob commented on PR #965:
URL: https://github.com/apache/solr/pull/965#issuecomment-1219556363
You are correct, a lot of the values are repeated too, whether they are booleans or enums, a lot of them are basically part of a closed set. Still, I don't think JVM interning is the way to go, and I would rather see us implement a HashMap or CHM interner as described in the linked blog post.
ASF GitHub Bot
logged work - 24/Aug/22 14:59
-
- Time Spent:
- 10m
-
madrob commented on PR #965:
URL: https://github.com/apache/solr/pull/965#issuecomment-1225844464
Can you try running with the JVM option `-XX:+UseStringDeduplication` - that might handle this issue for you without the need for code change.
ASF GitHub Bot
logged work - 26/Aug/22 04:43
-
- Time Spent:
- 10m
-
noblepaul commented on PR #965:
URL: https://github.com/apache/solr/pull/965#issuecomment-1228047371
@madrob I'm testing with a modified code that uses a weakmap interner and the results are better
ASF GitHub Bot
logged work - 30/Aug/22 13:43
-
- Time Spent:
- 10m
-
madrob commented on code in PR #965:
URL: https://github.com/apache/solr/pull/965#discussion_r958487650
##########
solr/core/src/java/org/apache/solr/core/CoreContainer.java:
##########
@@ -2424,6 +2430,32 @@ public PlacementPluginFactory<? extends PlacementPluginConfig> getPlacementPlugi
public void runAsync(Runnable r) {
coreContainerAsyncTaskExecutor.submit(r);
}
+
+ public static void setWeakStringInterner() {
+ Interner<String> interner = Interner.newWeakInterner();
+ ClusterState.setStrInternerParser(
+ new Function<>() {
Review Comment:
Let's move this function to `Utils` with the other ObjectBuilders.
##########
solr/core/src/java/org/apache/solr/core/CoreContainer.java:
##########
@@ -383,6 +388,7 @@ public CoreContainer(NodeConfig config, CoresLocator locator, boolean asyncSolrC
if (null != this.cfg.getBooleanQueryMaxClauseCount()) {
IndexSearcher.setMaxClauseCount(this.cfg.getBooleanQueryMaxClauseCount());
}
+ setWeakStringInterner();
Review Comment:
Please add a property to enable/disable this feature. Default on is fine, but ability to easily switch off for cases where it introduces issues would be nice.
##########
versions.props:
##########
@@ -4,7 +4,7 @@ com.carrotsearch:hppc=0.9.1
com.cybozu.labs:langdetect=1.1-20120112
com.fasterxml.jackson:jackson-bom=2.13.3
com.fasterxml.woodstox:woodstox-core=6.2.8
-com.github.ben-manes.caffeine:caffeine=3.0.5
Review Comment:
Also need to update versions.lock and the license files. `./gradlew helpDeps` has the commands you need.
##########
solr/solrj/src/java/org/apache/solr/common/cloud/ClusterState.java:
##########
@@ -477,4 +484,16 @@ public PerReplicaStates getStates() {
return perReplicaStates;
}
}
+
+ private static volatile Function<JSONParser, ObjectBuilder> STR_INTERNER_OBJ_BUILDER =
+ STANDARDOBJBUILDER;
+
+ public static void setStrInternerParser(Function<JSONParser, ObjectBuilder> fun) {
+ if (fun == null) return;
+ STR_INTERNER_OBJ_BUILDER = fun;
+ }
+
+ public static Function<JSONParser, ObjectBuilder> getStringInterner() {
Review Comment:
This appears unused.
##########
solr/solrj/src/java/org/apache/solr/common/util/Utils.java:
##########
@@ -303,6 +303,11 @@ public static Object fromJSON(byte[] utf8) {
}
public static Object fromJSON(byte[] utf8, int offset, int length) {
+ return fromJSON(utf8, offset, length, STANDARDOBJBUILDER);
+ }
+
+ public static Object fromJSON(
Review Comment:
This looks like it is only called with the customized object builder from `ClusterState.createFromJSON` which is only called from BackupManager. That feels like we're missing some usages.
ASF GitHub Bot
logged work - 31/Aug/22 05:22
-
- Time Spent:
- 10m
-
noblepaul commented on code in PR #965:
URL: https://github.com/apache/solr/pull/965#discussion_r959170422
##########
solr/core/src/java/org/apache/solr/core/CoreContainer.java:
##########
@@ -2424,6 +2430,32 @@ public PlacementPluginFactory<? extends PlacementPluginConfig> getPlacementPlugi
public void runAsync(Runnable r) {
coreContainerAsyncTaskExecutor.submit(r);
}
+
+ public static void setWeakStringInterner() {
+ Interner<String> interner = Interner.newWeakInterner();
+ ClusterState.setStrInternerParser(
+ new Function<>() {
Review Comment:
utils is in the solrj and solrj does not have visibility to caffeine classes
ASF GitHub Bot
logged work - 31/Aug/22 05:23
-
- Time Spent:
- 10m
-
noblepaul commented on code in PR #965:
URL: https://github.com/apache/solr/pull/965#discussion_r959170545
##########
solr/core/src/java/org/apache/solr/core/CoreContainer.java:
##########
@@ -383,6 +388,7 @@ public CoreContainer(NodeConfig config, CoresLocator locator, boolean asyncSolrC
if (null != this.cfg.getBooleanQueryMaxClauseCount()) {
IndexSearcher.setMaxClauseCount(this.cfg.getBooleanQueryMaxClauseCount());
}
+ setWeakStringInterner();
Review Comment:
that's the plan.
ASF GitHub Bot
logged work - 31/Aug/22 11:55
-
- Time Spent:
- 10m
-
madrob commented on code in PR #965:
URL: https://github.com/apache/solr/pull/965#discussion_r959491853
##########
solr/core/src/java/org/apache/solr/core/CoreContainer.java:
##########
@@ -2424,6 +2430,34 @@ public PlacementPluginFactory<? extends PlacementPluginConfig> getPlacementPlugi
public void runAsync(Runnable r) {
coreContainerAsyncTaskExecutor.submit(r);
}
+
+ public static void setWeakStringInterner() {
+ boolean enable = "true".equals(System.getProperty("use.str.intern", "true"));
Review Comment:
This should be prefixed with solr.
Maybe instead of true/false it would make sense to do strong/weak/off? No real opinion here, talking through ideas out loud to see how they sound... probably too much complexity and not worth it, but I'll leave that up to you.
ASF GitHub Bot
logged work - 05/Sep/22 12:59
-
- Time Spent:
- 10m
-
noblepaul commented on PR #965:
URL: https://github.com/apache/solr/pull/965#issuecomment-1236987060
@madrob
any idea why the gradle precommit is failing ? it doesn't fail for me locally.
ASF GitHub Bot
logged work - 06/Sep/22 05:58
-
- Time Spent:
- 10m
-
chatman commented on PR #965:
URL: https://github.com/apache/solr/pull/965#issuecomment-1237691267
Precommit passes for me locally.
```
ishan@x1extreme ~/code/fullstorydev-solr/solr (jira/SOLR-16328) $ ../gradlew check -x test
Downloading https://services.gradle.org/distributions/gradle-7.2-all.zip
..............10%...............20%...............30%...............40%...............50%...............60%...............70%...............80%...............90%...............100%
To honour the JVM settings for this build a single-use Daemon process will be forked. See https://docs.gradle.org/7.2/userguide/gradle_daemon.html#sec:disabling_the_daemon.
Daemon will be stopped at the end of the build
> Task :localSettings
IMPORTANT. This is the first time you ran the build. I wrote some sane defaults (for this machine) to 'gradle.properties', they will be picked up on consecutive gradle invocations (not this one).
Run gradlew :helpLocalSettings for more information.
> Task :errorProneSkipped
WARNING: errorprone disabled (skipped on builds not running inside CI environments, pass -Pvalidation.errorprone=true to enable)
> Task :solr:solrj:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:solrj-zookeeper:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:core:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:test-framework:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:analysis-extras:compileJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:sql:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:core:analyzeClassesDependencies
Dependency analysis found issues.
unusedDeclaredArtifacts
- org.apache.lucene:lucene-backward-codecs:9.3.0@jar
> Task :solr:core:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:prometheus-exporter:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:prometheus-exporter:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:prometheus-exporter:analyzeTestClassesDependencies
Dependency analysis found issues.
unusedDeclaredArtifacts
- org.apache.lucene:lucene-test-framework:9.3.0@jar
> Task :solr:solr-ref-guide:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:solr-ref-guide:downloadAntoraCli
added 60 packages, changed 1 package, and audited 62 packages in 17s
7 packages are looking for funding
run `npm fund` for details
found 0 vulnerabilities
npm notice
npm notice New minor version of npm available! 8.11.0 -> 8.19.1
npm notice Changelog: <https://github.com/npm/cli/releases/tag/v8.19.1>
npm notice Run `npm install -g npm@8.19.1` to update!
npm notice
> Task :solr:solr-ref-guide:downloadAntoraLunrExtension
added 18 packages, changed 1 package, and audited 81 packages in 7s
18 packages are looking for funding
run `npm fund` for details
found 0 vulnerabilities
> Task :solr:solr-ref-guide:downloadAntoraSiteGenerator
added 153 packages, changed 1 package, and audited 235 packages in 23s
31 packages are looking for funding
run `npm fund` for details
9 high severity vulnerabilities
To address issues that do not require attention, run:
npm audit fix
To address all issues (including breaking changes), run:
npm audit fix --force
Run `npm audit` for details.
> Task :solr:solr-ref-guide:downloadAsciidoctorMathjaxExtension
added 8 packages, changed 1 package, and audited 244 packages in 8s
31 packages are looking for funding
run `npm fund` for details
9 high severity vulnerabilities
To address issues that do not require attention, run:
npm audit fix
To address all issues (including breaking changes), run:
npm audit fix --force
Run `npm audit` for details.
> Task :solr:solr-ref-guide:buildLocalSite
The generated local ref-guide can be found at:
/home/ishan/code/fullstorydev-solr/solr/solr-ref-guide/build/site/index.html
> Task :solr:solr-ref-guide:downloadLinkValidator
npm WARN deprecated formidable@1.2.6: Please upgrade to latest, formidable@v2 or formidable@v3! Check these notes: https://bit.ly/2ZEqIau
npm WARN deprecated superagent@3.8.3: Please upgrade to v7.0.2+ of superagent. We have fixed numerous issues with streams, form-data, attach(), filesystem errors not bubbling up (ENOENT on attach()), and all tests are now passing. See the releases tab for more information at <https://github.com/visionmedia/superagent/releases>.
added 49 packages, changed 1 package, and audited 294 packages in 5s
37 packages are looking for funding
run `npm fund` for details
9 high severity vulnerabilities
To address issues that do not require attention, run:
npm audit fix
To address all issues (including breaking changes), run:
npm audit fix --force
Run `npm audit` for details.
> Task :solr:solr-ref-guide:checkSiteLinks
{
"stats": {
"errors": [],
"warnings": [],
"parsedFiles": 814,
"localLinks": 231,
"localAnchorLinks": 3053,
"parentLinks": 195,
"parentAnchorLinks": 1,
"remoteLinks": 0,
"remoteAnchorLinks": 0
}
}
> Task :solr:solrj:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:test-framework:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:analysis-extras:compileTestJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/analysis-extras/src/test/org/apache/solr/schema/TestICUCollationFieldUDVAS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:analytics:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:analytics:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:clustering:analyzeTestClassesDependencies
Dependency analysis found issues.
unusedDeclaredArtifacts
- org.apache.lucene:lucene-test-framework:9.3.0@jar
> Task :solr:modules:extraction:compileJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractingDocumentLoader.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:extraction:compileTestJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/extraction/src/test/org/apache/solr/handler/extraction/ExtractingRequestHandlerTest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:gcs-repository:compileTestJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/gcs-repository/src/test/org/apache/solr/gcs/ConcurrentDelegatingStorage.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:hadoop-auth:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:hadoop-auth:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:hadoop-auth:analyzeTestClassesDependencies
Dependency analysis found issues.
unusedDeclaredArtifacts
- com.github.spotbugs:spotbugs-annotations:4.5.3@jar
> Task :solr:modules:hdfs:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:hdfs:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:jwt-auth:compileTestJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/jwt-auth/src/test/org/apache/solr/security/jwt/JWTAuthPluginIntegrationTest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:jwt-auth:analyzeTestClassesDependencies
Dependency analysis found issues.
unusedDeclaredArtifacts
- com.fasterxml.jackson.core:jackson-databind:2.13.3@jar
> Task :solr:modules:langid:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:ltr:compileTestJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/ltr/src/test/org/apache/solr/ltr/model/TestMultipleAdditiveTreesModel.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:scripting:compileTestJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/scripting/src/test/org/apache/solr/scripting/update/TestBadScriptingUpdateProcessorConfig.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.
You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.
See https://docs.gradle.org/7.2/userguide/command_line_interface.html#sec:command_line_warnings
BUILD SUCCESSFUL in 13m 33s
463 actionable tasks: 463 executed
ishan@x1extreme ~/code/fullstorydev-solr/solr (jira/SOLR-16328) $ java -version
openjdk version "11.0.15" 2022-04-19
OpenJDK Runtime Environment 18.9 (build 11.0.15+10)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.15+10, mixed mode, sharing)
ishan@x1extreme ~/code/fullstorydev-solr/solr (jira/SOLR-16328) $
```
ASF GitHub Bot
logged work - 06/Sep/22 16:36
-
- Time Spent:
- 10m
-
madrob commented on PR #965:
URL: https://github.com/apache/solr/pull/965#issuecomment-1238393512
The failure is ```/home/runner/work/solr/solr/solr/solrj/src/java/org/apache/solr/common/cloud/CompositeIdRouter.java:238: warning: [NarrowCalculation] This product of integers could overflow before being implicitly cast to a long.
[41](https://github.com/apache/solr/runs/8189262631?check_suite_focus=true#step:7:42)
[42](https://github.com/apache/solr/runs/8189262631?check_suite_focus=true#step:7:43)
> Task :solr:solrj:compileJava
[43](https://github.com/apache/solr/runs/8189262631?check_suite_focus=true#step:7:44)
boolean round = rangeStep >= (1 << bits) * 16;
[44](https://github.com/apache/solr/runs/8189262631?check_suite_focus=true#step:7:45)
^
[45](https://github.com/apache/solr/runs/8189262631?check_suite_focus=true#step:7:46)
(see https://errorprone.info/bugpattern/NarrowCalculation)
[46](https://github.com/apache/solr/runs/8189262631?check_suite_focus=true#step:7:47)
Did you mean 'boolean round = rangeStep >= (1 << bits) * 16L;'?
[47](https://github.com/apache/solr/runs/8189262631?check_suite_focus=true#step:7:48)```
ASF GitHub Bot
logged work - 07/Sep/22 00:49
-
- Time Spent:
- 10m
-
noblepaul commented on PR #965:
URL: https://github.com/apache/solr/pull/965#issuecomment-1238787574
unfortunately we don't have a choice. We have to upgrade error-prone as well
```
> Could not resolve com.google.errorprone:error_prone_annotations:2.11.0.
Required by:
project :solr:modules:hdfs > project :solr:core
project :solr:modules:hdfs > project :solr:solrj
project :solr:modules:hdfs > project :solr:core > project :solr:solrj-zookeeper
> Cannot find a version of 'com.google.errorprone:error_prone_annotations' that satisfies the version constraints:
Dependency path 'org.apache.solr:hdfs:10.0.0-SNAPSHOT' --> 'com.github.ben-manes.caffeine:caffeine:3.1.1' (apiElements) --> 'com.google.errorprone:error_prone_annotations:2.14.0'
Constraint path 'org.apache.solr:hdfs:10.0.0-SNAPSHOT' --> 'org.apache:solr-root:10.0.0-SNAPSHOT' (gcvLocks) --> 'com.google.errorprone:error_prone_annotations:{strictly 2.11.0}' because of the following reason: Locked by versions.lock
Constraint path 'org.apache.solr:hdfs:10.0.0-SNAPSHOT' --> 'org.apache.solr:solrj:10.0.0-SNAPSHOT' (apiElements) --> 'com.google.errorprone:error_prone_annotations:2.11.0' because of the following reason: Computed
```
ASF GitHub Bot
logged work - 07/Sep/22 11:39
-
- Time Spent:
- 10m
-
noblepaul merged PR #965:
URL: https://github.com/apache/solr/pull/965