Details

Type: Task
Status: Closed
Priority: Minor
Resolution: Fixed
Description
Recently I was rereading the HNSW paper (https://arxiv.org/pdf/1603.09320.pdf) and noticed that it suggests using a different maxConn for the upper layers vs. the bottom one (which contains the full neighborhood graph). Specifically, it suggests maxConn=M for the upper layers and maxConn=2*M for the bottom one (Mmax and Mmax0 in the paper's notation). This differs from what we do, which is to use maxConn=M for all layers.
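To make the proposed change concrete, here is a minimal sketch of the per-layer limit (HnswParams and maxConnOnLevel are hypothetical names for illustration, not Lucene's actual API):

final class HnswParams {
  private HnswParams() {}

  /**
   * Maximum number of connections a node may keep on the given layer.
   * Following the paper, the bottom layer (level 0), which holds the full
   * neighborhood graph, allows up to 2 * M neighbors, while all upper
   * layers are capped at M. Currently we cap every layer at M.
   */
  static int maxConnOnLevel(int level, int maxConn) {
    return level == 0 ? 2 * maxConn : maxConn;
  }
}

A helper like this would replace the flat maxConn cap wherever neighbors are pruned during graph construction.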
I tried updating our logic using a hacky patch, and noticed an improvement in latency for higher recall values (which is consistent with the paper's observation):
Results on glove-100-angular
Parameters: M=32, efConstruction=100
As we'd expect, indexing becomes a bit slower:
Baseline: Indexed 1183514 documents in 733s
Candidate: Indexed 1183514 documents in 948s
When we benchmarked Lucene HNSW against hnswlib in LUCENE-9937, we noticed a big difference in recall for the same settings of M and efConstruction. (Even adding graph layers in LUCENE-10054 didn't really affect recall.) With this change, the recall is now very similar:
Results on glove-100-angular
Parameters: M=32, efConstruction=100
k     Approach                                             Recall  QPS
10    luceneknn dim=100 {'M': 32, 'efConstruction': 100}   0.563   4410.499
50    luceneknn dim=100 {'M': 32, 'efConstruction': 100}   0.798   1956.280
100   luceneknn dim=100 {'M': 32, 'efConstruction': 100}   0.862   1209.734
500   luceneknn dim=100 {'M': 32, 'efConstruction': 100}   0.958   341.428
800   luceneknn dim=100 {'M': 32, 'efConstruction': 100}   0.974   230.396
1000  luceneknn dim=100 {'M': 32, 'efConstruction': 100}   0.980   188.757
10    hnswlib ({'M': 32, 'efConstruction': 100})           0.552   16745.433
50    hnswlib ({'M': 32, 'efConstruction': 100})           0.794   5738.468
100   hnswlib ({'M': 32, 'efConstruction': 100})           0.860   3336.386
500   hnswlib ({'M': 32, 'efConstruction': 100})           0.956   832.982
800   hnswlib ({'M': 32, 'efConstruction': 100})           0.973   541.097
1000  hnswlib ({'M': 32, 'efConstruction': 100})           0.979   442.163
I think it'd be nice to update maxConn so that we faithfully implement the paper's algorithm. This is probably least surprising for users, and I don't see a strong reason to take a different approach from the paper. Let me know what you think!