Lucene - Core / LUCENE-7839

Optimize the default NormsFormat for the case that all norms are in 0..16

Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved

    Description

      Given how we now store the length of the field in norms, we could optimize the default norms format for the case that all norms are in 0..16 and store them on 4 bits each. This would kick in for short fields that have fewer than 16 terms (e.g. title fields) and halve disk usage for norms.
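As a rough illustration of the proposed encoding (the class and method names below are hypothetical, not from the attached patch): two norm values that each fit in 4 bits can be packed into one byte and read back with a shift and a mask.

```java
// Hypothetical sketch: pack norms in [0, 15] two per byte (low nibble first).
class NibblePacking {
    static byte[] pack(int[] norms) {
        byte[] out = new byte[(norms.length + 1) / 2];
        for (int i = 0; i < norms.length; i++) {
            int shift = (i & 1) == 0 ? 0 : 4; // even index -> low nibble
            out[i >> 1] |= (byte) ((norms[i] & 0xF) << shift);
        }
        return out;
    }

    // Random access to the i-th value: one byte read plus shift+mask.
    static int get(byte[] packed, int i) {
        int shift = (i & 1) == 0 ? 0 : 4;
        return (packed[i >> 1] >>> shift) & 0xF;
    }
}
```

This is what "store them on 4 bits" amounts to: two values per byte instead of one, hence the factor-of-2 saving on disk.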

      Attachments

        1. LUCENE-7839.patch
          13 kB
          Adrien Grand

        Activity

          rcmuir Robert Muir added a comment -

           Should this really be necessary given an iterator API? It seems too highly specialized and fragile (just like what the random access APIs were doing), versus using, e.g., block-level compression like the posting lists do.

          jpountz Adrien Grand added a comment -

          I tried to leverage the iterator API similarly to what numeric doc values do, but luceneutil seems to notice a performance hit:

                              TaskQPS baseline      StdDev   QPS patch      StdDev                Pct diff
                          HighTerm      569.71     (11.5%)      490.35      (9.0%)  -13.9% ( -30% -    7%)
                        OrHighHigh      138.08     (11.6%)      123.27      (7.1%)  -10.7% ( -26% -    9%)
                         OrHighMed      295.37     (11.2%)      269.99      (8.1%)   -8.6% ( -25% -   12%)
                         OrHighLow      379.17      (9.1%)      351.63      (6.4%)   -7.3% ( -20% -    9%)
                           MedTerm     1518.29     (11.9%)     1421.77      (6.8%)   -6.4% ( -22% -   14%)
                       AndHighHigh      386.22      (9.3%)      367.76      (9.0%)   -4.8% ( -21% -   14%)
                           LowTerm     3236.73      (8.3%)     3118.34      (8.3%)   -3.7% ( -18% -   14%)
                   MedSloppyPhrase      555.94      (9.6%)      537.02      (6.3%)   -3.4% ( -17% -   13%)
             HighTermDayOfYearSort      330.62     (12.2%)      320.20      (9.8%)   -3.2% ( -22% -   21%)
                         MedPhrase      635.77      (9.6%)      616.12      (8.1%)   -3.1% ( -18% -   16%)
                  HighSloppyPhrase      147.02      (8.6%)      142.77      (7.9%)   -2.9% ( -17% -   14%)
                            IntNRQ      117.56      (9.8%)      114.43     (10.2%)   -2.7% ( -20% -   19%)
                      HighSpanNear       57.73      (7.9%)       56.21      (7.4%)   -2.6% ( -16% -   13%)
                   LowSloppyPhrase      385.52      (8.9%)      375.39      (6.5%)   -2.6% ( -16% -   13%)
                         LowPhrase      653.67      (9.7%)      637.17      (7.4%)   -2.5% ( -17% -   16%)
                           Prefix3      287.63     (12.3%)      281.78     (10.3%)   -2.0% ( -21% -   23%)
                           Respell      144.41      (7.8%)      141.67      (6.7%)   -1.9% ( -15% -   13%)
                        AndHighMed      676.46      (8.3%)      665.05      (9.8%)   -1.7% ( -18% -   17%)
                          Wildcard      214.90      (8.5%)      211.57      (7.0%)   -1.5% ( -15% -   15%)
                        HighPhrase       20.11      (9.7%)       20.03      (8.5%)   -0.4% ( -17% -   19%)
                       MedSpanNear      476.40      (8.7%)      476.48      (7.7%)    0.0% ( -15% -   18%)
                        AndHighLow      964.81      (9.8%)      965.18      (8.0%)    0.0% ( -16% -   19%)
                 HighTermMonthSort     1190.72      (9.6%)     1194.44     (11.4%)    0.3% ( -18% -   23%)
                       LowSpanNear      421.27      (7.8%)      423.97      (9.9%)    0.6% ( -15% -   19%)
                            Fuzzy2       49.17     (16.2%)       50.09     (19.1%)    1.9% ( -28% -   44%)
                            Fuzzy1      129.89     (12.6%)      132.32     (11.9%)    1.9% ( -20% -   30%)
          

          You can find the patch that I played with attached. It keeps the current levels of compression, but just splits values into blocks of 2^14 values and decides on the number of bits on a per-block basis. Maybe there is a better way to do this...
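The per-block scheme described above could look roughly like this (a hedged sketch, not the patch's actual code): choose the number of bits for each block of 2^14 values independently, so that a block containing large norms does not inflate the bit width of the other blocks.

```java
// Hypothetical sketch of per-block bit-width selection.
class BlockBits {
    static final int BLOCK_SIZE = 1 << 14; // 16384 values per block, as in the patch

    // Number of bits needed to represent every value in [from, to).
    // OR-ing the values together preserves the highest set bit of the max.
    static int bitsRequired(long[] values, int from, int to) {
        long or = 0;
        for (int i = from; i < to; i++) {
            or |= values[i];
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(or));
    }
}
```

Each block would then be packed with its own bit width, trading a small per-block header for resilience against outliers.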

          rcmuir Robert Muir added a comment -

           OK, I will try to look at the patch. I'm not sure it's necessary to do delta compression (max-min) in this case; skipping it might make things simpler.

          jpountz Adrien Grand added a comment -

           Agreed. At the moment the patch does not do delta compression, it just reads plain byte/short/int/long values like master. The only difference with master is that it splits values into blocks of 16384 and encodes each block independently. It does not even specialize the case that norms are in 0..15, since I first wanted to get an idea of the performance impact of leveraging the iterator API so that a single outlier does not raise the number of bits per value for every document.
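For reference, the delta (max-min) compression being discussed, which the patch deliberately does not do, would amount to something like the following hypothetical sketch: store each block's minimum once and encode the values as offsets from it, so the offsets need fewer bits than the raw values.

```java
// Hypothetical sketch of delta (max-min) encoding for a non-empty block.
class DeltaNorms {
    // Output layout: [min, v0 - min, v1 - min, ...]; decoding adds min back.
    static long[] deltaEncode(long[] values) {
        long min = Long.MAX_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
        }
        long[] out = new long[values.length + 1];
        out[0] = min; // store the base once per block
        for (int i = 0; i < values.length; i++) {
            out[i + 1] = values[i] - min;
        }
        return out;
    }
}
```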

          rcmuir Robert Muir added a comment -

          Right, but this is still old-style in the sense that it could still be done with the random access API with shift+mask ... and this block compression has been done before and always gave overhead.

          I was suggesting much more like postings to eliminate the readByte()/readByte()/readByte() stuff for the worst case.
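A minimal sketch of the postings-style bulk decoding suggested here (names are hypothetical, assumptions: byte-sized norms and a `java.io.DataInput` source): read a whole block in one call and widen it into a buffer, instead of one readByte() per value.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical sketch: bulk-decode a block of byte-sized norms,
// the way posting lists decode whole blocks at once.
class BulkNorms {
    static void bulkDecode(DataInput in, long[] buffer, int count) throws IOException {
        byte[] scratch = new byte[count];
        in.readFully(scratch); // one bulk read for the whole block
        for (int i = 0; i < count; i++) {
            buffer[i] = scratch[i]; // widen to long; a real codec would also unpack bits here
        }
    }
}
```

The point of this shape is that the per-value cost is a plain array copy rather than repeated I/O calls, which matters most in the worst case the comment mentions.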

          tomoko Tomoko Uchida added a comment -

          This issue was moved to GitHub issue: #8890.


          People

            Assignee: Unassigned
            Reporter: jpountz Adrien Grand
            Votes: 0
            Watchers: 3
