Details

Type: Wish
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 9.0
Fix Version/s: 9.0, 8.10
Component/s: core/codecs
Labels:

Lucene Fields:

New

Description

Add skip list data (including impacts) to the simple text codec to help readers understand the index format. This is for debugging, curiosity, and transparency only! When a term's docFreq is greater than or equal to SimpleTextSkipWriter.BLOCK_SIZE (default value is 8), the simple text codec writes a skip list, and the .pst file (the simple text term dictionary file) looks like this:

field title
  term args
    doc 2
      freq 2
      pos 7
      pos 10
    ## docs omitted for readability ......
    doc 98
      freq 2
      pos 2
      pos 6
    skipList 
?
      level 1
        skipDoc 65
        skipDocFP 949
        impacts 
          impact 
            freq 1
            norm 2
          impact 
            freq 2
            norm 12
          impact 
            freq 3
            norm 13
        impacts_end 
?
      level 0
        skipDoc 17
        skipDocFP 284
        impacts 
          impact 
            freq 1
            norm 2
          impact 
            freq 2
            norm 12
        impacts_end         
        skipDoc 34
        skipDocFP 624
        impacts 
          impact 
            freq 1
            norm 2
          impact 
            freq 2
            norm 12
          impact 
            freq 3
            norm 14
        impacts_end         
        skipDoc 65
        skipDocFP 949
        impacts 
          impact 
            freq 1
            norm 2
          impact 
            freq 2
            norm 12
          impact 
            freq 3
            norm 13
        impacts_end         
        skipDoc 90
        skipDocFP 1311
        impacts 
          impact 
            freq 1
            norm 2
          impact 
            freq 2
            norm 10
          impact 
            freq 3
            norm 13
          impact 
            freq 4
            norm 14
        impacts_end 
END
checksum 00000000000829315543

Compared with the previous format, we add the skipList, level, skipDoc, skipDocFP, impacts, impact, freq, and norm nodes. At the same time, the simple text codec can now support advanceShallow at search time.
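The new nodes can be read back with a few lines of code. The following is a minimal sketch, not part of Lucene: `SkipEntry`, `parse_skip_list`, and `SAMPLE` are hypothetical names, and the parser assumes the whitespace-separated "key value" layout shown in the dump above. `SAMPLE` is a shortened excerpt of that dump.

```python
from dataclasses import dataclass, field


@dataclass
class SkipEntry:
    skip_doc: int
    skip_doc_fp: int
    impacts: list = field(default_factory=list)  # (freq, norm) pairs


def parse_skip_list(text):
    """Return {level: [SkipEntry, ...]} for one term's skipList section."""
    levels = {}
    current_level = None
    entry = None
    pending_freq = None
    in_skip_list = False
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        key = parts[0]
        if key == "skipList":
            in_skip_list = True
        elif not in_skip_list:
            continue  # doc/freq/pos lines that precede the skip list
        elif key == "level":
            current_level = int(parts[1])
            levels[current_level] = []
        elif key == "skipDoc":
            entry = SkipEntry(int(parts[1]), -1)
            levels[current_level].append(entry)
        elif key == "skipDocFP":
            entry.skip_doc_fp = int(parts[1])
        elif key == "freq":
            pending_freq = int(parts[1])
        elif key == "norm":
            entry.impacts.append((pending_freq, int(parts[1])))
        elif key == "END":
            break
        # "?", "impacts", "impact" and "impacts_end" carry no value here
    return levels


# Shortened excerpt of the dump above (hypothetical test input).
SAMPLE = """\
    doc 98
      freq 2
      pos 2
    skipList
?
      level 1
        skipDoc 65
        skipDocFP 949
        impacts
          impact
            freq 1
            norm 2
          impact
            freq 2
            norm 12
          impact
            freq 3
            norm 13
        impacts_end
?
      level 0
        skipDoc 17
        skipDocFP 284
        impacts
          impact
            freq 1
            norm 2
          impact
            freq 2
            norm 12
        impacts_end
END
"""

for level, entries in sorted(parse_skip_list(SAMPLE).items(), reverse=True):
    for e in entries:
        print(level, e.skip_doc, e.skip_doc_fp, e.impacts)
```

Each level maps to its skip entries in file order, so the coarser level 1 can be compared directly against level 0.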

Why is there a question mark symbol in the file?

Because MultiLevelSkipListWriter writes "length" and "childPointer" as VLong values. These are binary bytes rather than text, so they show up as "?" when the file is viewed as text.
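To see why such bytes are not readable text, here is a minimal sketch of the variable-length encoding (7 data bits per byte, low-order group first, high bit set on every byte except the last). `encode_vlong`, `decode_vlong`, and `as_viewer_text` are hypothetical helper names, and 949 is only an arbitrary example value, not the actual "length" or "childPointer" written for this term.

```python
def encode_vlong(value):
    """Encode a non-negative integer, 7 bits per byte, low-order bits first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # high bit: more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_vlong(data):
    """Decode one value produced by encode_vlong."""
    result = 0
    shift = 0
    for b in data:
        result |= (b & 0x7F) << shift
        if b & 0x80 == 0:
            return result
        shift += 7
    raise ValueError("truncated input")


def as_viewer_text(data):
    """Roughly what a text viewer does with bytes it cannot decode."""
    return data.decode("ascii", errors="replace")


encoded = encode_vlong(949)
print(encoded.hex(), repr(as_viewer_text(encoded)))
```

Any byte with the high bit set falls outside ASCII, and many viewers render the resulting replacement character as "?".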

Does this speed up the search process?

No! advanceShallow is available at search time, but it does not speed anything up yet. The skip list data is written after the docs (see the file above), so the reader must iterate over all docs before it can read the skip list data, and search is never faster. There is no "skipOffset" that would let the reader jump directly to the skip list data. As mentioned above, this is for debugging, curiosity, and transparency only! If this turns out to be a problem, I can add a "skipOffset" in a follow-up so that the skip list data can be read directly.
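The effect can be illustrated with a toy model. This is a sketch under stated assumptions, not Lucene's reader: the record layout, `build_postings`, `advance_linear`, and `advance_with_skip_offset` are all hypothetical, and the second value returned by `build_postings` plays the role of the proposed "skipOffset". The model counts how many records must be read to advance to a target doc.

```python
BLOCK_SIZE = 8  # same value as the default mentioned in the description


def build_postings(doc_ids):
    """Lay out doc records first, then skip records, as in the dump above."""
    docs = [("doc", d) for d in doc_ids]
    # Skip record: (kind, last doc of the block, index of the next doc record)
    skips = [("skip", doc_ids[i - 1], i)
             for i in range(BLOCK_SIZE, len(doc_ids), BLOCK_SIZE)]
    return docs + skips, len(docs)  # len(docs) acts as a hypothetical skipOffset


def advance_linear(records, num_docs, target):
    """No skipOffset: skip data sits behind the docs, so just scan the docs."""
    reads = 0
    for _, doc in records[:num_docs]:
        reads += 1
        if doc >= target:
            return doc, reads
    return None, reads


def advance_with_skip_offset(records, skip_offset, target):
    """With a skipOffset: read skip records first, then scan one block."""
    reads = 0
    start = 0
    for _, skip_doc, next_index in records[skip_offset:]:
        reads += 1
        if skip_doc >= target:
            break
        start = next_index  # every doc before next_index is below the target
    for _, doc in records[start:skip_offset]:
        reads += 1
        if doc >= target:
            return doc, reads
    return None, reads


records, skip_offset = build_postings(list(range(0, 200, 2)))
print(advance_linear(records, skip_offset, 151))
print(advance_with_skip_offset(records, skip_offset, 151))
```

Both functions land on the same doc. Only the variant that can jump straight to the skip records reads fewer records, which is the benefit a "skipOffset" would bring.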

Attachments

Issue Links

links to

GitHub Pull Request #224

GitHub Pull Request #2565

Activity

People

Assignee: Unassigned

Reporter: wuda

Votes: 0

Watchers: 5

Dates

Created: 24/Jul/21 15:39

Updated: 28/Aug/22 16:24

Resolved: 30/Aug/21 13:30

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 6h 40m