Details
Wish
Status: Resolved
Major
Resolution: Fixed
9.0
New
Description
The SimpleText codec now writes skip list data (including impacts) to make the index format easier to understand. This is for debugging, curiosity, and transparency only! When a term's docFreq is greater than or equal to SimpleTextSkipWriter.BLOCK_SIZE (default value 8), the SimpleText codec writes a skip list, and the .pst file (the SimpleText term dictionary file) looks like this:
field title
term args
doc 2
freq 2
pos 7
pos 10
## we omit docs for better view ......
doc 98
freq 2
pos 2
pos 6
skipList
?
level 1
skipDoc 65
skipDocFP 949
impacts
impact
freq 1
norm 2
impact
freq 2
norm 12
impact
freq 3
norm 13
impacts_end
?
level 0
skipDoc 17
skipDocFP 284
impacts
impact
freq 1
norm 2
impact
freq 2
norm 12
impacts_end
skipDoc 34
skipDocFP 624
impacts
impact
freq 1
norm 2
impact
freq 2
norm 12
impact
freq 3
norm 14
impacts_end
skipDoc 65
skipDocFP 949
impacts
impact
freq 1
norm 2
impact
freq 2
norm 12
impact
freq 3
norm 13
impacts_end
skipDoc 90
skipDocFP 1311
impacts
impact
freq 1
norm 2
impact
freq 2
norm 10
impact
freq 3
norm 13
impact
freq 4
norm 14
impacts_end
END
checksum 00000000000829315543
Compared with the previous format, this adds the skipList, level, skipDoc, skipDocFP, impacts, impact, freq, and norm nodes. At the same time, the SimpleText codec can now support advanceShallow at search time.
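To make the node layout concrete, here is a minimal sketch of reading those text nodes back into objects. It is not the codec's reader: the class and method names (SkipListTextParser, SkipEntry, Impact, parse) are hypothetical, and it assumes each node sits on its own line exactly as in the dump above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: parses the skipList text nodes shown above.
public class SkipListTextParser {
    /** One (freq, norm) pair from an "impact" node. */
    static final class Impact {
        final int freq;
        final long norm;
        Impact(int freq, long norm) { this.freq = freq; this.norm = norm; }
    }

    /** One skip entry: its level, skipDoc, skipDocFP, and the impacts listed under it. */
    static final class SkipEntry {
        final int level;
        final int skipDoc;
        final long skipDocFP;
        final List<Impact> impacts;
        SkipEntry(int level, int skipDoc, long skipDocFP, List<Impact> impacts) {
            this.level = level;
            this.skipDoc = skipDoc;
            this.skipDocFP = skipDocFP;
            this.impacts = impacts;
        }
    }

    static List<SkipEntry> parse(List<String> lines) {
        List<SkipEntry> entries = new ArrayList<>();
        int level = -1;
        int skipDoc = -1;
        long skipDocFP = -1;
        int freq = -1;
        List<Impact> impacts = null; // non-null only between "impacts" and "impacts_end"
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("level ")) {
                level = Integer.parseInt(line.substring("level ".length()));
            } else if (line.startsWith("skipDocFP ")) {
                skipDocFP = Long.parseLong(line.substring("skipDocFP ".length()));
            } else if (line.startsWith("skipDoc ")) {
                skipDoc = Integer.parseInt(line.substring("skipDoc ".length()));
            } else if (line.equals("impacts")) {
                impacts = new ArrayList<>();
            } else if (impacts != null && line.startsWith("freq ")) {
                // Postings also contain "freq" lines; only those inside "impacts" count here.
                freq = Integer.parseInt(line.substring("freq ".length()));
            } else if (impacts != null && line.startsWith("norm ")) {
                impacts.add(new Impact(freq, Long.parseLong(line.substring("norm ".length()))));
            } else if (line.equals("impacts_end")) {
                entries.add(new SkipEntry(level, skipDoc, skipDocFP, impacts));
                impacts = null;
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // The level 1 entry from the dump above.
        List<String> sample = Arrays.asList(
            "level 1", "skipDoc 65", "skipDocFP 949", "impacts",
            "impact", "freq 1", "norm 2",
            "impact", "freq 2", "norm 12",
            "impact", "freq 3", "norm 13",
            "impacts_end");
        for (SkipEntry e : parse(sample)) {
            System.out.println("level=" + e.level + " skipDoc=" + e.skipDoc
                + " skipDocFP=" + e.skipDocFP + " impacts=" + e.impacts.size());
        }
    }
}
```

Lines that are not recognized, such as the "?" lines or the bare "impact" marker, are simply ignored in this sketch.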
Why is there a question mark symbol in the file?
Because MultiLevelSkipListWriter writes "length" and "childPointer" as VLong values. These are binary bytes rather than text, so they show up as unreadable characters.
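As a rough illustration of why VLong bytes do not read as text, here is a standalone sketch of the usual variable-length scheme (7 data bits per byte, high bit set when more bytes follow). The class name VLongDemo, the render helper, and the value 300 are assumptions made for the example, and how a given viewer displays such bytes may differ.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class; it mirrors the VLong scheme but does not call Lucene.
public class VLongDemo {
    // Encodes a non-negative long 7 bits at a time, low-order bits first.
    // The high bit of each byte signals that another byte follows.
    static byte[] encodeVLong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Shows each byte as its ASCII character if printable, otherwise as '?'.
    static String render(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            sb.append(v >= 0x20 && v <= 0x7E ? (char) v : '?');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] encoded = encodeVLong(300L); // 300 is an arbitrary example value
        for (byte b : encoded) {
            System.out.printf("0x%02X ", b & 0xFF); // prints 0xAC 0x02
        }
        System.out.println();
        System.out.println(render(encoded)); // prints ??
    }
}
```

Some small values do happen to land on printable characters (65 encodes to the single byte for 'A'), so not every VLong will necessarily appear as a question mark.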
Does this speed up searching?
No! advanceShallow works at search time, but there is no speedup yet. The skip list data is written after the docs (see the file above), so a reader must iterate over all of a term's docs before it can read the skip list data, and search never gets faster. There is no "skipOffset" for jumping straight to the skip list data. As mentioned above, though, this is for debugging, curiosity, and transparency only! If this turns out to be a problem, I can add a "skipOffset" in a follow-up so that the skip list data can be read directly.
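The difference can be sketched with a toy model that counts how many lines a reader touches before it reaches the skipList node. This only illustrates the layout argument: SkipOffsetSketch and both methods are hypothetical, and the "skipOffset" here is a line index rather than the file pointer a real implementation would presumably store.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of the two layouts; not codec code.
public class SkipOffsetSketch {
    // Current layout: start at the term and scan forward until "skipList" is found.
    static int linesReadSequentially(List<String> termLines) {
        int read = 0;
        for (String line : termLines) {
            read++;
            if (line.equals("skipList")) {
                break;
            }
        }
        return read;
    }

    // Assumed future layout: a known skipOffset lets the reader jump straight there.
    static int linesReadWithOffset(List<String> termLines, int skipOffset) {
        if (!termLines.get(skipOffset).equals("skipList")) {
            throw new IllegalArgumentException("skipOffset does not point at skipList");
        }
        return 1; // only the skipList line itself is touched
    }

    public static void main(String[] args) {
        // A shortened version of the term from the dump above.
        List<String> term = Arrays.asList(
            "term args",
            "doc 2", "freq 2", "pos 7", "pos 10",
            "doc 98", "freq 2", "pos 2", "pos 6",
            "skipList");
        System.out.println(linesReadSequentially(term));  // prints 10
        System.out.println(linesReadWithOffset(term, 9)); // prints 1
    }
}
```

In the sequential case the cost grows with the number of docs for the term, which is why the skip data gives no speedup without an offset.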