Index: src/site/src/documentation/content/xdocs/fileformats.xml
===================================================================
--- src/site/src/documentation/content/xdocs/fileformats.xml (revision 483366)
+++ src/site/src/documentation/content/xdocs/fileformats.xml (working copy)
@@ -926,7 +926,8 @@
Starting with Lucene 1.4 the compound file format became default. This
- is simply a container for all files described in the next section.
Compound (.cfs) --> FileCount, <DataOffset, FileName>
FileCount
@@ -1511,14 +1512,25 @@
The .del file is
- optional, and only exists when a segment contains deletions:
+ optional, and only exists when a segment contains deletions.
Deletions
+ Although per-segment, this file is maintained outside compound segment files.
+
+ Pre-2.1:
+ Deletions
(.del) --> ByteCount,BitCount,Bits
ByteSize,BitCount -->
+
+ 2.1 and above:
+ Deletions
+ (.del) --> [Format],ByteCount,BitCount, Bits | DGaps (depending on Format)
+ Format,ByteCount,BitCount -->
Uint32
+ DGaps --> <DGap,NonzeroByte> NonzeroBytesCount
+ DGap --> VInt
+ NonzeroByte --> Byte
+ Format is optional. A value of -1 indicates DGaps. A non-negative value indicates Bits;
+ in that case Format is absent and the first Uint32 read is ByteCount itself.
+ ByteCount indicates the number of bytes in Bits. It is typically (SegSize/8)+1.
@@ -1544,6 +1573,20 @@
Bits contains two bytes, 0x00 and 0x02, then document 9 is marked as deleted.
+ DGaps represents sparse bit-vectors more efficiently than Bits.
+ It consists of the gaps (DGap) between the indexes of successive nonzero bytes in Bits,
+ each followed by the nonzero byte itself. The number of nonzero bytes
+ in Bits (NonzeroBytesCount) is not stored.
+ For example,
+ if there are 8000 bits and only bits 10,12,32 are set,
+ DGaps would be used:
+ (VInt) 1 , (Byte) 20 , (VInt) 3 , (Byte) 1
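
The worked example above can be checked with a short sketch. This is not Lucene's implementation; it is an illustration in Python that assumes bit n is stored in byte n/8 at position n%8 (least significant bit first, which is what the "0x00 and 0x02 means document 9" example implies), that the first gap is measured from byte index 0, and that VInt uses seven data bits per byte with the high bit as a continuation flag. All function names are hypothetical.

```python
def write_vint(value):
    """Encode a non-negative int as a VInt: 7 bits per byte, low-order group first."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


def bits_from_set(set_bits, bit_count):
    """Build the dense Bits array, sized as (bit_count / 8) + 1 bytes as in the text."""
    bits = bytearray(bit_count // 8 + 1)
    for n in set_bits:
        bits[n >> 3] |= 1 << (n & 7)  # assumed LSB-first layout within each byte
    return bits


def dgap_pairs(bits):
    """Return (DGap, NonzeroByte) pairs; each gap is relative to the previous nonzero byte."""
    pairs = []
    last = 0  # assumption: the first gap is measured from byte index 0
    for index, byte in enumerate(bits):
        if byte != 0:
            pairs.append((index - last, byte))
            last = index
    return pairs


def encode_dgaps(bits):
    """Serialize the pairs as VInt gap followed by the raw nonzero byte."""
    out = bytearray()
    for gap, byte in dgap_pairs(bits):
        out += write_vint(gap)
        out.append(byte)
    return bytes(out)


def is_deleted(bits, doc):
    """Test one document's bit in a dense Bits array."""
    return bool(bits[doc >> 3] & (1 << (doc & 7)))


bits = bits_from_set([10, 12, 32], 8000)
print(dgap_pairs(bits))          # [(1, 20), (3, 1)]
print(list(encode_dgaps(bits)))  # [1, 20, 3, 1]
```

Bits 10 and 12 fall in byte 1 (0x04 | 0x10 = 20) and bit 32 falls in byte 4 (0x01), so the gaps are 1 and 3. The sparse form takes 4 bytes where the dense Bits array would take about 1000.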