Details
- Type: Improvement
- Status: Closed
- Priority: Minor
- Resolution: Fixed
Description
Stored fields are configured with a maximum number of documents per block, whose goal is to ensure that you never decompress more than X documents to get access to a single one. However, this has interesting effects with the new format.
For instance, we use a 4 kB dictionary and 60 kB blocks with at most 512 documents per block. So if your documents are very small, say 10 bytes, the block will be 5120 bytes overall: we first compress 4096 bytes independently, and then the remaining 5120-4096=1024 bytes with those 4096 bytes as a preset dictionary. In this case, training the dictionary takes more time than actually compressing the data, and it is not even clear that it is worth it, since only 1024 of the 5120 bytes in the block get compressed with a preset dictionary.
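As a rough illustration (not the actual Lucene code), here is a sketch of how little of such a block benefits from the preset dictionary under the fixed scheme above; the class and helper names are made up, only the constants mirror the description:

```java
final class FixedDictSplit {

  private static final int DICT_SIZE = 4 << 10;        // 4 kB preset dictionary
  private static final int MAX_BLOCK_SIZE = 60 << 10;  // 60 kB block
  private static final int MAX_DOCS_PER_BLOCK = 512;

  /** Bytes of the block that actually get compressed with the preset dictionary. */
  static int bytesWithDict(int numDocs, int avgDocSize) {
    int totalBytes = Math.min(numDocs, MAX_DOCS_PER_BLOCK) * avgDocSize;
    totalBytes = Math.min(totalBytes, MAX_BLOCK_SIZE);
    // The first DICT_SIZE bytes are compressed independently and reused as the dictionary.
    return Math.max(0, totalBytes - DICT_SIZE);
  }

  public static void main(String[] args) {
    // 512 ten-byte documents: 5120-byte block, only 1024 bytes benefit from the dictionary.
    System.out.println(bytesWithDict(512, 10));
    // Larger documents fill the 60 kB block, so most of it benefits.
    System.out.println(bytesWithDict(512, 1000));
  }
}
```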
I'm considering adapting the dictionary size and the block size to the total amount of data in the block in order to better handle such cases.
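A minimal sketch of what that adaptation could look like, assuming a hypothetical rule where the dictionary takes a fixed fraction of the block's data, capped at the current 4 kB; the constants and names are assumptions for illustration, not the actual implementation:

```java
final class AdaptiveSizes {

  // Hypothetical tuning constants.
  private static final int MAX_DICT_SIZE = 4 << 10; // never exceed 4 kB of dictionary
  private static final int DICT_FRACTION = 10;      // dictionary ~ 1/10th of the block's data

  /** Dictionary size scaled to the amount of data actually present in the block. */
  static int dictSize(int totalBlockBytes) {
    return Math.min(MAX_DICT_SIZE, totalBlockBytes / DICT_FRACTION);
  }

  /** Remaining bytes, i.e. the part of the block compressed against the dictionary. */
  static int bytesCompressedWithDict(int totalBlockBytes) {
    return totalBlockBytes - dictSize(totalBlockBytes);
  }

  public static void main(String[] args) {
    // The 5120-byte block of tiny documents from the description:
    // a 512-byte dictionary, 4608 bytes compressed with it.
    System.out.println(dictSize(5120) + " / " + bytesCompressedWithDict(5120));
    // A full 60 kB block still gets the maximum 4 kB dictionary.
    System.out.println(dictSize(60 << 10) + " / " + bytesCompressedWithDict(60 << 10));
  }
}
```

With a rule like this, small blocks spend far less time training a dictionary, and a much larger share of their bytes is compressed against it, while full-size blocks keep today's behaviour.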