The attached patch overhauls how facets are indexed. The concept stays the same, but I rewrote the implementation. Here's how the code in the patch works:
- FacetFields (previously CategoryDocumentBuilder) takes an Iterable<CategoryPath> and indexes them in two steps:
- DrillDownStream indexes the drill-down tokens, e.g. $facets:Author, $facets:Author/Bob
- CategoryListBuilder encodes the category ordinals into a Map<String,BytesRef> (more on this later), which is later set as the payload of $facets:$fulltree$
- Both these steps work per CategoryListParams (in case the application wishes to index groups of facets in different category lists)
- AssociationsFacetFields (previously EnhancementsDocumentBuilder) extends FacetFields and takes a CategoryAssociationsContainer, which implements Iterable<CategoryPath> and holds a mapping from each CategoryPath to its CategoryAssociation
- AssociationsDrillDownStream extends DrillDownStream and adds association values to the drill-down tokens' payloads
- AssociationsCategoryListBuilder extends CategoryListBuilder and encodes <category,association-value> pairs into their own BytesRef
- CategoryAssociation replaces CategoryEnhancement and makes usage much simpler for end users
- Two implementations, CategoryIntAssociation and CategoryFloatAssociation, replace the previous AssociationEnhancement + AssociationInt/FloatProperty combination and allow associating an int/float value with a category
- Every CategoryAssociation impl is responsible for serializing its value to a ByteArrayDataOutput (and deserializing it from a ByteArrayDataInput)
- Every implementation needs to specify its categoryListID, since it determines the term payload under which the association values are encoded
- The two FacetRequests, AssociationIntSumFacetRequest and AssociationFloatSumFacetRequest, work with CategoryAssociation rather than the enhancement
- EnhancementsIndexingParams was removed, and together with it all of the 'attributes', 'enhancements' and 'streaming' packages
- The Map<String,BytesRef> easily supports partitions and associations:
- When simple categories are indexed without partitions, a single entry exists in the map
- When simple categories are indexed with partitions, an entry per partition exists in the map, e.g. $facets:$fulltree$1, $facets:$fulltree$2 etc.
- When associations are indexed, the map contains the ordinals list (as described above) and the association values per CategoryAssociation.getCategoryListID(), so e.g. an int association is encoded into a different BytesRef than a float one
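To make the two indexing steps and the partitioned map concrete, here is a rough, self-contained sketch. It is not the code from the patch: the class and method names are made up for illustration, ordinals are kept as plain lists instead of being encoded into a BytesRef, and the rule that maps an ordinal to a partition (ordinal / partitionSize, numbered from 1) is only an assumption chosen to match the $facets:$fulltree$1, $facets:$fulltree$2 names above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: none of these names exist in the patch.
class FacetIndexingSketch {
    static final String PREFIX = "$facets:";
    static final String FULLTREE = "$facets:$fulltree$";

    // Step 1: emit one drill-down token per prefix of the category path.
    static List<String> drillDownTokens(String... components) {
        List<String> tokens = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        for (String component : components) {
            if (path.length() > 0) {
                path.append('/');
            }
            path.append(component);
            tokens.add(PREFIX + path);
        }
        return tokens;
    }

    // Step 2: group category ordinals into one map entry per category list.
    // Integer.MAX_VALUE stands for "no partitions" in this sketch.
    static Map<String, List<Integer>> categoryLists(int[] ordinals, int partitionSize) {
        Map<String, List<Integer>> lists = new TreeMap<>();
        for (int ordinal : ordinals) {
            String key = partitionSize == Integer.MAX_VALUE
                    ? FULLTREE
                    : FULLTREE + (ordinal / partitionSize + 1); // assumed numbering
            lists.computeIfAbsent(key, k -> new ArrayList<>()).add(ordinal);
        }
        return lists;
    }

    public static void main(String[] args) {
        System.out.println(drillDownTokens("Author", "Bob"));
        System.out.println(categoryLists(new int[] {3, 7, 12}, Integer.MAX_VALUE));
        System.out.println(categoryLists(new int[] {3, 7, 12}, 10));
    }
}
```

With no partitions the map has a single entry; with a partition size of 10, ordinals 3 and 7 land in one entry and 12 in another. Associations would add further entries keyed by their category list ID.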
I chose to implement it all from scratch because the code was very intertwined and complicated, largely because of a very complicated design for enhancements. At least to me, the code is now much simpler. Migrating facets from this code to DocValues should be an easy task: all that needs to be done is to write the output of CategoryListBuilder to a DocValues field, rather than a TokenStream payload.
The patch is huge, but mostly because of all the code that's been removed. When you review it, focus on the classes mentioned above.
NOTE: the new associations code breaks backwards compatibility with old indexes:
- Previously both the int and float associations were written to the same associations list as integers; float values were converted with Float.floatToIntBits when written and Float.intBitsToFloat when read. Now they are written to two separate lists
- Previously the code supported multiple enhancements which affected how they were encoded (e.g. #ENHANCEMENTS, #ENH_LENGTHS, #ENH_BYTES). But we always had only one enhancement (AssociationEnhancement), so that encoding was really redundant.
- In order to support multiple CategoryAssociations per CategoryPath, one can easily write a CompoundAssociation and take care of its serialization.
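For anyone unfamiliar with the old float encoding, the sketch below shows the bit-level round trip that the shared integer list relied on. The class and method names are hypothetical; only Float.floatToIntBits and Float.intBitsToFloat are real JDK calls. It also shows why a reader of the old list had to know which kind of association it was decoding.

```java
// Illustrative only: these helper names are not part of the patch.
class AssociationEncodingSketch {
    // Old layout: a float association is stored as its raw IEEE 754 bits in an int slot.
    static int encodeFloatAsInt(float value) {
        return Float.floatToIntBits(value);
    }

    // Reading it back requires knowing that the slot holds float bits.
    static float decodeFloatFromInt(int bits) {
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        int stored = encodeFloatAsInt(0.5f);
        // Interpreted as a plain int, the stored value is meaningless as an association.
        System.out.println("raw int slot: " + stored);
        System.out.println("decoded float: " + decodeFloatFromInt(stored));
    }
}
```

Because the new code writes int and float associations to separate lists, an index written with the shared layout cannot be read without some conversion step.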
Since enhancements/associations are quite an advanced feature, I think this break makes sense. We can always add a backwards layer later if someone complains (and cannot reindex). Keeping the previous code, which was prepared to handle multiple CategoryEnhancement types, while only one enhancement was available to our users did not make sense to me.