Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
As the JSON Facet API becomes more complex and gains more optimizations, it would be nice to get a better view of what is going on during faceting: which methods/algorithms are being used, and what is taking up the most time or memory. Useful information would include:
- the strategy/method used to facet the field
- number of unique values in facet field
- memory usage of facet field itself
- memory usage for request (count arrays, etc)
- timing of various parts of facet request (finding top N, executing sub-facets, etc)
This will also help with unit tests, making sure we have proper coverage of various optimizations.
Some of this information may make sense to collect all the time, while other information may be calculated only if requested.
When adding facet info to a response, it could be done in one of two ways:
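To make the list above concrete, here is a minimal sketch of a per-facet telemetry record. Everything in it is hypothetical and for illustration only (the `FacetTelemetry` class, its field names, the `detailed` flag, and the toy `count_facet` function are not part of any existing API). It assumes timings are cheap enough to collect whenever a record is supplied, while the unique-value count is filled in only on request. Memory figures are left out.

```python
import time
from contextlib import contextmanager, nullcontext
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FacetTelemetry:
    """Hypothetical per-facet record; names are illustrative only."""
    field_name: str
    method: str                            # strategy used to facet the field
    unique_values: Optional[int] = None    # filled in only when requested
    timings_ms: Dict[str, float] = field(default_factory=dict)

    @contextmanager
    def timed(self, phase):
        """Accumulate wall-clock time (ms) spent in one named phase."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.timings_ms[phase] = self.timings_ms.get(phase, 0.0) + elapsed_ms


def count_facet(values, limit, telemetry=None, detailed=False):
    """Toy counting facet that records telemetry when a record is supplied."""
    def timer(phase):
        # Fall back to a no-op context manager when telemetry is disabled.
        return telemetry.timed(phase) if telemetry is not None else nullcontext()

    counts = {}
    with timer("count"):
        for v in values:
            counts[v] = counts.get(v, 0) + 1
    with timer("top_n"):
        # Highest count first; ties broken by value for a stable order.
        top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
    if telemetry is not None and detailed:
        telemetry.unique_values = len(counts)
    return top
```

A unit test could then assert on `telemetry.method` or on the keys of `telemetry.timings_ms` to confirm that a particular optimization path was actually exercised.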
1. in the existing debug block in the response, along with other debug info, structured like
2. directly in the facet response (i.e. in something like "_debug_" that is a sibling of "buckets")
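The two placements can be sketched on a plain dictionary standing in for a response. This is only an illustration: the `facet_telemetry` key, the helper names, and the contents of `info` are made up for this example, and the response is reduced to the parts that matter here ("facets", "buckets", "debug").

```python
import copy


def attach_to_debug_block(response, facet_name, info):
    """Option 1: place the info under the top-level debug block."""
    out = copy.deepcopy(response)
    # "facet_telemetry" is a hypothetical key name inside the debug block.
    out.setdefault("debug", {}).setdefault("facet_telemetry", {})[facet_name] = info
    return out


def attach_inline(response, facet_name, info):
    """Option 2: place the info next to "buckets" inside the facet itself."""
    out = copy.deepcopy(response)
    out["facets"][facet_name]["_debug_"] = info
    return out


response = {
    "facets": {
        "count": 6,
        "categories": {"buckets": [{"val": "a", "count": 3}, {"val": "b", "count": 2}]},
    }
}
info = {"method": "example_method", "elapsed_ms": 4}  # illustrative values

in_debug_block = attach_to_debug_block(response, "categories", info)
inline = attach_inline(response, "categories", info)
```

With option 1 the facet response itself is unchanged and all telemetry sits in one place. With option 2 the info travels with the facet it describes, so nested sub-facets can carry their own `_debug_` entries.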
We also need to consider how to merge distributed debug info (and add more info about the distributed phase as well). Given this, (2), adding directly to the facet response, may be simpler, as we already have a framework for merging.
Although not necessarily part of this initial issue, we should think about how to get information about certain requests without modifying the actual request or response. For example, "log telemetry data for the next N requests that match this pattern". Something like that would more naturally point to method 1 for returning the data (i.e. separate from the response).
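As a rough sketch of what merging per-shard debug info could look like: the function below combines one small dictionary per shard into a single summary. The input shape (`shard`, `method`, `elapsed_ms`) and the output keys are assumptions made up for this example, not an existing format. Both the maximum and the total elapsed time are reported, on the assumption that shards run in parallel and the slowest one is what the caller waits on.

```python
from collections import Counter


def merge_shard_debug(shard_infos):
    """Merge hypothetical per-shard facet debug entries into one summary."""
    methods = Counter(info["method"] for info in shard_infos)
    return {
        "shard_count": len(shard_infos),
        # How many shards used each faceting method.
        "methods": dict(methods),
        # Slowest shard, assuming shards are queried in parallel.
        "max_elapsed_ms": max((info["elapsed_ms"] for info in shard_infos), default=0),
        # Total work across all shards.
        "total_elapsed_ms": sum(info["elapsed_ms"] for info in shard_infos),
        # Keep the raw per-shard entries for drill-down.
        "per_shard": {info["shard"]: info for info in shard_infos},
    }


shards = [
    {"shard": "s1", "method": "array", "elapsed_ms": 5},
    {"shard": "s2", "method": "hash", "elapsed_ms": 9},
    {"shard": "s3", "method": "array", "elapsed_ms": 2},
]
merged = merge_shard_debug(shards)
```

Keeping the per-shard entries alongside the aggregates makes it possible to spot a single shard that chose a different method or took much longer than the rest.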
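The "next N requests that match this pattern" idea could be sketched as a small sampler that sits outside the request/response path. The `TelemetrySampler` class and its methods are hypothetical, and the sketch assumes the request can be matched as a string with a regular expression. Captured records are kept in a list here; a real version would presumably write to a log instead.

```python
import re
import threading


class TelemetrySampler:
    """Hypothetical sampler: capture telemetry for the next N matching requests."""

    def __init__(self, pattern, max_requests):
        self._pattern = re.compile(pattern)
        self._remaining = max_requests
        self._lock = threading.Lock()
        self.records = []

    def observe(self, request_text, telemetry):
        """Record telemetry if the request matches and budget remains.

        Returns True when the record was captured.
        """
        if not self._pattern.search(request_text):
            return False
        with self._lock:
            # The lock keeps the remaining-count check and decrement atomic.
            if self._remaining <= 0:
                return False
            self._remaining -= 1
            self.records.append((request_text, telemetry))
            return True


sampler = TelemetrySampler(r"facet", max_requests=2)
results = [
    sampler.observe("q=*:*&json.facet={x:'unique(id)'}", {"elapsed_ms": 3}),
    sampler.observe("q=title:solr", {"elapsed_ms": 1}),
    sampler.observe("q=*:*&json.facet={y:'sum(price)'}", {"elapsed_ms": 7}),
    sampler.observe("q=*:*&json.facet={z:'avg(price)'}", {"elapsed_ms": 2}),
]
```

Because the sampler only reads the request and stores telemetry on the side, neither the request nor the response needs to change.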
Attachments
1. Create Facet Telemetry for Nested Facet Query | Closed | Unassigned
2. Add doc set size and number of buckets metrics | Closed | Unassigned
3. Merge facet telemetry information from shards | Open | Unassigned
4. Design Facet Telemetry for non-JSON field facet | Closed | Unassigned