  1. Jackrabbit Oak
  2. OAK-8184

In statistical mode, facet counts seem to have a higher error rate than expected


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.16
    • Fix Version/s: None
    • Component/s: query, search
    • Labels:
      None

      Description

      We have noticed facet counts drifting in various places, most visibly for small counts. They are usually off by 1, but we have also seen larger discrepancies of 20 or 30. Here is one example. Consider this query, run by a non-admin user:

      1_group.propertyvalues.extractFacet=true
      1_group.propertyvalues.property=jcr:content/metadata/msft:associatedCampaign
      2_group.0_path=/content/dam/microsoft/rad/
      2_group.p.or=true
      orderby=jcr:content/jcr:lastModified
      orderby.sort=desc
      p.facetStrategy=oak
      p.facets=true
      p.guessTotal=250
      p.limit=-1
      p.offset=0
      property=jcr:content/metadata/msft:lifecycleStatus
      property.10_value=microsoft:studios/lifecycleStatus/Created
      property.1_value=Created
      property.2_value=Under Review
      property.3_value=Rejected
      property.4_value=Approved
      property.5_value=Published
      property.6_value=microsoft:search-marketing/lifecycleStatus/Approved
      property.7_value=microsoft:search-marketing/lifecycleStatus/Created
      property.8_value=microsoft:studios/lifecycleStatus/Approved
      property.9_value=microsoft:studios/lifecycleStatus/UnderReview
      type=dam:Asset
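
To make the query above easier to replay, here is a minimal sketch that turns the parameter list into a request URL. It assumes the parameters are sent to a QueryBuilder JSON endpoint at `/bin/querybuilder.json` on a local instance; the host, port, endpoint path, and the helper name `build_query_url` are assumptions for illustration and are not stated in this report.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port, and path are assumptions, not from the report.
BASE_URL = "http://localhost:4502/bin/querybuilder.json"

# Parameters copied from the first query in the description.
PARAMS = {
    "1_group.propertyvalues.extractFacet": "true",
    "1_group.propertyvalues.property": "jcr:content/metadata/msft:associatedCampaign",
    "2_group.0_path": "/content/dam/microsoft/rad/",
    "2_group.p.or": "true",
    "orderby": "jcr:content/jcr:lastModified",
    "orderby.sort": "desc",
    "p.facetStrategy": "oak",
    "p.facets": "true",
    "p.guessTotal": "250",
    "p.limit": "-1",
    "p.offset": "0",
    "property": "jcr:content/metadata/msft:lifecycleStatus",
    "property.10_value": "microsoft:studios/lifecycleStatus/Created",
    "property.1_value": "Created",
    "property.2_value": "Under Review",
    "property.3_value": "Rejected",
    "property.4_value": "Approved",
    "property.5_value": "Published",
    "property.6_value": "microsoft:search-marketing/lifecycleStatus/Approved",
    "property.7_value": "microsoft:search-marketing/lifecycleStatus/Created",
    "property.8_value": "microsoft:studios/lifecycleStatus/Approved",
    "property.9_value": "microsoft:studios/lifecycleStatus/UnderReview",
    "type": "dam:Asset",
}


def build_query_url(base_url, params):
    """Return base_url with params appended as a percent-encoded query string."""
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url(BASE_URL, PARAMS))
```

The request has to be made as the same non-admin user, since the reported counts depend on what that user is allowed to read.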
      

      This is what is returned (see the attached screenshots). Note that one of the facets, `/content/dam/microsoft/rad/public-campaign`, has a count of 1.

      If we add this facet value as one of the query conditions, like this:

      5_group.1_propertyvalues.0_values=/content/dam/microsoft/rad/public-campaign
      5_group.1_propertyvalues.extractFacet=true
      5_group.1_propertyvalues.property=jcr:content/metadata/msft:associatedCampaign
      2_group.0_path=/content/dam/microsoft/rad/
      2_group.p.or=true
      orderby=jcr:content/jcr:lastModified
      orderby.sort=desc
      p.facetStrategy=oak
      p.facets=true
      p.guessTotal=250
      p.limit=-1
      p.offset=0
      property=jcr:content/metadata/msft:lifecycleStatus
      property.10_value=microsoft:studios/lifecycleStatus/Created
      property.1_value=Created
      property.2_value=Under Review
      property.3_value=Rejected
      property.4_value=Approved
      property.5_value=Published
      property.6_value=microsoft:search-marketing/lifecycleStatus/Approved
      property.7_value=microsoft:search-marketing/lifecycleStatus/Created
      property.8_value=microsoft:studios/lifecycleStatus/Approved
      property.9_value=microsoft:studios/lifecycleStatus/UnderReview
      type=dam:Asset
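
Because the two parameter lists are long and nearly identical, a small diff helper can confirm exactly what changed between them. The sketch below is only an illustration: `parse_params` and `diff_params` are hypothetical helper names, and the sample input uses a shortened subset of the parameters from the two queries.

```python
def parse_params(text):
    """Parse 'key=value' lines into a dict, skipping blank or malformed lines."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key] = value
    return params


def diff_params(before, after):
    """Return (added, removed, changed) between two parameter dicts."""
    added = {k: v for k, v in after.items() if k not in before}
    removed = {k: v for k, v in before.items() if k not in after}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return added, removed, changed


if __name__ == "__main__":
    # Shortened subset of the two queries, for illustration only.
    first = parse_params("""
        1_group.propertyvalues.extractFacet=true
        1_group.propertyvalues.property=jcr:content/metadata/msft:associatedCampaign
        p.facetStrategy=oak
        type=dam:Asset
    """)
    second = parse_params("""
        5_group.1_propertyvalues.0_values=/content/dam/microsoft/rad/public-campaign
        5_group.1_propertyvalues.extractFacet=true
        5_group.1_propertyvalues.property=jcr:content/metadata/msft:associatedCampaign
        p.facetStrategy=oak
        type=dam:Asset
    """)
    added, removed, changed = diff_params(first, second)
    for name, group in (("added", added), ("removed", removed), ("changed", changed)):
        for key in sorted(group):
            print(name, key, group[key])
```

Run against the full lists, this should show that only the facet group predicate differs, while the path, ordering, paging, and lifecycle status parameters are the same in both queries.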
      

      We get this result. As you can see, the actual count is 2.

      Is this expected behavior? We also see counts that are off on large result sets. This makes for a poor user experience, and we expected the error rate to be much lower.
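
To illustrate how a sampling-based estimate can report 1 where the real count is 2, here is a toy model. It is not Oak's implementation. It assumes that statistical mode starts from raw facet counts that ignore access control, checks read access on a random sample of the matching documents, and scales every raw count by the fraction of sampled documents that were accessible. The function names, the data, and that scaling rule are all assumptions made for this sketch.

```python
import random
from collections import Counter


def exact_counts(docs):
    """Count facet labels over accessible documents only.

    docs is a list of (facet_label, is_accessible) pairs.
    """
    return Counter(label for label, accessible in docs if accessible)


def statistical_counts(docs, sample_size, rng):
    """Toy estimate: scale raw counts by the sampled accessible fraction.

    Assumed model, not Oak's actual code.
    """
    if not docs or sample_size < 1:
        return Counter()
    raw = Counter(label for label, _ in docs)  # counts that ignore access control
    k = min(sample_size, len(docs))
    sample = rng.sample(docs, k)
    ratio = sum(1 for _, accessible in sample if accessible) / k
    return Counter({label: round(count * ratio) for label, count in raw.items()})


if __name__ == "__main__":
    # Made-up data: both 'public-campaign' assets are readable,
    # but only 48 of the 98 other assets are.
    docs = (
        [("public-campaign", True)] * 2
        + [("other", True)] * 48
        + [("other", False)] * 50
    )
    rng = random.Random(0)
    print("exact:", exact_counts(docs)["public-campaign"])
    print("estimated:", statistical_counts(docs, 1000, rng)["public-campaign"])
```

In this made-up data set the sample covers all 100 matching documents, so the accessible fraction is exactly 0.5 and the estimate for `public-campaign` is 2 * 0.5 = 1, while the exact count is 2. Under this model a single global ratio is applied to every facet, so any facet whose own accessible share differs from the overall share is misreported, and small counts make the discrepancy easy to spot.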

       

       

        Attachments

        1. image-2019-03-29-11-00-16-305.png
          62 kB
          Kelvin Xu
        2. image-2019-03-29-11-00-11-094.png
          62 kB
          Kelvin Xu
        3. image-2019-03-29-10-59-17-163.png
          254 kB
          Kelvin Xu
        4. image-2019-03-29-10-59-03-699.png
          254 kB
          Kelvin Xu


            People

            • Assignee: Unassigned
            • Reporter: Kelvin Xu (kexu)
            • Votes: 0
            • Watchers: 2
