Jackrabbit Oak / OAK-8184

With statistical mode, facet counts seem to have a higher error rate than expected

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.16
    • Fix Version/s: None
    • Component/s: query, search
    • Labels: None

    Description

      We have observed that facet counts drift, which is most obvious for small counts. Usually the count is off by 1, but we have also seen larger discrepancies, such as 20 or 30. Here is one example. Consider this query, run by a non-admin user:

      1_group.propertyvalues.extractFacet=true
      1_group.propertyvalues.property=jcr:content/metadata/msft:associatedCampaign
      2_group.0_path=/content/dam/microsoft/rad/
      2_group.p.or=true
      orderby=jcr:content/jcr:lastModified
      orderby.sort=desc
      p.facetStrategy=oak
      p.facets=true
      p.guessTotal=250
      p.limit=-1
      p.offset=0
      property=jcr:content/metadata/msft:lifecycleStatus
      property.10_value=microsoft:studios/lifecycleStatus/Created
      property.1_value=Created
      property.2_value=Under Review
      property.3_value=Rejected
      property.4_value=Approved
      property.5_value=Published
      property.6_value=microsoft:search-marketing/lifecycleStatus/Approved
      property.7_value=microsoft:search-marketing/lifecycleStatus/Created
      property.8_value=microsoft:studios/lifecycleStatus/Approved
      property.9_value=microsoft:studios/lifecycleStatus/UnderReview
      type=dam:Asset

      This is what is returned (see attachments). Notice that one of the facets, `/content/dam/microsoft/rad/public-campaign`, has a count of 1.

      If we add this facet value as one of the query conditions, like this:

      5_group.1_propertyvalues.0_values=/content/dam/microsoft/rad/public-campaign
      5_group.1_propertyvalues.extractFacet=true
      5_group.1_propertyvalues.property=jcr:content/metadata/msft:associatedCampaign
      2_group.0_path=/content/dam/microsoft/rad/
      2_group.p.or=true
      orderby=jcr:content/jcr:lastModified
      orderby.sort=desc
      p.facetStrategy=oak
      p.facets=true
      p.guessTotal=250
      p.limit=-1
      p.offset=0
      property=jcr:content/metadata/msft:lifecycleStatus
      property.10_value=microsoft:studios/lifecycleStatus/Created
      property.1_value=Created
      property.2_value=Under Review
      property.3_value=Rejected
      property.4_value=Approved
      property.5_value=Published
      property.6_value=microsoft:search-marketing/lifecycleStatus/Approved
      property.7_value=microsoft:search-marketing/lifecycleStatus/Created
      property.8_value=microsoft:studios/lifecycleStatus/Approved
      property.9_value=microsoft:studios/lifecycleStatus/UnderReview
      type=dam:Asset


      We get this result (see attachments). As you can see, the actual count is 2.
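
To make the difference between the two requests explicit, the predicate lists can be compared mechanically. The following is a minimal Python sketch. `parse_predicates` and `diff_predicates` are hypothetical helper names, and only a few of the predicates from this report are repeated here for brevity.

```python
def parse_predicates(text):
    """Parse 'key=value' lines into a dict, skipping blank lines."""
    predicates = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        predicates[key] = value
    return predicates


def diff_predicates(first, second):
    """Return (only_in_first, only_in_second, changed) for two predicate dicts."""
    only_first = {k: v for k, v in first.items() if k not in second}
    only_second = {k: v for k, v in second.items() if k not in first}
    changed = {k: (first[k], second[k])
               for k in first.keys() & second.keys()
               if first[k] != second[k]}
    return only_first, only_second, changed


# Abbreviated copies of the two predicate lists from the report.
QUERY_1 = """
1_group.propertyvalues.extractFacet=true
1_group.propertyvalues.property=jcr:content/metadata/msft:associatedCampaign
2_group.0_path=/content/dam/microsoft/rad/
p.facetStrategy=oak
type=dam:Asset
"""
QUERY_2 = """
5_group.1_propertyvalues.0_values=/content/dam/microsoft/rad/public-campaign
5_group.1_propertyvalues.extractFacet=true
5_group.1_propertyvalues.property=jcr:content/metadata/msft:associatedCampaign
2_group.0_path=/content/dam/microsoft/rad/
p.facetStrategy=oak
type=dam:Asset
"""

only_first, only_second, changed = diff_predicates(
    parse_predicates(QUERY_1), parse_predicates(QUERY_2))
for key in sorted(only_second):
    print("added:  ", key, "=", only_second[key])
for key in sorted(only_first):
    print("removed:", key, "=", only_first[key])
```

Apart from the renumbered group prefix, the only new predicate is the `0_values` restriction on the facet value, so the second request should match a subset of the first.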

      Is this expected behavior? We are seeing counts that are off even on large result sets. This makes the user experience pretty bad; we thought the error rate would be much lower than that.
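
As a rough illustration of why small counts can be visibly off, here is a toy model of proportional scaling. It assumes (this is not confirmed anywhere in this report) that statistical mode checks access for a sample of the matching documents and scales each raw index facet count by the accessible fraction of that sample. The function name, the sample numbers, and the floor rounding are all hypothetical and are not taken from Oak's code.

```python
def scale_facet_counts(raw_counts, sampled, accessible):
    """Scale raw index counts by the accessible fraction of a sample.

    raw_counts: facet label -> count as stored in the index (no ACL check)
    sampled:    number of documents whose access was checked (assumed)
    accessible: how many of those sampled documents the user may read
    """
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    # Integer floor division; the real rounding rule is an assumption.
    return {label: count * accessible // sampled
            for label, count in raw_counts.items()}


# Invented numbers: 700 of 1000 sampled documents are readable.
raw_counts = {"public-campaign": 2, "other-campaign": 100}
estimated = scale_facet_counts(raw_counts, sampled=1000, accessible=700)
print(estimated)  # {'public-campaign': 1, 'other-campaign': 70}
```

In this toy model, a facet whose two documents are both readable by the user would still be reported as 1, because the same global ratio is applied to every facet. The relative error is largest for small counts, which would be consistent with the drift described above, but whether this is what actually happens here remains an open question.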

       

       

      Attachments

        1. image-2019-03-29-11-00-16-305.png (62 kB, Kelvin Xu)
        2. image-2019-03-29-11-00-11-094.png (62 kB, Kelvin Xu)
        3. image-2019-03-29-10-59-17-163.png (254 kB, Kelvin Xu)
        4. image-2019-03-29-10-59-03-699.png (254 kB, Kelvin Xu)

          People

            Assignee: Unassigned
            Reporter: Kelvin Xu (kexu)
            Votes: 0
            Watchers: 2
