  1. Solr
  2. SOLR-8540

Multi select facets give incorrect results

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 5.3, 5.3.1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      We have a single core and use faceting to search documents. When we started using multi select faceting, we noticed that the facet counts do not match the actual data in the core.

      For example, we make this simple query:

      q=*:*
      rows=0
      facet=true
      facet.limit=5
      facet.field=file_type
      

      Corresponding URL is http://localhost:8983/.../select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.limit=5&facet.field=file_type

      We get the following results:

      {
        "responseHeader": {
          "status": 0,
          "QTime": 42
        },
        "response": {
          "numFound": 1240067,
          "start": 0,
          "docs": []
        },
        "facet_counts": {
          "facet_queries": {},
          "facet_fields": {
            "file_type": [
              "5",
              1073053,
              "3",
              51078,
              "7",
              41956,
              "10",
              16121,
              "12",
              12585
            ]
          },
          "facet_dates": {},
          "facet_ranges": {},
          "facet_intervals": {},
          "facet_heatmaps": {}
        }
      }
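Solr returns each field's facet counts as a flat list of alternating values and counts, which is awkward to compare by eye. A minimal sketch of turning that list into a mapping, using the values from the response above (the helper name `facet_pairs_to_dict` is hypothetical, not part of Solr or any client library):

```python
def facet_pairs_to_dict(flat):
    """Convert a flat [value, count, value, count, ...] list into a dict."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of entries")
    # Even positions hold the facet values, odd positions hold the counts.
    return dict(zip(flat[0::2], flat[1::2]))


# Values copied from the "file_type" list in the response above.
baseline = facet_pairs_to_dict(
    ["5", 1073053, "3", 51078, "7", 41956, "10", 16121, "12", 12585]
)
print(baseline["3"])
```

Because facet.limit=5, the mapping holds only the five largest buckets, so its counts are not expected to add up to numFound.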
      

      When we add a filter by file_type:

      q=*:*
      fq=file_type:3
      rows=0
      facet=true
      facet.limit=5
      facet.field=file_type
      

      Corresponding URL is http://localhost:8983/.../select?q=*%3A*&fq=file_type%3A3&rows=0&wt=json&indent=true&facet=true&facet.limit=5&facet.field=file_type

      Then we get a nonzero count only for the filtered value:

      {
        "responseHeader": {
          "status": 0,
          "QTime": 5
        },
        "response": {
          "numFound": 51078,
          "start": 0,
          "docs": []
        },
        "facet_counts": {
          "facet_queries": {},
          "facet_fields": {
            "file_type": [
              "3",
              51078,
              "1",
              0,
              "4",
              0,
              "5",
              0,
              "7",
              0
            ]
          },
          "facet_dates": {},
          "facet_ranges": {},
          "facet_intervals": {},
          "facet_heatmaps": {}
        }
      }
      

      But we want to have multi select faceting by file_type, so we exclude the filter from faceting:

      q=*:*
      fq={!tag=ft}file_type:3
      rows=0
      facet=true
      facet.limit=5
      facet.field={!ex=ft}file_type
      

      Corresponding URL is http://localhost:8983/.../select?q=*%3A*&fq=%7B!tag%3Dft%7Dfile_type%3A3&rows=0&wt=json&indent=true&facet=true&facet.limit=5&facet.field={!ex=ft}file_type
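The same tagged and excluded request can be built programmatically so that every parameter, including the local-params braces, is percent-encoded consistently. This is only a sketch using Python's standard library. The core path is elided in the report, so the "..." is kept as a placeholder and the URL will not resolve as written. Whether encoding has any bearing on the reported counts is not established here.

```python
from urllib.parse import urlencode, parse_qs

# Placeholder: the core path is elided in the report and is left as-is.
BASE_URL = "http://localhost:8983/.../select"

params = [
    ("q", "*:*"),
    ("fq", "{!tag=ft}file_type:3"),        # tag the filter as "ft"
    ("rows", "0"),
    ("wt", "json"),
    ("indent", "true"),
    ("facet", "true"),
    ("facet.limit", "5"),
    ("facet.field", "{!ex=ft}file_type"),  # exclude the "ft" filter when faceting
]

query = urlencode(params)  # percent-encodes braces, "!", "=", ":" and "*"
url = BASE_URL + "?" + query
print(url)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(query)
```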

      But the results contain incorrect counts for all returned file_type values. Every count is greater than it was before we added the filter:

      {
        "responseHeader": {
          "status": 0,
          "QTime": 38
        },
        "response": {
          "numFound": 51078,
          "start": 0,
          "docs": []
        },
        "facet_counts": {
          "facet_queries": {},
          "facet_fields": {
            "file_type": [
              "5",
              1073146,
              "3",
              66705,
              "7",
              42202,
              "10",
              16903,
              "12",
              12710
            ]
          },
          "facet_dates": {},
          "facet_ranges": {},
          "facet_intervals": {},
          "facet_heatmaps": {}
        }
      }

      We expect the multi select facet counts to be exactly the same as without filters. Before we added the filter by file_type, there were only 51078 documents with file_type=3. But when we add the filter and exclude it from faceting, Solr reports 66705 documents with file_type=3, while numFound is still 51078.
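To make the discrepancy concrete, the two sets of counts can be compared bucket by bucket. A short sketch using only the numbers reported above (the variable names are illustrative):

```python
# Counts from the unfiltered query (first response).
baseline = {"5": 1073053, "3": 51078, "7": 41956, "10": 16121, "12": 12585}

# Counts from the query with the tagged filter excluded from faceting (third response).
excluded = {"5": 1073146, "3": 66705, "7": 42202, "10": 16903, "12": 12710}

# numFound reported for the filtered query (fq on file_type:3).
num_found_filtered = 51078

# Per-bucket difference: how much each count grew once the exclusion was used.
diffs = {value: excluded[value] - baseline[value] for value in baseline}

for value, delta in diffs.items():
    print(f"file_type={value}: {baseline[value]} -> {excluded[value]} (+{delta})")

# The excluded-filter count for file_type=3 exceeds the number of documents
# that actually match file_type:3.
overcount = excluded["3"] - num_found_filtered
print(f"file_type=3 is overcounted by {overcount}")
```

Every bucket grows, and the file_type=3 bucket exceeds the number of documents matching the filter by 15627.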

              People

              • Assignee:
                Unassigned
                Reporter:
                vasiliy.bout Vasiliy Bout
              • Votes:
                0
                Watchers:
                3
