Solr / SOLR-8584

Phrase Fields ranking bug for range search when using edismax queryparser



    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 4.7, 4.8.1, 4.9.1, 4.10.4, 5.0, 5.1, 5.2, 5.3.1, 5.4
    • Fix Version/s: None
    • Component/s: query parsers


      When phrase fields (pf) are defined for the edismax parser and the query contains a range clause, edismax expands the query to the phrase fields but adds a wrong term to the phrase: the upper limit of the range query.

      In the Solr tutorial example, add the following to the /browse edismax handler in solrconfig.xml:

      <str name="pf">name</str> 

      Run this query against the /browse request handler with query explain enabled:

      apple AND weight:[* TO 60]
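
      The same request can also be built as a URL for scripting. This is a minimal sketch, assuming a local Solr at localhost:8983 and a core named collection1 (both are assumptions; adjust them to your installation). debugQuery=true asks Solr to return the parsed query and the explain output; the helper name build_debug_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed location of the tutorial's /browse handler; host and core name
# are placeholders and must be adapted to the local installation.
DEFAULT_BASE = "http://localhost:8983/solr/collection1/browse"


def build_debug_url(query, base=DEFAULT_BASE):
    """Return a request URL that runs `query` with debug output enabled."""
    params = {
        "q": query,
        "debugQuery": "true",  # include parsed query and explain output
        "wt": "json",          # ask for JSON instead of the HTML page
    }
    return base + "?" + urlencode(params)


url = build_debug_url("apple AND weight:[* TO 60]")
print(url)
```

      The parsed query in the debug section of the response is where the unexpected phrase term shows up.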

      This is from the query explain:

      ..... DisjunctionMaxQuery((name:"apple 60").....

      As you can see, the term 60 (from the range query) has been added to the search phrase, which boosts the score of any document that matches "apple 60".
      To demonstrate this, add the following document to the index
      (ipod_video.xml with minor changes):

        <field name="id">MA147LL/B</field>
        <field name="name">Apple 120 GB iPod with Video Playback Black</field>
        <field name="manu">Apple Computer Inc.</field>
        <!-- Join -->
        <field name="manu_id_s">apple</field>
        <field name="cat">electronics</field>
        <field name="cat">music</field>
        <field name="features">iTunes, Podcasts, Audiobooks</field>
        <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of video</field>
        <field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field>
        <field name="features">Up to 20 hours of battery life</field>
        <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field>
        <field name="features">Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication</field>
        <field name="includes">earbud headphones, USB cable</field>
        <field name="weight">6.5</field>
        <field name="price">599.00</field>
        <field name="popularity">10</field>
        <field name="inStock">true</field>
        <!-- Dodge City store -->
        <field name="store">37.7752,-100.0232</field>
        <field name="manufacturedate_dt">2005-10-12T08:00:00Z</field>

      When you repeat the query:

      apple AND weight:[* TO 60]

      It finds two documents, as expected, but their scores should be
      identical. Instead they are 0.65656495 and 0.3007804.

      The reason is that the phrase "apple 60" matches one of the documents (the one that comes with the tutorial).

      The phrase field expansion can go much further wrong than this: it can pick up the start limit, the end limit and the "TO" keyword from the range query part.

      Solution: Do not use anything from the range query part for the phrase fields.
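
      The idea can be sketched outside Solr as follows. This is an illustration only, not the edismax implementation: the function name phrase_terms and the regular expression are hypothetical, and the pattern only handles simple unquoted range limits.

```python
import re

# Optional "field:" prefix followed by an inclusive [a TO b] or
# exclusive {a TO b} range. Quoted limits containing spaces are not handled.
RANGE_CLAUSE = re.compile(r"(?:\w+:)?[\[{]\s*\S+\s+TO\s+\S+?\s*[\]}]")
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def phrase_terms(query):
    """Return the terms that may go into the phrase-field query.

    Range clauses are removed as a whole, so neither limit nor the
    "TO" keyword can leak into the phrase.
    """
    without_ranges = RANGE_CLAUSE.sub(" ", query)
    return [t for t in without_ranges.split() if t not in BOOLEAN_OPERATORS]


print(phrase_terms("apple AND weight:[* TO 60]"))
```

      For the query in this report only the term apple remains, so no "apple 60" phrase can be formed.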

      /Thomas Egense and Toke Eskildsen




            Assignee: Unassigned
            Reporter: Thomas Egense (thomas_egense)