Stanbol (Retired) / STANBOL-654

The SolrYard does not correctly enclose multi-word query terms in quotes

Details

    Description

      STANBOL-607 changed the encoding of natural language constraints consisting of multiple words: they are now encoded as a quoted phrase, "Frankfurt am Main", instead of (Frankfurt AND am AND Main).

      However, the implementation does not correctly put quotes around multi-word tokens.

      Because of that, a query for the rdfs:label "Frankfurt am Main" is encoded as

      (_!@/rdfs\:label/:Frankfurt am Main)

      instead of

      (_!@/rdfs\:label/:"Frankfurt am Main")

      This causes Solr to search for

      • "Frankfurt" in the values of rdfs:label OR
      • "am" in the full text field OR
      • "Main" in the full text field

      instead of "Frankfurt am Main" in the values of rdfs:label.
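
      The intended encoding can be sketched as follows. This is only an illustration, not the SolrYard code: the class PhraseQuoting and its methods are hypothetical names, and the escaping of embedded quotes and backslashes is an assumption about what a quoted Solr phrase needs. The field name is passed in already encoded, as in the example above.

```java
public final class PhraseQuoting {

    private PhraseQuoting() {}

    /**
     * Hypothetical helper: returns the value unchanged when it is a single
     * token, and wraps it in double quotes when it contains whitespace.
     * Embedded backslashes and quotes are escaped (assumed requirement).
     */
    public static String quoteIfMultiWord(String value) {
        String trimmed = value.trim();
        boolean multiWord = trimmed.chars().anyMatch(Character::isWhitespace);
        if (!multiWord) {
            return trimmed;
        }
        String escaped = trimmed.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    /**
     * Hypothetical helper: builds "(field:value)" for an already encoded
     * field name, quoting the value when it consists of several words.
     */
    public static String encodeConstraint(String encodedField, String value) {
        return "(" + encodedField + ":" + quoteIfMultiWord(value) + ")";
    }

    public static void main(String[] args) {
        String field = "_!@/rdfs\\:label/"; // encoded field name taken from this issue
        // prints (_!@/rdfs\:label/:"Frankfurt am Main")
        System.out.println(encodeConstraint(field, "Frankfurt am Main"));
        // prints (_!@/rdfs\:label/:Frankfurt)
        System.out.println(encodeConstraint(field, "Frankfurt"));
    }
}
```

      With the quotes in place, all three words stay bound to the rdfs:label field instead of "am" and "Main" falling through to the full text field.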

      Sadly, all unit tests pass, because with the DBpedia test data used, Solr ranking "ensures" that the wrongly encoded query returns the same results as a correctly encoded one.

      However, on bigger data sets with more data in the full text field, this has a big impact on query results.

      NOTE: the 0.9.0-incubating release is NOT affected, because the bug was only introduced in trunk while working on 0.10.0.

            People

              rwesten Rupert Westenthaler