Solr
  1. Solr
  2. SOLR-1556

TermVectorComponents should provide good error messages when fieldtype isn't compatible with requested options

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: search
    • Labels:
      None

      Description

      As noted by grant on the email list, asking TermVectorComponent for things like termVectors, positions, and offsets can't produce meaningful results unless the field in question has the corrisponding schema option set to true – but the behavior of TVC when they not true is confusing to users.

      We should make TVC return a meaningful error if it's asked to return a certain type of info for field that it can't deal with - something making it clear what properties of hte schema need to be changed to make it work...

      http://old.nabble.com/Re%3A-TermVectorComponent-%3A-Required---Optional-Parameters-p26181454.html

      1. SOLR-1556.patch
        26 kB
        Grant Ingersoll
      2. SOLR-1556.patch
        26 kB
        Grant Ingersoll

        Activity

        Hide
        Hoss Man added a comment -

        updating hte issue summary/description after re-reading the email thread.

        Show
        Hoss Man added a comment - updating hte issue summary/description after re-reading the email thread.
        Hide
        Hoss Man added a comment -

        skimming the code a bit, this doesn't seem easy to be a very a clean way to do this.

        TVC allows multiple fields to be specified at once, but the positions & offsets options can not be specified per field – so if the user wants termVectors for fieldA and termVectors plus position info for fieldB, there is no way for them to distinguish that they don't care about position info for fieldA – if fieldA doens't have positions enabled in the schema, making TVC error based on that request would do the user a dis-service. Even if there is no clean way to ask for it, the scema supports what they want – so we shouldn't error.

        I considerd breifly what it would take to make TVC support per-field overrides for the various options – the trickiness comes from hte fact that it doesn't iterate over the list of fields itself. It builds up a custom TermVectorMapper to pass to IndexReader.getTermFreqVector for each doc. So we'd need to modify that TermVectorMapper impl to know about the SolrQueryRequest so it could check the options per field instead of just specifying the booleans when it's constructed.

        Things are also complicated by the fact that the tv.fl param defaults to the fl param – so even if users enable termVectors, positions, and offsets for all the fields they care about, and they could starting getting errors if a new field is added to the "fl" when they don't care about any of that info.

        All told: this may be one of the situations where an actual "error" isn't in hte best interests of the user – adding a warning to the TVC output for each field/option combo that doesn't make sense is probably more useful.

        Show
        Hoss Man added a comment - skimming the code a bit, this doesn't seem easy to be a very a clean way to do this. TVC allows multiple fields to be specified at once, but the positions & offsets options can not be specified per field – so if the user wants termVectors for fieldA and termVectors plus position info for fieldB, there is no way for them to distinguish that they don't care about position info for fieldA – if fieldA doens't have positions enabled in the schema, making TVC error based on that request would do the user a dis-service. Even if there is no clean way to ask for it, the scema supports what they want – so we shouldn't error. I considerd breifly what it would take to make TVC support per-field overrides for the various options – the trickiness comes from hte fact that it doesn't iterate over the list of fields itself. It builds up a custom TermVectorMapper to pass to IndexReader.getTermFreqVector for each doc. So we'd need to modify that TermVectorMapper impl to know about the SolrQueryRequest so it could check the options per field instead of just specifying the booleans when it's constructed. Things are also complicated by the fact that the tv.fl param defaults to the fl param – so even if users enable termVectors, positions, and offsets for all the fields they care about, and they could starting getting errors if a new field is added to the "fl" when they don't care about any of that info. All told: this may be one of the situations where an actual "error" isn't in hte best interests of the user – adding a warning to the TVC output for each field/option combo that doesn't make sense is probably more useful.
        Hide
        Hoss Man added a comment -

        Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

        http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

        Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

        A unique token for finding these 240 issues in the future: hossversioncleanup20100527

        Show
        Hoss Man added a comment - Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed. A unique token for finding these 240 issues in the future: hossversioncleanup20100527
        Hide
        Grant Ingersoll added a comment -

        Here's a patch, with tests, that fixes all the issues raised. See http://wiki.apache.org/solr/TermVectorComponent for more details on the new per field options and the error messages.

        Show
        Grant Ingersoll added a comment - Here's a patch, with tests, that fixes all the issues raised. See http://wiki.apache.org/solr/TermVectorComponent for more details on the new per field options and the error messages.
        Hide
        Grant Ingersoll added a comment -

        Ready to go. Going to commit this weekend.

        Show
        Grant Ingersoll added a comment - Ready to go. Going to commit this weekend.
        Hide
        Grant Ingersoll added a comment -

        Committed revision 960204 to trunk.

        Committed revision 960206.

        Show
        Grant Ingersoll added a comment - Committed revision 960204 to trunk. Committed revision 960206.
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1.0 release

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1.0 release

          People

          • Assignee:
            Grant Ingersoll
            Reporter:
            Hoss Man
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development