Solr
  1. Solr
  2. SOLR-424

The ruby output type produces incorrect output for numeric types without a value

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.0, 1.2, 1.3
    • Fix Version/s: 1.3
    • Component/s: clients - ruby - flare
    • Labels:
      None

      Description

      When parsing the Ruby output returned from Solr, if a numerical value has no value in the index, it causes an invalid Ruby hash to be returned. For instance:

       
       'response'=>{'numFound'=>1,'start'=>0,'maxScore'=>4.951244,'docs'=>[
      	{
      	 'subclass_t'=>'Protocol',
      	 'pk_i'=>1,
      	 'id'=>'Protocol:1',
      	 'name_t'=>'Falcipain IC50',
      	 'group_id_i'=>,
      	 'score'=>4.951244}]
       }}
      

      is not a valid hash because 'group_id_i' does not resolve to anything. It should resolve to nil:

       
       'response'=>{'numFound'=>1,'start'=>0,'maxScore'=>4.951244,'docs'=>[
      	{
      	 'subclass_t'=>'Protocol',
      	 'pk_i'=>1,
      	 'id'=>'Protocol:1',
      	 'name_t'=>'Falcipain IC50',
      	 'group_id_i'=>nil,
      	 'score'=>4.951244}]
       }}
      
      1. zero_length_int.patch
        0.8 kB
        Yonik Seeley
      2. TextResponseWriter-SOLR-424.patch
        0.7 kB
        Yousef Ourabi
      3. TextResponseWriter-424.java.patch
        0.7 kB
        Yousef Ourabi
      4. SOLR-424.patch
        2 kB
        Yonik Seeley
      5. fix_ruby_output.patch
        1 kB
        Kurt Schrader

        Activity

        Hide
        Kurt Schrader added a comment -

        Here is a patch for the issue. The code seems to be untested, along with all of the other output types, so I couldn't find a test case to modify to cover this issue.

        Show
        Kurt Schrader added a comment - Here is a patch for the issue. The code seems to be untested, along with all of the other output types, so I couldn't find a test case to modify to cover this issue.
        Hide
        Yonik Seeley added a comment -

        I believe the underlying issue is that there isn't validation during indexing.
        An empty string is not a valid number... if you wish to leave the number out of that document, then leave it entirely out.

        For a legacy lucene index, I think a zero length integer field should map to 0, not null (which is not an integer).

        Show
        Yonik Seeley added a comment - I believe the underlying issue is that there isn't validation during indexing. An empty string is not a valid number... if you wish to leave the number out of that document, then leave it entirely out. For a legacy lucene index, I think a zero length integer field should map to 0, not null (which is not an integer).
        Hide
        Yonik Seeley added a comment -

        Attaching patch for the output side of things.

        Show
        Yonik Seeley added a comment - Attaching patch for the output side of things.
        Hide
        Kurt Schrader added a comment -

        The problem with making it zero is that it can cause problems if zero is a valid value. You don't want to run a search for records with a value of zero there and have all of the records with no value turn up as well.

        Show
        Kurt Schrader added a comment - The problem with making it zero is that it can cause problems if zero is a valid value. You don't want to run a search for records with a value of zero there and have all of the records with no value turn up as well.
        Hide
        Yonik Seeley added a comment -

        If the field has no value, it should not be added to the index.
        Think about a field of type "string"... a zero length string is a valid value and should not be confused for an absent value.

        Show
        Yonik Seeley added a comment - If the field has no value, it should not be added to the index. Think about a field of type "string"... a zero length string is a valid value and should not be confused for an absent value.
        Hide
        Kurt Schrader added a comment -

        I understand that, but the issue here is how to deal with indexes that have a field with no value in legacy indexes.
        To me it seems more logical to return nil to indicate to the user that their field is invalid.

        Show
        Kurt Schrader added a comment - I understand that, but the issue here is how to deal with indexes that have a field with no value in legacy indexes. To me it seems more logical to return nil to indicate to the user that their field is invalid.
        Hide
        Erik Hatcher added a comment -

        I agree with Kurt - nil makes more sense than zero in this case.

        Show
        Erik Hatcher added a comment - I agree with Kurt - nil makes more sense than zero in this case.
        Hide
        Yonik Seeley added a comment -

        But we don't allow nil as an integer value when indexing (except by accident), so how can it be a valid value coming back out? The way to represent "no value" in lucene is to leave the value out.

        Show
        Yonik Seeley added a comment - But we don't allow nil as an integer value when indexing (except by accident), so how can it be a valid value coming back out? The way to represent "no value" in lucene is to leave the value out.
        Hide
        Otis Gospodnetic added a comment -

        For what it's worth, I've had 2 additional people say that returning the correct Ruby output would be better...

        Show
        Otis Gospodnetic added a comment - For what it's worth, I've had 2 additional people say that returning the correct Ruby output would be better...
        Hide
        Yousef Ourabi added a comment - - edited

        Since all response writers implement the TextResponseWriter abstract class it seems like it might make more sense to tweak writeVal (line 115)

        from: if (val==null)
        to: if (val==null || val.toString().length() == 0 || "".equals(val.toString())) {

        As Yonik has pointed out – this is just part of the problem, the other being how did it get to be empty in the first place. Perhaps breaking this up into two separate issues makes more sense?

        Attaching (trivial) patch to TextResponseWriter.

        Feedback please.

        Thanks,
        Yousef

        Show
        Yousef Ourabi added a comment - - edited Since all response writers implement the TextResponseWriter abstract class it seems like it might make more sense to tweak writeVal (line 115) from: if (val==null) to: if (val==null || val.toString().length() == 0 || "".equals(val.toString())) { As Yonik has pointed out – this is just part of the problem, the other being how did it get to be empty in the first place. Perhaps breaking this up into two separate issues makes more sense? Attaching (trivial) patch to TextResponseWriter. Feedback please. Thanks, Yousef
        Hide
        Yousef Ourabi added a comment -

        Tweak line 115 of TextResponseWriter.writeVal() to test not only for null but zero-length strings?

        Show
        Yousef Ourabi added a comment - Tweak line 115 of TextResponseWriter.writeVal() to test not only for null but zero-length strings?
        Hide
        Mike Klaas added a comment -

        Eric or Yonik, do you want to mentor this for 1.3?

        Show
        Mike Klaas added a comment - Eric or Yonik, do you want to mentor this for 1.3?
        Hide
        Yousef Ourabi added a comment -

        Slightly more correct version. Can someone please comment on the appropriateness of this patch, and what the wishes and desires are to close this bug.

        Show
        Yousef Ourabi added a comment - Slightly more correct version. Can someone please comment on the appropriateness of this patch, and what the wishes and desires are to close this bug.
        Hide
        Yonik Seeley added a comment -

        The latest patch would have a non-negligible performance impact (since it calls toString() and trim() on every object)
        and incorrect output (since it would effectively disable zero length strings for any type, including strings).

        Show
        Yonik Seeley added a comment - The latest patch would have a non-negligible performance impact (since it calls toString() and trim() on every object) and incorrect output (since it would effectively disable zero length strings for any type, including strings).
        Hide
        Yousef Ourabi added a comment -

        The other place I was thinking would be the WriteInt implementation in JsonResponseWriter – do the check there. Would that make more sense to you?

        Show
        Yousef Ourabi added a comment - The other place I was thinking would be the WriteInt implementation in JsonResponseWriter – do the check there. Would that make more sense to you?
        Hide
        Yonik Seeley added a comment -

        attaching patch that uses a "null" value for legacy int/long lucene fields only when a zero length string is encountered as the field value.

        Show
        Yonik Seeley added a comment - attaching patch that uses a "null" value for legacy int/long lucene fields only when a zero length string is encountered as the field value.
        Hide
        Yonik Seeley added a comment -

        committed.

        Show
        Yonik Seeley added a comment - committed.

          People

          • Assignee:
            Erik Hatcher
            Reporter:
            Kurt Schrader
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development