  1. Solr
  2. SOLR-13829

RecursiveEvaluator casts Continuous numbers to Discrete Numbers, causing mismatch

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Resolved
    • None
    • 8.3
    • None
    • None

    Description

      When using the "sort" streaming expression on values computed from a float field (pfloat), I get casting errors. Whether the error occurs depends on which values are calculated from the underlying field values.

      Example:

      Docs: (paste each into "Documents" pane in Solr Admin UI as type:"json")

       

      {"id": "1", "name":"donut","vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]}
      
      {"id": "2", "name":"cheese pizza","vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]}

       

      Streaming Expression:

       

      sort(select(search(food_collection, q="*:*", fl="id,vector_fs", sort="id asc"), cosineSimilarity(vector_fs, array(5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as sim, id), by="sim desc")

       

      Response:

       

      { 
        "result-set": {
          "docs": [
            {
              "EXCEPTION": "class java.lang.Double cannot be cast to class java.lang.Long (java.lang.Double and java.lang.Long are in module java.base of loader 'bootstrap')",
              "EOF": true,
              "RESPONSE_TIME": 13
            }
          ]
        }
      }

       

       

      This is because in org.apache.solr.client.solrj.io.eval.RecursiveEvaluator, there is a line which examines a numeric (BigDecimal) value and - regardless of the type of the field the value originated from - converts it to a Long if it looks like a whole number. This is the code in question from that class:

      protected Object normalizeOutputType(Object value) {
          if(null == value){
            return null;
          } else if (value instanceof VectorFunction) {
            return value;
          } else if(value instanceof BigDecimal){
            BigDecimal bd = (BigDecimal)value;
            if(bd.signum() == 0 || bd.scale() <= 0 || bd.stripTrailingZeros().scale() <= 0){
              try{
                return bd.longValueExact();
              }
              catch(ArithmeticException e){
                // value was too big for a long, so use a double which can handle scientific notation
              }
            }
            
            return bd.doubleValue();
          }
      ... [other type conversions]
      

      Because of the return bd.longValueExact(); line, the calculated value for "sim" in doc 1 is Long(1), whereas the calculated value for "sim" in doc 2 is Double(0.88938313). These come back as incompatible data types, even though the source data is all of the same type and should be comparable.
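
      The type split can be reproduced outside Solr. The following is a minimal sketch that copies only the BigDecimal branch quoted above into a standalone class; the class name NormalizeDemo and the method name normalize are hypothetical and are not part of Solr.

      ```java
      import java.math.BigDecimal;

      // Hypothetical standalone class; mirrors only the BigDecimal branch
      // of normalizeOutputType quoted in this issue, not the full Solr method.
      class NormalizeDemo {

          static Object normalize(BigDecimal bd) {
              // Whole-number values (zero, non-positive scale, or only trailing zeros)
              // are converted to Long, regardless of the source field type.
              if (bd.signum() == 0 || bd.scale() <= 0 || bd.stripTrailingZeros().scale() <= 0) {
                  try {
                      return bd.longValueExact();
                  } catch (ArithmeticException e) {
                      // Too big for a long; fall through and use a double instead.
                  }
              }
              return bd.doubleValue();
          }

          public static void main(String[] args) {
              Object doc1 = normalize(new BigDecimal("1.0"));        // similarity of doc 1
              Object doc2 = normalize(new BigDecimal("0.88938313")); // similarity of doc 2
              System.out.println(doc1.getClass().getSimpleName());   // prints "Long"
              System.out.println(doc2.getClass().getSimpleName());   // prints "Double"
          }
      }
      ```

      Two values that both originate from float data end up boxed as different Java types, which is the precondition for the cast error shown in the response.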

      Thus, when the sort streaming expression (and probably others) runs on these calculated values, the list should contain [0.88938313, 1.0]. Instead, an exception is thrown because it is trying to compare incompatible data types [Double(0.88938313), Long(1)].
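
      The failure mode can be illustrated with plain Java. This is a sketch under assumptions: rawCompare stands in for a generic comparator that casts values to Comparable, which may not match how Solr's comparator is actually written, and numericCompare is one possible way to compare mixed Number types, not the change made in the attached patch. All names here are hypothetical.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical demo class; none of these methods exist in Solr.
      class MixedSortDemo {

          // Compares through raw Comparable, as a type-agnostic sorter might.
          @SuppressWarnings({"unchecked", "rawtypes"})
          static int rawCompare(Object a, Object b) {
              return ((Comparable) a).compareTo(b);
          }

          // Compares any two Number instances by their double value.
          static int numericCompare(Object a, Object b) {
              return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
          }

          // Returns true if sorting with rawCompare throws a ClassCastException.
          static boolean rawSortFails(List<Object> values) {
              List<Object> copy = new ArrayList<>(values);
              try {
                  copy.sort(MixedSortDemo::rawCompare);
                  return false;
              } catch (ClassCastException e) {
                  return true;
              }
          }

          // Sorts a copy in descending numeric order, tolerating mixed Long/Double.
          static List<Object> sortDescending(List<Object> values) {
              List<Object> copy = new ArrayList<>(values);
              copy.sort((a, b) -> numericCompare(b, a));
              return copy;
          }

          public static void main(String[] args) {
              List<Object> sims = new ArrayList<>();
              sims.add(0.88938313); // boxed as Double
              sims.add(1L);         // boxed as Long
              System.out.println(rawSortFails(sims)); // prints "true"
              System.out.println(sortDescending(sims).get(0).getClass().getSimpleName()); // prints "Long"
          }
      }
      ```

      Comparing by doubleValue() loses precision for very large long values, so it is shown here only to make the type mismatch concrete.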

      This bug is occurring on master currently, but has probably existed in the codebase since at least August 2017.

      Attachments

        1. SOLR-13829.patch
          24 kB
          Joel Bernstein

            People

              Unassigned Unassigned
              solrtrey Trey Grainger
                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m