Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-10341

SQL AVG function mis-interprets field type.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.5
    • Fix Version/s: None
    • Component/s: Parallel SQL
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      Using movielens data (users, movies, ratings), I tried the following SQL:

      curl --data-urlencode "stmt=SELECT solr.title as title, avg(rating) as avg_rating FROM ratings INNER JOIN (select movie_id,title from movies where _query_='plot_txt_en:love') as solr ON ratings.movie_id = solr.movie_id GROUP BY title ORDER BY avg_rating DESC LIMIT 10" "http://localhost:8983/solr/movies/sql?aggregationMode=facet"
      

      Solr returns this error:

      {"result-set":{"docs":[{"EXCEPTION":"Failed to execute sqlQuery 'SELECT solr.title as title, avg(rating) as avg_rating FROM ratings INNER JOIN (select movie_id,title from movies where _query_='plot_txt_en:love') as solr ON ratings.movie_id = solr.movie_id GROUP BY title ORDER BY avg_rating DESC LIMIT 10' against JDBC connection 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT solr.title as title, avg(rating) as avg_rating FROM ratings INNER JOIN (select movie_id,title from movies where _query_='plot_txt_en:love') as solr ON ratings.movie_id = solr.movie_id GROUP BY title ORDER BY avg_rating DESC LIMIT 10\": From line 1, column 29 to line 1, column 39: Cannot apply 'AVG' to arguments of type 'AVG(<JAVATYPE(CLASS JAVA.LANG.STRING)>)'. Supported form(s): 'AVG(<NUMERIC>)'","EOF":true,"RESPONSE_TIME":92}]}}
      

      rating is a TrieInt with docValues enabled.

      <field name="rating" type="tint" docValues="true" multiValued="false" indexed="true" stored="true"/>
      

      see screenshot

      1. Screen Shot 2017-03-22 at 8.12.33 AM.png
        121 kB
        Timothy Potter
      2. SOLR-10341.patch
        43 kB
        Joel Bernstein
      3. SOLR-10341.patch
        38 kB
        Joel Bernstein

        Activity

        Hide
        thelabdude Timothy Potter added a comment -

        rating is a numeric type

        Show
        thelabdude Timothy Potter added a comment - rating is a numeric type
        Hide
        joel.bernstein Joel Bernstein added a comment -

        Joins are not yet supported. Are you seeing the same problem with a query on a single table?

        Show
        joel.bernstein Joel Bernstein added a comment - Joins are not yet supported. Are you seeing the same problem with a query on a single table?
        Hide
        joel.bernstein Joel Bernstein added a comment - - edited

        I'm seeing a bug with avg on a single table, with casting between long and double. Looking into this now.

        Show
        joel.bernstein Joel Bernstein added a comment - - edited I'm seeing a bug with avg on a single table, with casting between long and double. Looking into this now.
        Hide
        joel.bernstein Joel Bernstein added a comment - - edited

        The bug I'm seeing occurs when you try to average an int or long field. The issue is that the Streaming Expressions are return doubles for the average function but Calcite is looking for a long value when averages are done on a long field. Averages are working on float and double fields.

        Show
        joel.bernstein Joel Bernstein added a comment - - edited The bug I'm seeing occurs when you try to average an int or long field. The issue is that the Streaming Expressions are return doubles for the average function but Calcite is looking for a long value when averages are done on a long field. Averages are working on float and double fields.
        Hide
        risdenk Kevin Risden added a comment -

        Can you work around it by casting before doing the average?

        Show
        risdenk Kevin Risden added a comment - Can you work around it by casting before doing the average?
        Hide
        joel.bernstein Joel Bernstein added a comment - - edited

        Yes, that's a work around used in the test cases, but we should fix this so it works without needing the cast.

        Show
        joel.bernstein Joel Bernstein added a comment - - edited Yes, that's a work around used in the test cases, but we should fix this so it works without needing the cast.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        This is easily fixed in MapReduce mode. But it's trickier in facet mode.

        Show
        joel.bernstein Joel Bernstein added a comment - This is easily fixed in MapReduce mode. But it's trickier in facet mode.
        Hide
        thelabdude Timothy Potter added a comment -
        Show
        thelabdude Timothy Potter added a comment - sounds like this issue -> https://issues.apache.org/jira/browse/SOLR-9372
        Hide
        joel.bernstein Joel Bernstein added a comment -

        This bug was introduced with the Calcite integration. SOLR-9372 needs to be addressed as well though.

        Show
        joel.bernstein Joel Bernstein added a comment - This bug was introduced with the Calcite integration. SOLR-9372 needs to be addressed as well though.
        Hide
        thelabdude Timothy Potter added a comment -

        my bad on the join example ... I thought they came for free with the calcite integration ... are joins planned for inclusion in 6.5?

        Show
        thelabdude Timothy Potter added a comment - my bad on the join example ... I thought they came for free with the calcite integration ... are joins planned for inclusion in 6.5?
        Hide
        risdenk Kevin Risden added a comment -

        Technically joins should work. They will just be table scans basically joined in memory. The pushdown isn't there yet (meaning streaming expression join). So could very easily blow through memory if Calcite does the join.

        Show
        risdenk Kevin Risden added a comment - Technically joins should work. They will just be table scans basically joined in memory. The pushdown isn't there yet (meaning streaming expression join). So could very easily blow through memory if Calcite does the join.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        Ok, I see a path forward with facet mode as well. I will work on getting this fixed for a bug fix release.

        Show
        joel.bernstein Joel Bernstein added a comment - Ok, I see a path forward with facet mode as well. I will work on getting this fixed for a bug fix release.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        Since we have no test cases for joins it's really hard to know how they well behave. We've pushed down so much functionality and we don't know what rules will get triggered during the joins, so its hard to know whats going to happen.

        Show
        joel.bernstein Joel Bernstein added a comment - Since we have no test cases for joins it's really hard to know how they well behave. We've pushed down so much functionality and we don't know what rules will get triggered during the joins, so its hard to know whats going to happen.
        Hide
        thelabdude Timothy Potter added a comment -

        I know the release process for 6.5 has already started but I think this should be included ... seems like any time I request an int field, I get an error about type casting.

        Show
        thelabdude Timothy Potter added a comment - I know the release process for 6.5 has already started but I think this should be included ... seems like any time I request an int field, I get an error about type casting.
        Hide
        joel.bernstein Joel Bernstein added a comment - - edited

        I've only seen the issue with the avg function. Have you seen it with any other?

        Show
        joel.bernstein Joel Bernstein added a comment - - edited I've only seen the issue with the avg function. Have you seen it with any other?
        Hide
        joel.bernstein Joel Bernstein added a comment - - edited

        For example this query is working fine:

        select year_i, month_i, count(*) as `count`, min(day_i) as min_day , max(day_i) as max_day from collection1 group by year_i,  month_i
        
        Show
        joel.bernstein Joel Bernstein added a comment - - edited For example this query is working fine: select year_i, month_i, count(*) as `count`, min(day_i) as min_day , max(day_i) as max_day from collection1 group by year_i, month_i
        Hide
        joel.bernstein Joel Bernstein added a comment -

        Also this works:

        select year_i, month_i, count(*) as `count`, sum(day_i) as sum_day , sum(month_i) as sum_month, max(price_f) as max_price from collection1 where year_i >30  group by year_i,  month_i order by sum(day_i) desc
        
        
        Show
        joel.bernstein Joel Bernstein added a comment - Also this works: select year_i, month_i, count(*) as `count`, sum(day_i) as sum_day , sum(month_i) as sum_month, max(price_f) as max_price from collection1 where year_i >30 group by year_i, month_i order by sum(day_i) desc
        Hide
        joel.bernstein Joel Bernstein added a comment -

        Here is the list of manual tests that I run in both facet and map_reduce mode, which all work. The test cases cover lots of different scenarios as well. The avg function just slipped through the cracks:

        select id from collection1 limit 10
        select id, test_s from collection1 limit 10
        select id, test_s, test_t from collection1 limit 10
        select id, test_s, test_t, year_i from collection1 limit 10
        select id, test_s, test_t, year_i, price_f from collection1 limit 10
        select id, test_s, test_t, year_i, price_f, prod_ss from collection1 limit 10
        select id, test_s, test_t, year_i, price_f, prod_ss from collection1 where year_i > 36 limit 10
        select id, test_s, test_t, year_i, price_f, prod_ss from collection1 where year_i > 36 and year_i < 38 limit 10
        select id, test_s, test_t, year_i, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 38) limit 10
        select id, test_s, test_t, year_i, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 37) limit 10
        select id, test_s, test_t, year_i, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 37) or id = '410005' limit 10
        select id, test_s, test_t, year_i, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 38) order by price_f desc limit 10
        select id, test_s, test_t, year_i, day_i, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 38) order by day_i desc, price_f desc limit 10
        select id, test_s, test_t as textField, year_i, day_i, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 38) order by day_i desc, price_f desc limit 10
        select id, test_s, test_t as textField, year_i, day_i as testday, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 38) order by testday desc, price_f desc limit 10
        select id, test_s, test_t as textField, year_i, day_i as `day`, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 38) order by `day` desc, price_f desc limit 10
        
        select count(*) from collection1 
        select count(*) as `count` from collection1 
        select count(*), min(day_i) from collection1
        select count(*), min(day_i), max(day_i) from collection1
        select count(*) as `count`, min(day_i) as min_day, max(day_i) as max_day from collection1
        select count(*) as `count`, min(day_i) as min_day, max(day_i) as max_day, sum(month_i) as sum_month from collection1 
        select distinct year_i from collection1 
        select distinct year_i from collection1 where year_i > 10 
        select distinct year_i from collection1 where year_i > 10 order by year_i desc
        select distinct year_i, month_i from collection1 where year_i > 30 order by year_i asc, month_i asc
        select distinct year_i, month_i from collection1 where year_i > 30 order by year_i asc, month_i desc
        select distinct year_i, month_i from collection1 where year_i > 30 order by year_i desc, month_i desc
        select distinct year_i, month_i, day_i from collection1 where year_i > 30 order by year_i desc, month_i desc, day_i desc
        select distinct year_i, month_i, day_i from collection1 where year_i > 30 order by year_i desc, month_i desc, day_i desc limit 2000
        select distinct year_i as `year`, month_i as `month`, day_i as `day` from collection1 where year_i > 30 order by year_i desc, month_i desc, day_i desc limit 2000
        select id, score from collection1 where test_t='(hello world)' order by score limit 500
        select year_i, count(*) from collection1 group by year_i
        select year_i, avg(month_i) from collection1 group by year_i
        select year_i, count(*) as `count` from collection1 group by year_i
        select year_i, count(*) as `count` from collection1 where year_i > 30 group by year_i
        select year_i, count(*) as `count` from collection1 where year_i > 30 group by year_i having count(*) = 200183
        select year_i, count(*) as `count` from collection1 group by year_i order by count(*) desc
        select year_i, count(*) from collection1 group by year_i order by year_i asc
        select year_i, count(*) as `count`, min(day_i) as min_day , max(day_i) as max_day from collection1 group by year_i
        select year_i, month_i, count(*) as `count`, min(day_i) as min_day , max(day_i) as max_day from collection1 group by year_i,  month_i
        select year_i, month_i, count(*) as `count`, sum(day_i) as sum_day , sum(month_i) as sum_month, max(price_f) as max_price from collection1 where year_i >30  group by year_i,  month_i order by sum(day_i) desc
        
        Show
        joel.bernstein Joel Bernstein added a comment - Here is the list of manual tests that I run in both facet and map_reduce mode, which all work. The test cases cover lots of different scenarios as well. The avg function just slipped through the cracks: select id from collection1 limit 10 select id, test_s from collection1 limit 10 select id, test_s, test_t from collection1 limit 10 select id, test_s, test_t, year_i from collection1 limit 10 select id, test_s, test_t, year_i, price_f from collection1 limit 10 select id, test_s, test_t, year_i, price_f, prod_ss from collection1 limit 10 select id, test_s, test_t, year_i, price_f, prod_ss from collection1 where year_i > 36 limit 10 select id, test_s, test_t, year_i, price_f, prod_ss from collection1 where year_i > 36 and year_i < 38 limit 10 select id, test_s, test_t, year_i, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 38) limit 10 select id, test_s, test_t, year_i, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 37) limit 10 select id, test_s, test_t, year_i, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 37) or id = '410005' limit 10 select id, test_s, test_t, year_i, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 38) order by price_f desc limit 10 select id, test_s, test_t, year_i, day_i, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 38) order by day_i desc, price_f desc limit 10 select id, test_s, test_t as textField, year_i, day_i, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 38) order by day_i desc, price_f desc limit 10 select id, test_s, test_t as textField, year_i, day_i as testday, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 38) order by testday desc, price_f desc limit 10 select id, test_s, test_t as textField, year_i, day_i as `day`, price_f, prod_ss from collection1 where (year_i > 36 and year_i < 38) order by `day` desc, price_f desc limit 10 select count(*) from collection1 select count(*) as `count` from collection1 select count(*), min(day_i) from collection1 select count(*), min(day_i), max(day_i) from collection1 select count(*) as `count`, min(day_i) as min_day, max(day_i) as max_day from collection1 select count(*) as `count`, min(day_i) as min_day, max(day_i) as max_day, sum(month_i) as sum_month from collection1 select distinct year_i from collection1 select distinct year_i from collection1 where year_i > 10 select distinct year_i from collection1 where year_i > 10 order by year_i desc select distinct year_i, month_i from collection1 where year_i > 30 order by year_i asc, month_i asc select distinct year_i, month_i from collection1 where year_i > 30 order by year_i asc, month_i desc select distinct year_i, month_i from collection1 where year_i > 30 order by year_i desc, month_i desc select distinct year_i, month_i, day_i from collection1 where year_i > 30 order by year_i desc, month_i desc, day_i desc select distinct year_i, month_i, day_i from collection1 where year_i > 30 order by year_i desc, month_i desc, day_i desc limit 2000 select distinct year_i as `year`, month_i as `month`, day_i as `day` from collection1 where year_i > 30 order by year_i desc, month_i desc, day_i desc limit 2000 select id, score from collection1 where test_t='(hello world)' order by score limit 500 select year_i, count(*) from collection1 group by year_i select year_i, avg(month_i) from collection1 group by year_i select year_i, count(*) as `count` from collection1 group by year_i select year_i, count(*) as `count` from collection1 where year_i > 30 group by year_i select year_i, count(*) as `count` from collection1 where year_i > 30 group by year_i having count(*) = 200183 select year_i, count(*) as `count` from collection1 group by year_i order by count(*) desc select year_i, count(*) from collection1 group by year_i order by year_i asc select year_i, count(*) as `count`, min(day_i) as min_day , max(day_i) as max_day from collection1 group by year_i select year_i, month_i, count(*) as `count`, min(day_i) as min_day , max(day_i) as max_day from collection1 group by year_i, month_i select year_i, month_i, count(*) as `count`, sum(day_i) as sum_day , sum(month_i) as sum_month, max(price_f) as max_price from collection1 where year_i >30 group by year_i, month_i order by sum(day_i) desc
        Hide
        risdenk Kevin Risden added a comment -

        I'm not sure that Calcite is doing the wrong thing here. I think I remember a Calcite mailing list thing about AVG and integers. We may have been too lenient before. I'll see if I can find the Calcite message.

        Show
        risdenk Kevin Risden added a comment - I'm not sure that Calcite is doing the wrong thing here. I think I remember a Calcite mailing list thing about AVG and integers. We may have been too lenient before. I'll see if I can find the Calcite message.
        Hide
        risdenk Kevin Risden added a comment -

        Unless we are pushing down AVG and its failing on the Solr side. I could be off base with my previous message.

        Show
        risdenk Kevin Risden added a comment - Unless we are pushing down AVG and its failing on the Solr side. I could be off base with my previous message.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        I suspect Calcite is doing what is expected in SQL when averaging integers. We'll just make the adjustment to what Calcite wants.

        Show
        joel.bernstein Joel Bernstein added a comment - I suspect Calcite is doing what is expected in SQL when averaging integers. We'll just make the adjustment to what Calcite wants.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        We are pushing the down the AVG but always return a double from an average calculation. Calcite is expecting a long when the avg is done on an integer.

        Show
        joel.bernstein Joel Bernstein added a comment - We are pushing the down the AVG but always return a double from an average calculation. Calcite is expecting a long when the avg is done on an integer.
        Hide
        thelabdude Timothy Potter added a comment -

        This is failing for me:

        curl --data-urlencode "stmt=select rating, movie_id, user_id from ratings order by rating desc LIMIT 100" "http://localhost:8983/solr/ratings/sql"
        {"result-set":{"docs":[{"EXCEPTION":"java.lang.Long cannot be cast to java.lang.String","EOF":true,"RESPONSE_TIME":160}]}}
        

        rating is a tint as before ...

        2017-03-23 21:47:37.007 INFO  (qtp1197365356-15) [c:ratings s:shard1 r:core_node1 x:ratings_shard1_replica1] o.a.s.c.S.Request [ratings_shard1_replica1]  webapp=/solr path=/select params={q=*:*&distrib=false&fl=rating,movie_id,user_id&sort=rating+desc&rows=100&wt=json&version=2.2} hits=100000 status=0 QTime=19
        2017-03-23 21:47:37.014 ERROR (qtp1197365356-21) [c:ratings s:shard1 r:core_node1 x:ratings_shard1_replica1] o.a.s.c.s.i.s.ExceptionStream java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String
        	at org.apache.calcite.avatica.util.AbstractCursor$StringAccessor.getString(AbstractCursor.java:728)
        	at org.apache.calcite.avatica.AvaticaSite.get(AvaticaSite.java:352)
        	at org.apache.calcite.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:393)
        	at org.apache.solr.client.solrj.io.stream.JDBCStream$1.selectValue(JDBCStream.java:240)
        	at org.apache.solr.client.solrj.io.stream.JDBCStream.read(JDBCStream.java:341)
        	at org.apache.solr.handler.SQLHandler$SqlHandlerStream.read(SQLHandler.java:187)
        	at org.apache.solr.client.solrj.io.stream.ExceptionStream.read(ExceptionStream.java:68)
        	at org.apache.solr.handler.StreamHandler$TimerStream.read(StreamHandler.java:508)
        	at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:82)
        	at org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523)
        

        maybe this is a different issue?

        Show
        thelabdude Timothy Potter added a comment - This is failing for me: curl --data-urlencode "stmt=select rating, movie_id, user_id from ratings order by rating desc LIMIT 100" "http: //localhost:8983/solr/ratings/sql" { "result-set" :{ "docs" :[{ "EXCEPTION" : "java.lang. Long cannot be cast to java.lang. String " , "EOF" : true , "RESPONSE_TIME" :160}]}} rating is a tint as before ... 2017-03-23 21:47:37.007 INFO (qtp1197365356-15) [c:ratings s:shard1 r:core_node1 x:ratings_shard1_replica1] o.a.s.c.S.Request [ratings_shard1_replica1] webapp=/solr path=/select params={q=*:*&distrib= false &fl=rating,movie_id,user_id&sort=rating+desc&rows=100&wt=json&version=2.2} hits=100000 status=0 QTime=19 2017-03-23 21:47:37.014 ERROR (qtp1197365356-21) [c:ratings s:shard1 r:core_node1 x:ratings_shard1_replica1] o.a.s.c.s.i.s.ExceptionStream java.lang.ClassCastException: java.lang. Long cannot be cast to java.lang. String at org.apache.calcite.avatica.util.AbstractCursor$StringAccessor.getString(AbstractCursor.java:728) at org.apache.calcite.avatica.AvaticaSite.get(AvaticaSite.java:352) at org.apache.calcite.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:393) at org.apache.solr.client.solrj.io.stream.JDBCStream$1.selectValue(JDBCStream.java:240) at org.apache.solr.client.solrj.io.stream.JDBCStream.read(JDBCStream.java:341) at org.apache.solr.handler.SQLHandler$SqlHandlerStream.read(SQLHandler.java:187) at org.apache.solr.client.solrj.io.stream.ExceptionStream.read(ExceptionStream.java:68) at org.apache.solr.handler.StreamHandler$TimerStream.read(StreamHandler.java:508) at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:82) at org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523) maybe this is a different issue?
        Hide
        joel.bernstein Joel Bernstein added a comment -

        What are the field types for rating, movie_id, user_id?

        Show
        joel.bernstein Joel Bernstein added a comment - What are the field types for rating, movie_id, user_id?
        Hide
        thelabdude Timothy Potter added a comment -

        rating=TrieInt, move_id=StrField and user_id=StrField ... all have docValues enabled

        Show
        thelabdude Timothy Potter added a comment - rating=TrieInt, move_id=StrField and user_id=StrField ... all have docValues enabled
        Hide
        risdenk Kevin Risden added a comment -

        I would have to look closer but TrieInt may not be mapped correctly in the Calcite schema. I think it "guesses" based on the schema right now. If you "cheat" and just cast to a Long does it return?

        Show
        risdenk Kevin Risden added a comment - I would have to look closer but TrieInt may not be mapped correctly in the Calcite schema. I think it "guesses" based on the schema right now. If you "cheat" and just cast to a Long does it return?
        Hide
        joel.bernstein Joel Bernstein added a comment -

        All the testing has been done with int rather then tint. I'll dig into this ...

        Show
        joel.bernstein Joel Bernstein added a comment - All the testing has been done with int rather then tint. I'll dig into this ...
        Hide
        joel.bernstein Joel Bernstein added a comment -

        This is the mapping logic:

        switch (luceneFieldInfo.getType()) {
                case "string":
                  type = typeFactory.createJavaType(String.class);
                  break;
                case "int":
                case "long":
                  type = typeFactory.createJavaType(Long.class);
                  break;
                case "float":
                case "double":
                  type = typeFactory.createJavaType(Double.class);
                  break;
                default:
                  type = typeFactory.createJavaType(String.class);
              }
        

        I suspect tint is not returning int or long

        Show
        joel.bernstein Joel Bernstein added a comment - This is the mapping logic: switch (luceneFieldInfo.getType()) { case "string" : type = typeFactory.createJavaType( String .class); break ; case " int " : case " long " : type = typeFactory.createJavaType( Long .class); break ; case " float " : case " double " : type = typeFactory.createJavaType( Double .class); break ; default : type = typeFactory.createJavaType( String .class); } I suspect tint is not returning int or long
        Hide
        thelabdude Timothy Potter added a comment -

        hmmm ... I guess I don't know how to cast with Calcite:

        curl --data-urlencode "stmt=select cast(rating as long) as rating, movie_id, user_id from ratings order by rating desc LIMIT 100" "http://localhost:8983/solr/ratings/sql"
        {"result-set":{"docs":[{"EXCEPTION":"Failed to execute sqlQuery 'select cast(rating as long) as rating, movie_id, user_id from ratings order by rating desc LIMIT 100' against JDBC connection 'jdbc:calcitesolr:'.\nError while executing SQL \"select cast(rating as long) as rating, movie_id, user_id from ratings order by rating desc LIMIT 100\": From line 1, column 23 to line 1, column 26: Unknown datatype name 'long'","EOF":true,"RESPONSE_TIME":491}]}}
        
        Show
        thelabdude Timothy Potter added a comment - hmmm ... I guess I don't know how to cast with Calcite: curl --data-urlencode "stmt=select cast (rating as long ) as rating, movie_id, user_id from ratings order by rating desc LIMIT 100" "http: //localhost:8983/solr/ratings/sql" { "result-set" :{ "docs" :[{ "EXCEPTION" : "Failed to execute sqlQuery 'select cast (rating as long ) as rating, movie_id, user_id from ratings order by rating desc LIMIT 100' against JDBC connection 'jdbc:calcitesolr:'.\nError while executing SQL \" select cast (rating as long ) as rating, movie_id, user_id from ratings order by rating desc LIMIT 100\ ": From line 1, column 23 to line 1, column 26: Unknown datatype name ' long '" , "EOF" : true , "RESPONSE_TIME" :491}]}}
        Hide
        risdenk Kevin Risden added a comment -

        https://calcite.apache.org/docs/reference.html#data-types You probably want numeric? long isn't a SQL type.

        Show
        risdenk Kevin Risden added a comment - https://calcite.apache.org/docs/reference.html#data-types You probably want numeric? long isn't a SQL type.
        Hide
        risdenk Kevin Risden added a comment -

        Joel Bernstein When the mapping was implemented I thought about tint and others but there was no way to know if a type was except for the primitives. The reason being is the name tint doesn't HAVE to mean an int. Now we could add the mappings and have the assumption but it was done originally with things that I knew would be true.

        Show
        risdenk Kevin Risden added a comment - Joel Bernstein When the mapping was implemented I thought about tint and others but there was no way to know if a type was except for the primitives. The reason being is the name tint doesn't HAVE to mean an int. Now we could add the mappings and have the assumption but it was done originally with things that I knew would be true.
        Hide
        thelabdude Timothy Potter added a comment -

        yep, sorry, cast(rating as bigint) worked

        Show
        thelabdude Timothy Potter added a comment - yep, sorry, cast(rating as bigint) worked
        Hide
        risdenk Kevin Risden added a comment -

        Timothy Potter No worries SQL is a pain I agree right now the out of the box experience is going to be rough (except for primitive types). As long as we know that casting works that might be a decent work around for the short term? The error messages at least look good. We do need to add tests though.

        Show
        risdenk Kevin Risden added a comment - Timothy Potter No worries SQL is a pain I agree right now the out of the box experience is going to be rough (except for primitive types). As long as we know that casting works that might be a decent work around for the short term? The error messages at least look good. We do need to add tests though.
        Hide
        risdenk Kevin Risden added a comment -

        One thought is we might be able to add the cast programmatically after querying? We might have better info at that point than just the type name from the schema? That could be rough though since wouldn't know if 1.0 or abc is coming next and we could cast wrong.

        Show
        risdenk Kevin Risden added a comment - One thought is we might be able to add the cast programmatically after querying? We might have better info at that point than just the type name from the schema? That could be rough though since wouldn't know if 1.0 or abc is coming next and we could cast wrong.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        Let's just add the mappings and we can make int, tint etc... requirements for the SQL schema.

        I'll fix both the AVG issue and the tint issue tomorrow and get everything backported for a 6.5.1 release.

        I'll be traveling starting Sunday though so I won't be able to manage the 6.5.1 release.

        Timothy Potter or Kevin Risden if one of you can manage the 6.5.1 release we can get these issues resolved next week.

        Show
        joel.bernstein Joel Bernstein added a comment - Let's just add the mappings and we can make int, tint etc... requirements for the SQL schema. I'll fix both the AVG issue and the tint issue tomorrow and get everything backported for a 6.5.1 release. I'll be traveling starting Sunday though so I won't be able to manage the 6.5.1 release. Timothy Potter or Kevin Risden if one of you can manage the 6.5.1 release we can get these issues resolved next week.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        I'll be fixing SOLR-9372 as part of this ticket also.

        Show
        joel.bernstein Joel Bernstein added a comment - I'll be fixing SOLR-9372 as part of this ticket also.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        Patch which addresses all the issues discussed in this ticket.

        Show
        joel.bernstein Joel Bernstein added a comment - Patch which addresses all the issues discussed in this ticket.
        Hide
        joel.bernstein Joel Bernstein added a comment - - edited

        The patch that was just added does the following:

        1) It aligns the facet and map_reduce aggregation outputs fixing SOLR-9372. Now both approaches output a long when sum, min, max are applied to long fields.

        2) The avg function now works on int fields. The avg function by default always returns a rounded long when averaging an int field. You can cast to return a double when averaging a long field:

        select str_s, count(*), sum(field_i), min(field_i), max(field_i), 
                  cast(avg(1.0 * field_i) as float) 
                  from collection1 where text='XXXX' group by str_s 
                  order by sum(field_i) asc limit 2
        

        When averaging a float field a double is always returned.

        3) Added aggregation support for tint, tlong and tdouble fields.

        Show
        joel.bernstein Joel Bernstein added a comment - - edited The patch that was just added does the following: 1) It aligns the facet and map_reduce aggregation outputs fixing SOLR-9372 . Now both approaches output a long when sum, min, max are applied to long fields. 2) The avg function now works on int fields. The avg function by default always returns a rounded long when averaging an int field. You can cast to return a double when averaging a long field: select str_s, count(*), sum(field_i), min(field_i), max(field_i), cast (avg(1.0 * field_i) as float ) from collection1 where text='XXXX' group by str_s order by sum(field_i) asc limit 2 When averaging a float field a double is always returned. 3) Added aggregation support for tint, tlong and tdouble fields.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        I'll do some manual testing before committing

        Show
        joel.bernstein Joel Bernstein added a comment - I'll do some manual testing before committing
        Hide
        joel.bernstein Joel Bernstein added a comment -

        I ran into some issues during the manual testing. The problem was that in certain scenarios the cast was causing the project rule to fire twice. This was not properly accounted for in the code so the second project could step on the field mappings for the aggregate functions. This took some time to figure out and resolve. The approach taken was to add logic that said that the project rule can never overwrite an existing field mapping. This seemed to resolve the issue. I'll post a new patch tomorrow and if all looks good I'll commit tomorrow as well.

        Show
        joel.bernstein Joel Bernstein added a comment - I ran into some issues during the manual testing. The problem was that in certain scenarios the cast was causing the project rule to fire twice. This was not properly accounted for in the code so the second project could step on the field mappings for the aggregate functions. This took some time to figure out and resolve. The approach taken was to add logic that said that the project rule can never overwrite an existing field mapping. This seemed to resolve the issue. I'll post a new patch tomorrow and if all looks good I'll commit tomorrow as well.
        Hide
        thelabdude Timothy Potter added a comment -

        Hi Joel Bernstein, sounds good ... I'll plan to test out the patch tomorrow.

        Show
        thelabdude Timothy Potter added a comment - Hi Joel Bernstein , sounds good ... I'll plan to test out the patch tomorrow.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        Latest patch which resolves the issues found when casting.

        I plan on testing more today and if all looks good committing.

        Show
        joel.bernstein Joel Bernstein added a comment - Latest patch which resolves the issues found when casting . I plan on testing more today and if all looks good committing.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit aa2b46a62a52c0d0117312add2a667bf6b14a709 in lucene-solr's branch refs/heads/master from Joel Bernstein
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=aa2b46a ]

        SOLR-10341: SQL AVG function mis-interprets field type

        Show
        jira-bot ASF subversion and git services added a comment - Commit aa2b46a62a52c0d0117312add2a667bf6b14a709 in lucene-solr's branch refs/heads/master from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=aa2b46a ] SOLR-10341 : SQL AVG function mis-interprets field type
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit e6b4d25289a240ff64eaeb858c4c06737999ee11 in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e6b4d25 ]

        SOLR-10341: SQL AVG function mis-interprets field type

        Show
        jira-bot ASF subversion and git services added a comment - Commit e6b4d25289a240ff64eaeb858c4c06737999ee11 in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e6b4d25 ] SOLR-10341 : SQL AVG function mis-interprets field type
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 4c979b84e8d5bd3eb4cc34f90834cedbf2a374ed in lucene-solr's branch refs/heads/branch_6_5 from Joel Bernstein
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4c979b8 ]

        SOLR-10341: SQL AVG function mis-interprets field type

        Show
        jira-bot ASF subversion and git services added a comment - Commit 4c979b84e8d5bd3eb4cc34f90834cedbf2a374ed in lucene-solr's branch refs/heads/branch_6_5 from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4c979b8 ] SOLR-10341 : SQL AVG function mis-interprets field type
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit c05ab96dc4454c00779aad2cd6989812c9b13a2e in lucene-solr's branch refs/heads/master from Joel Bernstein
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c05ab96 ]

        SOLR-10341, SOLR-10444: Update CHANGES.txt

        Show
        jira-bot ASF subversion and git services added a comment - Commit c05ab96dc4454c00779aad2cd6989812c9b13a2e in lucene-solr's branch refs/heads/master from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c05ab96 ] SOLR-10341 , SOLR-10444 : Update CHANGES.txt
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit d20d95163b2f995674c62e14a8d8792c48bc51d8 in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d20d951 ]

        SOLR-10341, SOLR-10444: Update CHANGES.txt

        Show
        jira-bot ASF subversion and git services added a comment - Commit d20d95163b2f995674c62e14a8d8792c48bc51d8 in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d20d951 ] SOLR-10341 , SOLR-10444 : Update CHANGES.txt
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit bc16deb60603b135de659db577ecc4430d9a428d in lucene-solr's branch refs/heads/branch_6_5 from Joel Bernstein
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bc16deb ]

        SOLR-10341, SOLR-10444: Update CHANGES.txt

        Show
        jira-bot ASF subversion and git services added a comment - Commit bc16deb60603b135de659db577ecc4430d9a428d in lucene-solr's branch refs/heads/branch_6_5 from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bc16deb ] SOLR-10341 , SOLR-10444 : Update CHANGES.txt

          People

          • Assignee:
            Unassigned
            Reporter:
            thelabdude Timothy Potter
          • Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development