Integrate Apache Calcite into the SQLHandler

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: None
    • Fix Version/s: 6.5, master (7.0)
    • Component/s: Parallel SQL
    • Labels: None

      Description

      The Presto SQL Parser was perfect for phase one of the SQLHandler. It was nicely split off from the larger Presto project and it did everything that was needed for the initial implementation.

      Phase two of the SQL work, though, will require an optimizer. This is where Apache Calcite comes into play: it has a battle-tested cost-based optimizer and has been integrated into Apache Drill and Hive.

      This work can begin in trunk following the 6.0 release. The final query plans will continue to be translated to Streaming API objects (TupleStreams), so continued work on the JDBC driver should plug in nicely with the Calcite work.

      1. SOLR-8593.patch
        329 kB
        Kevin Risden
      2. SOLR-8593.patch
        43 kB
        Joel Bernstein
      3. SOLR-8593.patch
        256 kB
        Cao Manh Dat

        Issue Links

          Activity

          joel.bernstein Joel Bernstein added a comment -

          Linking to the JDBC driver work. The JDBC driver and the Calcite integration are connected because both are going to require hooks into the SQL Catalog. So work on the JDBC driver will inform work coming soon on the Calcite integration.

          joel.bernstein Joel Bernstein added a comment -

SOLR-7560 has a patch that uses Apache Calcite rather than Presto to parse the SQL queries. This would be a good place to start working from. This patch ignores the hooks into Calcite's optimizer, which needs access to a SQL catalog. These hooks will need to be investigated and the SQL Catalog will have to be built out. This will begin to overlap with work being done on the JDBC driver pretty much immediately.

          risdenk Kevin Risden added a comment -

Speaking of Calcite and JDBC - https://calcite.apache.org/avatica/. It was recently split out as a subproject of Calcite and is basically a JDBC client/server implementation. It might not be far enough along yet to integrate, but it is a thought.

          risdenk Kevin Risden added a comment -

Joel Bernstein - I took a much closer look at Calcite and the SOLR-7560 patch. I think there can be a much simpler approach than implementing the parsing and catalog separately. If a Calcite adapter is built that enables Calcite to access Solr, this would potentially solve a lot of problems at once. Calcite has a local JDBC connection to which you can just pass a SQL query and get back a result set. This local JDBC connection could be wrapped in a stream and returned from the /sql handler. This would remove the need for the limit/having/etc. streams that currently exist. The logic for pushing down having/limit/etc. would have to be moved into how Calcite does the processing. This should make the SQL handler much simpler, since it would pass off much of the work it's doing to Calcite. The downside to this approach is that much of the logic would be Calcite-specific, and we would need to figure out exactly which hooks to tie into.
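
For illustration, a minimal sketch of what going through Calcite's local JDBC connection looks like; the model file name and the query are hypothetical placeholders, not anything from a patch:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class CalciteLocalJdbcSketch {
  public static void main(String[] args) throws Exception {
    Properties info = new Properties();
    // "model" points Calcite at a JSON model describing the schema;
    // solr-model.json is a hypothetical file for illustration.
    info.setProperty("model", "solr-model.json");
    try (Connection conn = DriverManager.getConnection("jdbc:calcite:", info);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT id FROM collection1 LIMIT 10")) {
      while (rs.next()) {
        System.out.println(rs.getString("id"));
      }
    }
  }
}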

          joel.bernstein Joel Bernstein added a comment -

If wrapping this logic up inside a Calcite adapter makes things easier, then I'm all for it. But I get worried that we'll be roped into how Calcite does things. For example, Solr has shuffling capabilities that will make for some very fast joins. Will we be able to access these features through a Calcite adapter, or will we have to use Calcite's join techniques?

          I'll read up on creating a Calcite adapter.

          risdenk Kevin Risden added a comment -

          But I get worried that we'll be roped into how Calcite does things.

          This is something that I was worried about as well. I'm still trying to figure out how Calcite would best fit together.

          joel.bernstein Joel Bernstein added a comment -

It's still worth looking into. But I suspect Calcite has to implement a lowest-common-denominator type of join to support joining lots of different systems together.

          Since Solr is not a meta engine we can get really specific and use Solr's unique shuffling capabilities to do faster joins.

Dennis Gove can speak to this better than I can, but the Streaming API's distributed joins are really fast.

          joel.bernstein Joel Bernstein added a comment - - edited

          Kevin Risden and I have been looking into different approaches for this ticket.

          One of the approaches is to embed the Calcite SQL parser and optimizer inside the SQLHandler. The entry point for this appears to be:

          https://calcite.apache.org/apidocs/org/apache/calcite/tools/Planner.html

          Using this approach we would need to implement two things:

1) A CatalogReader, which the Calcite validator and optimizer will use to do its job. The underlying implementation of this should work for the JDBC driver also, so we kill two big birds with one stone when this is implemented.

2) A custom RelVisitor, which will rewrite the relational algebra tree (RelNode) created by the optimizer. The RelNode tree will need to be mapped to the Streaming API. Since the Streaming API already supports parallel relational algebra, this should be fairly straightforward.

          This approach would leave the Solr JDBC driver basically as it is, but provide all the hooks needed to finish off the remaining Catalog metadata methods.
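
For reference, a rough sketch of what the embedded parse/validate/optimize flow looks like through the Planner API; rootSchema is an assumed, already-populated SchemaPlus, and error handling is elided:

import org.apache.calcite.rel.RelNode;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

public class PlannerSketch {
  static RelNode plan(SchemaPlus rootSchema, String sql) throws Exception {
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(rootSchema)
        .build();
    Planner planner = Frameworks.getPlanner(config);
    SqlNode parsed = planner.parse(sql);
    SqlNode validated = planner.validate(parsed);
    // The optimized RelNode tree is what a custom RelVisitor would walk
    // to emit Streaming API objects (TupleStreams).
    return planner.rel(validated).project();
  }
}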

          risdenk Kevin Risden added a comment - - edited

          Here is a separate approach that uses all of Calcite and the JDBCStream: https://github.com/risdenk/lucene-solr/compare/master...risdenk:calcite-sql-handler

          It removes all the custom processing from SQLHandler, wraps Calcite in a JDBCStream, and executes the query.

          There is something I learned about TestSQLHandler that I'm not sure is correct:

          • quoted identifiers - like 'id' and 'text' aren't valid? These shouldn't be referring to columns?

          Things to be explored with this approach:

          • switch from a standard query in SolrEnumerator to a stream
          • fix data types
          • optimize cases like where
          • code cleanup since it was just thrown together as a POC
          risdenk Kevin Risden added a comment -

Joel Bernstein I was able to make some really good progress on the above linked piece. It's coming together; I just need to add the optimizations now.

I have been testing with https://github.com/risdenk/solr-calcite-example which is the same Calcite code isolated from Solr. It enables me to easily check the explain plan and data types. I'll probably bang on it some more tomorrow to integrate the push-downs for project, filter, and sort. The hooks are all there; I just need to convert from the Cassandra example code to Solr syntax.

          dpgove Dennis Gove added a comment -

          Kevin Risden, I think I must be missing something in the diff link but I don't see any use of the JDBCStream. I'm not sure I understand how the use of the JDBCStream will help with this.

          joel.bernstein Joel Bernstein added a comment - - edited

I looked through the code and I'm seeing how CloudSolrStream is being used. But it's not clear to me that we'll be able to implement the full range of capabilities through this approach.

          For example:

          1) Can we choose between the FacetStream and a parallel RollupStream based on the costs of the different approaches?

          2) Can we do parallel joins using Solr's shuffling capabilities and Solr workers?

          joel.bernstein Joel Bernstein added a comment - - edited

Currently quoted identifiers do refer to columns. This was originally done because Presto didn't support mixed-case columns unless they were quoted. But Presto fixed that problem, so the quoted identifiers as they are now don't really serve a purpose. But I do believe that both Presto and Calcite allow columns with quoted identifiers in order to support non-parseable identifiers.

          risdenk Kevin Risden added a comment -

          The diff is misleading since SQLHandler is not shown due to having too big of a diff :/ I removed most of SQLHandler and then just wrapped Calcite in a JDBCStream. The SQLHandler class is viewable here: https://github.com/risdenk/lucene-solr/blob/calcite-sql-handler/solr/core/src/java/org/apache/solr/handler/SQLHandler.java

          risdenk Kevin Risden added a comment -

I think my statement was misleading. I think the issue is single-quoted identifiers. From the research I did, single quotes are always used to identify string constants. Backticks, double quotes, and even brackets can be used for quoting identifiers. The quoting options for Calcite seem to be available under quoting here: https://calcite.apache.org/docs/adapter.html#drivers
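
For illustration, a hedged sketch of picking the identifier-quoting style through a Calcite JDBC connection property, per the linked docs (DOUBLE_QUOTE and BRACKET are the other documented values):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class QuotingSketch {
  static Connection connect() throws Exception {
    Properties info = new Properties();
    // Selects back-tick quoting for identifiers, leaving single quotes
    // for string literals as SQL expects.
    info.setProperty("quoting", "BACK_TICK");
    return DriverManager.getConnection("jdbc:calcite:", info);
  }
}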

          risdenk Kevin Risden added a comment -

          I think we can. I will look into it today and see how far I get with some of the optimization rules.

          dpgove Dennis Gove added a comment -

Ok, I see that. So the JDBCStream is being used as the bridge between standard Solr handlers and Calcite. But how is Calcite turning the SQL statement into a stream pipeline?

          risdenk Kevin Risden added a comment -

One way to access Calcite is through JDBC. The JDBCStream takes the result set generated by Calcite and turns it back into a stream.

Inside Calcite, the SQL statement is converted into a stream after going through some optimization rules.
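
A minimal sketch of that wrapping, assuming Solr's JDBCStream and a hypothetical Calcite model file; the collection and field names are illustrative only:

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.comp.ComparatorOrder;
import org.apache.solr.client.solrj.io.comp.FieldComparator;
import org.apache.solr.client.solrj.io.stream.JDBCStream;

public class CalciteJdbcStreamSketch {
  public static void main(String[] args) throws Exception {
    // Calcite is reached through its local JDBC URL; JDBCStream turns
    // the result set back into a TupleStream.
    JDBCStream stream = new JDBCStream(
        "jdbc:calcite:model=solr-model.json",
        "SELECT id FROM collection1 ORDER BY id ASC",
        new FieldComparator("id", ComparatorOrder.ASCENDING));
    stream.open();
    try {
      for (Tuple tuple = stream.read(); !tuple.EOF; tuple = stream.read()) {
        System.out.println(tuple.getString("id"));
      }
    } finally {
      stream.close();
    }
  }
}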

risdenk Kevin Risden added a comment -

For reference, the class that is querying with a stream is here: https://github.com/risdenk/lucene-solr/blob/calcite-sql-handler/solr/core/src/java/org/apache/solr/handler/sql/SolrTable.java#L82
          risdenk Kevin Risden added a comment - - edited

Another thing I learned from looking closely at the TestSQLHandler tests is that the collection name is case-insensitive with the current implementation. This seems wrong to me because collection names are case-sensitive. This is tested in the testMixedCaseFields method.

          joel.bernstein Joel Bernstein added a comment -

Yeah, this was done originally because Presto didn't support mixed-case identifiers, so I solved that problem by making the table name case-insensitive. But this also matches up with the SQL spec, which is case-insensitive for identifiers.

          risdenk Kevin Risden added a comment - - edited

          Pushed initial version to branch jira/solr-8593. Currently there are a lot of tests in TestSQLHandler that are commented out. Need to go back through and make sure they have the expected results. Also some file formatting related to headers needs to be addressed.

          risdenk Kevin Risden added a comment - - edited

          Ok made a bunch of progress on the jira/solr-8593 branch:

          • cleaned up the tests so that there are only a few remaining items to address (outlined below)
          • added support for float/double types
          • fixed a CloudSolrClient resource leak

          Left to do:

          • Add support for aggregationMode (facets and map_reduce) and their parameters
          • ensure the pushdown to facets/map_reduce works correctly
          • figure out the CloudSolrClient cache (currently not caching and creating new per stream)
          • Push down aggregates to Solr
          • add tests to ensure the proper plan is being generated by Calcite
          • figure out avg(int) problem in tests.
            • avg(int) returns int by design. need to figure out if casting is right for the tests
          • figure out sort asc by default in tests
            • this currently doesn't sort properly even though I thought that the right approach was sort on version.
          • handle added dependencies properly and upgrade to latest Calcite/Avatica
          risdenk Kevin Risden added a comment -

This is working out really nicely. I still need to address the above items, but we can already hook up the Avatica server and use the Avatica client to connect. This would resolve most of the issues in SOLR-8659.

          joel.bernstein Joel Bernstein added a comment - - edited

          This is awesome.

          I like the idea of making this a 6.2 priority. I should have some time over the next couple of days to dig into the implementation.

          joel.bernstein Joel Bernstein added a comment - - edited

          Kevin Risden, I've been reviewing the work on https://github.com/risdenk/solr-calcite-example.

I think I understand the basics of how this works, but there are some mysteries as to how exactly the rules get triggered and where to place rules like join push-downs.

          Do you know of an existing adapter that does join push downs that can be used as a reference implementation?

And what's the best source of documentation for implementing rules? I haven't found anything that I would describe as comprehensive.

          risdenk Kevin Risden added a comment -

          Do you know of an existing adapter that does join push downs that can be used as a reference implementation?

          And what's the best source of documentation for implementing rules? I haven't found anything that I would describes as comprehensive.

I don't have one specifically for joins. The best reference for implementing rules is the example adapters for Cassandra, MongoDB, and Splunk in the Calcite source. I wish there were more comprehensive documentation. It has been a lot of trial and error to see how things all fit together.

          Basically the pattern seems to be as follows:
          1) Add a rule to SolrRules
          2) The constructor specifies how it matches (project, filter, etc)
          3) Add a class that implements the translation of that rule

This all eventually gets passed down to the SolrTable, where the query is built and run. I've been thinking about changing this so that the query is built along the way, but I'm not sure that's possible.
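
For illustration, a hedged sketch of that pattern, modeled on the Cassandra example adapter; SolrRel.CONVENTION, SolrProject, and its constructor are assumptions standing in for whatever is on the branch:

import org.apache.calcite.plan.Convention;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.convert.ConverterRule;
import org.apache.calcite.rel.logical.LogicalProject;

class SolrProjectRule extends ConverterRule {
  SolrProjectRule() {
    // The constructor specifies what the rule matches: a LogicalProject
    // that has not yet been assigned to a convention.
    super(LogicalProject.class, Convention.NONE, SolrRel.CONVENTION, "SolrProjectRule");
  }

  @Override
  public RelNode convert(RelNode rel) {
    LogicalProject project = (LogicalProject) rel;
    RelTraitSet traitSet = project.getTraitSet().replace(SolrRel.CONVENTION);
    // Translate to the Solr-specific operator, which later contributes
    // its field list when the query is built in SolrTable.
    return new SolrProject(project.getCluster(), traitSet,
        convert(project.getInput(), SolrRel.CONVENTION),
        project.getProjects(), project.getRowType());
  }
}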

          joel.bernstein Joel Bernstein added a comment -

Are you concerned that there isn't a reference implementation for join push-downs?

          Projects such as Apache Drill must have taken the approach of only embedding the parser/optimizer?

          joel.bernstein Joel Bernstein added a comment - - edited

I think my main concern is that the join pushdown rules haven't been exercised that much in production. Also, do we have access to the SQL catalog and statistics from inside the rules engine? We'll need that information to decide which type of join to do. I guess we can make the catalog globally accessible if the rules API doesn't provide hooks into it.

          risdenk Kevin Risden added a comment -

          Here is a blog post about the concepts and example rules: https://datapsyche.wordpress.com/2014/08/06/optiq-query-push-down-concepts/

          I am pretty sure that Hive uses Calcite internally and that the implementation would be in that code base. Drill might also have the same join logic in their code base.

          Also do we have access to the SQL catalog and statistics from inside the rules engine. We'll need that information to decide which type of join to do. I guess we can make the catalog globally accessible if the rules API doesn't provide hooks into it.

There is a computeSelfCost method with the following definition:

          public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq);
          

          That should allow us to change which rules are fired based on different cost parameters.

Also, the different rules can have different prerequisite states, such that a filter is only allowed on a project or something similar. This should allow fine-grained control of a join as well, where the join will only fire if it is based on the sort of the same columns.
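
For illustration, a fragment of how an adapter typically steers the planner through computeSelfCost; discounting the parent cost is the trick the Cassandra example adapter uses, shown here inside a hypothetical Solr operator:

import org.apache.calcite.plan.RelOptCost;
import org.apache.calcite.plan.RelOptPlanner;
import org.apache.calcite.rel.metadata.RelMetadataQuery;

// Inside a hypothetical Solr RelNode subclass:
@Override
public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) {
  // multiplyBy(0.1) tells the planner this operator is roughly 10x
  // cheaper when Solr does the work, so pushed-down plans win.
  return super.computeSelfCost(planner, mq).multiplyBy(0.1);
}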

          joel.bernstein Joel Bernstein added a comment -

          Ok, I'll keep digging into the rules to get a better understanding.

          risdenk Kevin Risden added a comment -

          Joel Bernstein - Elasticsearch with Calcite just came through the Calcite dev list - CALCITE-1253

          joel.bernstein Joel Bernstein added a comment -

Interesting. I'd like to spend some time in the next couple of weeks to see what can be done in the near term on this ticket. Possibly the first step is just to release functionality equivalent to the current Presto code, but backed by Calcite. This would provide the base to gradually expand the SQL feature set.

          julianhyde Julian Hyde added a comment -

          Hi everyone! I'm VP of Apache Calcite. I only just noticed this JIRA case. I am excited that you are considering using Calcite. Please let me know if I can help.

          joel.bernstein Joel Bernstein added a comment -

          Hi Julian!

Thanks for the offer to help out. Kevin Risden and I are very interested in using Calcite to power Solr's Parallel SQL engine so we can use Calcite's awesome optimizer.

          Kevin has been doing most of the work on this but I will be helping out more following the next Solr release.

I think our biggest struggle has been understanding how to apply the rules properly to push down distributed joins and aggregations. Solr supports fast MapReduce shuffling and distributed joins, and also has mature faceting analytics, so we'd like to take advantage of all this power from the SQL interface.

          julianhyde Julian Hyde added a comment -

          You should probably model your join and aggregate operators as sub-classes of Join and Aggregate that understand the "distribution" trait. If you are doing, say, "group by x" then you will need your input either to be singleton (i.e. only one input stream) or partitioned on x. Calcite will be able to ensure that the input is partitioned appropriately, either because it is stored in partitions, or by applying a shuffle/exchange.

          There is the regular Exchange operator that changes the distribution (i.e. re-partitions) and there is SortExchange that changes the distribution and also sorts within each partition. SortExchange models what the shuffle does in MapReduce.

          After you have a plan like

          MyJoin[left.a = right.b]
            Exchange[a]
              MyAggregate
                Exchange
                  Scan[T1]
            Exchange[b]
              Scan[T2]
          

you can turn it into map-reduce by making the consumer of each Exchange into a reduce task, and the input to each Exchange a map task.

          I asked Ashutosh Chauhan how he would generate Hive MapReduce plans in Calcite (most Hive plans these days are Tez) and he said you should consider writing a CoGroup operator (like the one in Pig). CoGroup is powerful enough to implement both join and aggregate, so it might save you some effort.

          joel.bernstein Joel Bernstein added a comment - - edited

The CoGroup operator is interesting. Solr's Streaming Expressions provide some similar operations (https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions). CoGroup seems similar to a rollup() function wrapped around a join in Streaming Expressions. Pseudo-code:

           rollup(sum(x),
                  over="y",
                  innerJoin(on="a=b",
                            search(...), 
                            search(...))) 
          

Our basic strategy is to build up a Streaming Expression that implements Calcite's relational algebra. We've done this already with the Presto parser, but without joins.

          Looking forward to diving deeper into Calcite.

          julianhyde Julian Hyde added a comment -

          The trickiest thing about CoGroup is that it aggregates (i.e. groups together) rows without collapsing them. So you need to be able to represent a nested set of rows. If Solr's evaluator can't handle nested rows then CoGroup will be tricky.

          If you already have join and aggregate I'd stick with them.

          joel.bernstein Joel Bernstein added a comment - - edited

          Ah, I see. I think we can pull off CoGroup with a merge() and reduce().

reduce(merge(search(..., sort="a asc"),
             search(..., sort="a asc"),
             on="a asc"),
       by="a",
       group(sort="b asc", n="5"))
          

          This operation is more generic in that it could be an aggregation or a join depending on what comes after the reduce.

          Fun stuff.

risdenk Kevin Risden added a comment - - edited

Adding some resources that may be helpful:

http://www.slideshare.net/HadoopSummit/costbased-query-optimization
https://medium.com/@mpathirage/query-planning-with-apache-calcite-part-1-fe957b011c36#.ywd9ouxmv
http://www.slideshare.net/JordanHalterman/introduction-to-apache-calcite
          githubbot ASF GitHub Bot added a comment -

          Github user joel-bernstein commented on the issue:

          https://github.com/apache/lucene-solr/pull/104

          Ok, got it. I'll pull the branch and start working with it.

          joel.bernstein Joel Bernstein added a comment -

          Kevin Risden, I'll pull the branch and begin working with it. My plan is to run the tests and see how the various Calcite pieces get triggered.

Perhaps for the 6.4 release we should shoot for the same functionality we currently have, just with Calcite swapped in.

If we want to add one new thing, I think the SELECT COUNT(DISTINCT) ... query would be a great thing to add.

          risdenk Kevin Risden added a comment -

One thing that you may have to do to make the Solr server happy until Calcite 1.11 gets released is put the solr-core jar in front of calcite-core on the classpath. They are alphabetical right now (I fixed this by renaming solr-core to asolr-core). This is to make sure the CalciteConnectionProperty fix goes in before the original version. Otherwise you get an AbstractMethodError.

select count(distinct) ... might actually work right now with that patch. If it doesn't, then comment out the rules in the static block in SolrRules. That should give the full power of just regular SQL on top of Solr. It wouldn't really push anything down, but the SQL still works. (This should include joins.)

You should be able to add/change the SolrRules to see what gets pushed down. The classes from the branch are exactly the same as the code here: https://github.com/risdenk/solr-calcite-example
The code there has some more tests that show the explain plan. It was easier for me to iterate on the implementation of the rules that way.

          yseeley@gmail.com Yonik Seeley added a comment -

          (This should include joins)

          Awesome! I know many who have been waiting for that!

          julianhyde Julian Hyde added a comment -

          Is there a Calcite issue logged for the AbstractMethodError relating to CalciteConnectionProperty? I see others are running into the same problem and I want to document the solution (or fix the bug in Calcite/Avatica if it is a bug).

          julianhyde Julian Hyde added a comment -

          Ah, I think I see what's going on. You're using avatica-1.9-SNAPSHOT with calcite-1.10. calcite-1.10 requires avatica-1.8, so you should use that. (Or is there a good reason why you need avatica-1.9?)

          By the way, avatica-1.9 is less than a week from release. calcite-1.11 is maybe a month to six weeks away. The exact compatibility issues you describe are covered in CALCITE-1270 (and see the PR attached to that case).

          risdenk Kevin Risden added a comment -

Avatica 1.9 (and Calcite 1.11) is needed because this requires avatica-core, which doesn't do shading. The shading was causing problems integrating into Solr (I forget the exact errors now). I saw that Calcite 1.10 requires Avatica 1.8, so I just copied the one offending file for now. Since Avatica 1.9 was just released, I'll switch from the SNAPSHOT to the release and should be able to use a 1.11 SNAPSHOT, since the compatibility fixes for 1.9 are now included in Calcite.

          githubbot ASF GitHub Bot added a comment -

          Github user risdenk commented on the issue:

          https://github.com/apache/lucene-solr/pull/104

          @joel-bernstein if you push to the `jira/solr-8593` branch it should update this PR.

          githubbot ASF GitHub Bot added a comment -

          Github user risdenk commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/104#discussion_r85965892

— Diff: lucene/default-nested-ivy-settings.xml —
@@ -32,6 +32,7 @@
 <caches lockStrategy="${ivy.lock-strategy}" resolutionCacheDir="${ivy.resolution-cache.dir}" />

 <resolvers>
+ <ibiblio name="apache-snapshot" root="https://repository.apache.org/content/repositories/snapshots" m2compatible="true" />
— End diff –

          Avatica 1.9 released and PR updated. Waiting on Calcite 1.11.

          joel.bernstein Joel Bernstein added a comment - - edited

Very excited about this patch. After a day of review I think I understand how this comes together. Next step is for me to get it running. The initial run of the test cases fails, probably due to the classpath issue that Kevin Risden mentions. I'll work on getting it running as the next step and compare the feature set with the current release.

          julianhyde Julian Hyde added a comment - - edited

By the way, when you're ready, please add Solr to the powered-by-Calcite page; see CALCITE-1112 for details.

          caomanhdat Cao Manh Dat added a comment - - edited

Patch based on pull/104. Fixed most of the failing tests (some tests were modified slightly to be more accurate).
There are still some things to do, but it's pretty close now.

          risdenk Kevin Risden added a comment -

          Thanks Cao Manh Dat! I'll integrate the fixes into the jira/solr-8593 branch and pull/104.

          caomanhdat Cao Manh Dat added a comment - - edited

Thanks Kevin Risden
Julian Hyde BTW: how can we get rid of the EXPR$1 returned in

select str_s, count(*) from collection1

The result set returns the name of the second column as EXPR$1, not count(*) as expected.

          julianhyde Julian Hyde added a comment -

          "count(*)" is not a good derived column name, because it contains non-alphanumeric characters and is therefore not a valid identifier unless you enclose it in double-quotes. Therefore Calcite generates an alias that is a valid identifier.

          I believe quite a few other databases do this.

          caomanhdat Cao Manh Dat added a comment -

Thanks Julian Hyde, that makes sense, but will Calcite have an option for that in the future? Because along with the JDBC API, we also support querying through a REST API.

          For example

          curl --data-urlencode 'stmt=SELECT to, count(*) FROM collection4 GROUP BY to ORDER BY count(*) desc LIMIT 10' 
            http://localhost:8983/solr/collection4/sql?aggregationMode=facet
          

          Below is sample result set:

          {"result-set":{"docs":[
            {"count(*)":9158,"to":"pete.davis@enron.com"},
            {"count(*)":6244,"to":"tana.jones@enron.com"},
            {"count(*)":5874,"to":"jeff.dasovich@enron.com"},
            {"count(*)":5867,"to":"sara.shackleton@enron.com"},
            {"count(*)":5595,"to":"steven.kean@enron.com"},
            {"count(*)":4904,"to":"vkaminski@aol.com"},
            {"count(*)":4622,"to":"mark.taylor@enron.com"},
            {"count(*)":3819,"to":"kay.mann@enron.com"},
            {"count(*)":3678,"to":"richard.shapiro@enron.com"},
            {"count(*)":3653,"to":"kate.symes@enron.com"},
            {"EOF":"true","RESPONSE_TIME":10}]}
          }
          

          So "EXPR$1" is kinda weird as a field name.

I also want to ask another question. Right now we rely on Calcite's in-memory filter to filter expressions in the having clause (e.g. having sum(field_i) = 19 in the query below):

          select str_s, count(*) as myCount, sum(field_i), min(field_i), max(field_i), cast(avg(1.0 * field_i) as float) from collection1 where text='XXXX' group by str_s having sum(field_i) = 19 order by sum(field_i) asc
          

So I created a filter predicate for converting LogicalFilter to SolrFilter, like this:

private static boolean isNotFilterByExpr(List<RexNode> rexNodes, List<String> fieldNames) {
  // We don't have a way to filter by the result of an aggregator yet
  boolean result = true;
  for (RexNode rexNode : rexNodes) {
    if (rexNode instanceof RexCall) {
      result = result && isNotFilterByExpr(((RexCall) rexNode).getOperands(), fieldNames);
    } else if (rexNode instanceof RexInputRef) {
      result = result && !fieldNames.get(((RexInputRef) rexNode).getIndex()).startsWith("EXPR$");
    }
  }
  return result;
}

private static final Predicate<RelNode> FILTER_PREDICATE = relNode -> {
  List<RexNode> filterOperands = ((RexCall) ((LogicalFilter) relNode).getCondition()).getOperands();
  return isNotFilterByExpr(filterOperands, SolrRules.solrFieldNames(relNode.getRowType()));
};
          

But the above code does not work when I add an alias for sum(field_i), like this:

          select str_s, count(*) as myCount, sum(field_i) as mySum, min(field_i), max(field_i), cast(avg(1.0 * field_i) as float) from collection1 where text='XXXX' group by str_s having sum(field_i) = 19 order by sum(field_i) asc
          

So does Calcite have a way to check whether a RexInputRef is an alias or not?

          Hide
          risdenk Kevin Risden added a comment -

Cao Manh Dat - I merged your changes with the jira/solr-8593 branch. PR 104 was updated automatically.

          In the future, I think you can open a PR against the existing jira/solr-8593 branch. This would probably be easier than a patch to merge changes.

          caomanhdat Cao Manh Dat added a comment -

          Thanks Kevin Risden,

          Yeah, I will definitely do that in the future.

julianhyde Julian Hyde added a comment - edited

          Regarding the alias for "count(*)". I guess one approach is to extend Calcite to allow a pluggable alias derivation (it has to be pluggable because you can't please everyone). Another approach is to leave the aliases as they are but generate field names for the JSON result set. Note that if you call SqlNode.getParserPosition() on each item in the select clause it will tell you the start and end point of that expression in the original SQL string, so you can extract the "count(*)" using that information.
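
As a hypothetical sketch of that second approach (illustrative code, not from the patch; it assumes a single-line query so that column numbers map directly to string offsets):

import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.SqlSelect;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.parser.SqlParserPos;

public class SelectItemText {
  public static void main(String[] args) throws Exception {
    String sql = "SELECT deptno, count(*) FROM emp GROUP BY deptno";
    // Parse the query and walk the select list.
    SqlSelect select = (SqlSelect) SqlParser.create(sql).parseQuery();
    for (SqlNode item : select.getSelectList()) {
      // Parser positions are 1-based and end-inclusive.
      SqlParserPos pos = item.getParserPosition();
      System.out.println(sql.substring(pos.getColumnNum() - 1, pos.getEndColumnNum()));
    }
  }
}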

I don't think the following should be valid, but under your proposed change it would be:

          SELECT deptno
          FROM (
            SELECT deptno, count(*)
            FROM emp
            GROUP BY deptno) AS t
          WHERE t."count(*)" > 3
          

          Note that "count(*)" is not an expression; it is a reference to a "column" produced by the sub-query. In my opinion, using a textual expression is very confusing, and we should not do it. Derived alias of count(*) should be something not easily guessable, which will encourage users to use an alias:

          SELECT deptno
          FROM (
            SELECT deptno, count(*) AS c
            FROM emp
            GROUP BY deptno) AS t
          WHERE t.c > 3
          
          julianhyde Julian Hyde added a comment -

          You're making a mistake I see a lot of people making: trying to do complex semantic transformations on the AST (SqlNode). That's an anti-pattern, because SQL's complex rules for name-resolution make the AST very brittle. You should do those kinds of transformations on the relational algebra tree (RelNode).

In fact, Calcite will convert the query into a Scan -> Filter -> Aggregate -> Filter -> Project logical plan (the first Filter is the WHERE clause, the second Filter is the HAVING clause), so I don't think you need to do any tricky processing looking for aliases.
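
For the HAVING query earlier in the thread, that plan would look roughly like this (illustrative only; the exact field indexes depend on the row type):

Project(str_s, myCount, mySum, ...)
  Filter(=($2, 19))          <- HAVING sum(field_i) = 19
    Aggregate({0}, COUNT(), SUM($1), MIN($1), MAX($1), ...)
      Filter(=($1, 'XXXX'))  <- WHERE text='XXXX'
        Scan(table=collection1)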

caomanhdat Cao Manh Dat added a comment - edited

But we want to handle a HAVING clause without a function, like the following (because Solr will run this filter faster than Calcite):

          having field_i = 19
          

and leave the other cases for Calcite to handle.

Is there a better way to do this kind of filter?

          julianhyde Julian Hyde added a comment -

          CALCITE-1306 covers this. It's not standard SQL but could be enabled via an extension.

          I disagree that "Solr will run this filter faster than Calcite". With query optimization, both queries will produce identical plans. This issue is not about performance. It is about syntactic sugar (not that there's anything wrong with that).

joel.bernstein Joel Bernstein added a comment - edited

          Hi Kevin Risden and Cao Manh Dat. I've reviewed the latest work on this ticket and it's looking really good!

          A couple pieces of functionality that appear to be missing:

          1) Specific handling of SELECT DISTINCT queries. In the current SQLHandler we can do MapReduce SELECT DISTINCT queries in parallel on worker nodes. And we can also push down the distinct logic to the JSON Facet API.

          2) The pushing down of GROUP BY aggregations to the JSON Facet API.

          Both of these currently require the aggregationMode parameter to be passed in with the query, which I think is fine for the initial Calcite release.

          I'd be happy to add these capabilities to this branch. That will also give me an opportunity to work with the code and feel comfortable working with Calcite.

          caomanhdat Cao Manh Dat added a comment -

What I mean by "run this filter faster than Calcite" is: Solr has something called parallel streaming expressions, which will distribute that filter across many nodes, but as far as I know Calcite will run the second Filter (the HAVING clause) on a single node.

          julianhyde Julian Hyde added a comment -

          Calcite rewrites SELECT DISTINCT ... to SELECT ... GROUP BY .... So if you just deal with GROUP BY (i.e. Calcite's Aggregate operator) you should be fine.

          julianhyde Julian Hyde added a comment -

Calcite is an algebra, not an executor. When it converts a HAVING clause to a SolrFilter you are more than welcome to run those filters in parallel. I suppose it would mean SolrAggregate producing parallel output streams.

          joel.bernstein Joel Bernstein added a comment -

          Ok thanks, that should work nicely for us.

joel.bernstein Joel Bernstein added a comment - edited

          I've started to work on this ticket. As a first step I'm doing some refactoring on the SolrTable class to create methods for handling the different types of queries. After that I'll get the aggregationModes hooked up.

joel.bernstein Joel Bernstein added a comment - edited

One of the things that is also not specifically handled is the HAVING clause.

          I think we should push down this capability to Solr as well so we can perform the HAVING logic on the worker nodes. In high cardinality use cases this will be a big performance improvement.

          We also need to develop a HavingStream to manage the having logic. I'll start the work for the HavingStream in this branch as it directly supports the Calcite integration.

          My plan is to implement the following classes for the having logic:

          HavingStream
          BooleanOperation
          AndOperation
          OrOperation
          NotOperation
          EqualsOperation
LessThanOperation
GreaterThanOperation

          Syntax:

          having(streamExpr, and(eq(field1, value1), not(eq(field2, value2))))

          The having function will read the Tuples from the streamExpr and apply the boolean operation to each Tuple.

          If the boolean operation returns true the having stream will emit the Tuple.
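
To make the plan concrete, here is a rough sketch of how the leaf and composite operations could fit together (illustrative only, not the committed code; the real classes will also need Solr's streaming-expression plumbing such as Expressible):

import org.apache.solr.client.solrj.io.Tuple;

// Illustrative sketch: each operation sees the current Tuple via operate()
// and reports a verdict via evaluate().
interface BooleanOperation {
  void operate(Tuple tuple);
  boolean evaluate();
}

class EqualsOperation implements BooleanOperation {
  private final String field;
  private final double value;
  private Tuple tuple;

  EqualsOperation(String field, double value) {
    this.field = field;
    this.value = value;
  }

  public void operate(Tuple tuple) {
    this.tuple = tuple;
  }

  public boolean evaluate() {
    Double d = tuple.getDouble(field);
    return d != null && d == value;
  }
}

class AndOperation implements BooleanOperation {
  private final BooleanOperation left;
  private final BooleanOperation right;

  AndOperation(BooleanOperation left, BooleanOperation right) {
    this.left = left;
    this.right = right;
  }

  public void operate(Tuple tuple) {
    // Push the current Tuple down to both children before evaluating.
    left.operate(tuple);
    right.operate(tuple);
  }

  public boolean evaluate() {
    return left.evaluate() && right.evaluate();
  }
}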

          risdenk Kevin Risden added a comment -

          Joel Bernstein - Not sure I understand the need for a having stream...

          To quote Julian Hyde:

          In fact, Calcite will convert query into a Scan -> Filter -> Aggregate -> Filter -> Project logical plan (the first Filter is the WHERE clause, the second Filter is the HAVING clause),

          Since a having clause is really just a filter on an aggregate, I'm not sure what we could really gain from pushing it down much further. The Avatica/Calcite JDBC implementation supports the having clause if we don't optimize for it.

          joel.bernstein Joel Bernstein added a comment -

          New patch with beginnings of refactoring of the SolrTable and the initial implementation of the HavingStream and accompanying BooleanOperations.

          This patch is just meant for review. I'll push changes out to the branch as it matures.

          joel.bernstein Joel Bernstein added a comment -

There would only be an advantage when grouping on high cardinality fields. For example, a multi-dimensional aggregate that produces millions of distinct aggregations. In this scenario we can push the HAVING expression down to the worker nodes so all the aggregation tuples don't need to be sent back to the SQLHandler. If the HAVING expression eliminates a significant number of tuples, we can eliminate a lot of network traffic and a bottleneck at the SQLHandler.

          julianhyde Julian Hyde added a comment -

          Calcite's operators are logical. A 'Filter' operator might turn into operator instances running on multiple nodes or threads, each processing a partition of the data.

joel.bernstein Joel Bernstein added a comment - edited

          I wanted to give an update on my work on this ticket.

          I've started working my way through the test cases (TestSQLHandler). I'm working through each assertion in each method to understand the differences between the current release and the work done in this patch, and making changes/fixes as I go.

          The first change that I made was in how the predicate is being traversed. The current patch doesn't descend through a full nested AND/OR predicate. So I made a few changes to how the tree is walked. I also changed some of the logic for how the query is re-written to a Lucene/Solr query so that it matches the current implementation.
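
As a sketch of what descending the full predicate looks like (illustrative, not the actual branch code; note that Calcite may flatten AND/OR into more than two operands, so the walk joins all operands rather than assuming a binary tree):

import java.util.ArrayList;
import java.util.List;

import org.apache.calcite.rex.RexCall;
import org.apache.calcite.rex.RexInputRef;
import org.apache.calcite.rex.RexLiteral;
import org.apache.calcite.rex.RexNode;

class PredicateTranslator {
  // Recursively translate a RexNode condition into a Solr query string.
  static String translate(RexNode node, List<String> fieldNames) {
    RexCall call = (RexCall) node;
    switch (call.getKind()) {
      case AND:
      case OR:
        List<String> parts = new ArrayList<>();
        for (RexNode operand : call.getOperands()) {
          parts.add(translate(operand, fieldNames));
        }
        return "(" + String.join(" " + call.getKind() + " ", parts) + ")";
      case EQUALS:
        RexInputRef ref = (RexInputRef) call.getOperands().get(0);
        RexLiteral value = (RexLiteral) call.getOperands().get(1);
        return fieldNames.get(ref.getIndex()) + ":" + value.getValue2();
      default:
        throw new AssertionError("cannot translate " + node);
    }
  }
}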

I've now moved on to aggregate queries. I've been investigating the use of EXPR$1 ... instead of using the function signature in the result set. It looks like we'll have to use the Calcite expression identifiers going forward, which should be OK. I think this is cleaner anyway because looking up fields by a function signature can get cumbersome. We'll just need to document this in the CHANGES.txt.

          The next step for me is to implement the aggregationMode=facet logic for aggregate queries. After that I'll push out my changes to this branch.

          Then I'll spend some time investigating how SELECT DISTINCT behaves with our Calcite implementation. As Julian Hyde mentioned, we should see DISTINCT queries as aggregate queries so it's possible we'll have all the code in place to push this to Solr already.

          githubbot ASF GitHub Bot added a comment -

          Github user risdenk commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/104#discussion_r92703928

          — Diff: solr/solrj/src/java/org/apache/solr/client/solrj/io/ops/GreaterThanOperation.java —
          @@ -0,0 +1,70 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.solr.client.solrj.io.ops;
          +
          +import java.io.IOException;
          +import java.util.UUID;
          +
          +import org.apache.solr.client.solrj.io.Tuple;
          +import org.apache.solr.client.solrj.io.stream.expr.Explanation;
          +import org.apache.solr.client.solrj.io.stream.expr.Explanation.ExpressionType;
          +import org.apache.solr.client.solrj.io.stream.expr.StreamExpression;
          +import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;
          +
          +public class GreaterThanOperation extends LeafOperation {
          +
          + private static final long serialVersionUID = 1;
          + private UUID operationNodeId = UUID.randomUUID();
          +
+ public void operate(Tuple tuple) {
+   this.tuple = tuple;
+ }
+
+ public GreaterThanOperation(String field, double val) {
          — End diff –

          @joel-bernstein - Does this mean that string/int/long comparisons won't work? I noticed that this assumes the fields are doubles.

joel.bernstein Joel Bernstein added a comment - edited

I just pushed out a commit to the jira/solr-8593 branch:

          https://github.com/apache/lucene-solr/commit/37fdc37fc3d88054634482d39b5774893751f91f

          This is a pretty large refactoring of the SolrTable class which includes implementations for aggregationMode=facet for both GROUP BY aggregations and SELECT DISTINCT aggregations.

          All tests in TestSQLHandler are passing.

There is only one thing that I'm not quite happy about in this patch which is specific to Calcite. I am wondering if Julian Hyde has any thoughts on the issue. The specific issue deals with how the set of GROUP BY fields is handled in Calcite. From what I can see there isn't an easy way to preserve the ordering of the GROUP BY fields from the query. The Solr faceting implementation requires the correct order of the GROUP BY fields to return a correct response. So, I'm getting the ordering from the field list of the query instead. This may actually be the correct approach from a SQL standpoint but I was wondering what Julian thought about this issue.
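
For reference, a simplified sketch of the idea (illustrative, not the exact branch code): the Aggregate's output lists the grouping columns first, so any input ref in the Project above it with an index below getGroupCount() is a grouping column, and iterating the Project's expressions yields them in SELECT-list order.

import java.util.ArrayList;
import java.util.List;

import org.apache.calcite.rel.core.Aggregate;
import org.apache.calcite.rel.core.Project;
import org.apache.calcite.rex.RexInputRef;
import org.apache.calcite.rex.RexNode;

class GroupByOrder {
  static List<Integer> inQueryOrder(Project project, Aggregate aggregate) {
    List<Integer> ordered = new ArrayList<>();
    for (RexNode e : project.getProjects()) {
      if (e instanceof RexInputRef) {
        int index = ((RexInputRef) e).getIndex();
        if (index < aggregate.getGroupCount()) {
          ordered.add(index); // grouping columns in SELECT-list order
        }
      }
    }
    return ordered;
  }
}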

          githubbot ASF GitHub Bot added a comment -

          Github user joel-bernstein commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/104#discussion_r92718063

          — Diff: solr/solrj/src/java/org/apache/solr/client/solrj/io/ops/GreaterThanOperation.java —
          @@ -0,0 +1,70 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.solr.client.solrj.io.ops;
          +
          +import java.io.IOException;
          +import java.util.UUID;
          +
          +import org.apache.solr.client.solrj.io.Tuple;
          +import org.apache.solr.client.solrj.io.stream.expr.Explanation;
          +import org.apache.solr.client.solrj.io.stream.expr.Explanation.ExpressionType;
          +import org.apache.solr.client.solrj.io.stream.expr.StreamExpression;
          +import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;
          +
          +public class GreaterThanOperation extends LeafOperation {
          +
          + private static final long serialVersionUID = 1;
          + private UUID operationNodeId = UUID.randomUUID();
          +
+ public void operate(Tuple tuple) {
+   this.tuple = tuple;
+ }
+
+ public GreaterThanOperation(String field, double val) {
          — End diff –

This is the implementation of Tuple.getDouble():

public Double getDouble(Object key) {
  Object o = this.fields.get(key);

  if(o == null) {
    return null;
  }

  if(o instanceof Double) {
    return (Double)o;
  } else {
    //Attempt to parse the double
    return Double.parseDouble(o.toString());
  }
}

          So, any number will be translated to a double for comparison. We can implement a String comparison in a different operation I think.

julianhyde Julian Hyde added a comment - edited

          I wasn't familiar with faceting, but I quickly read https://wiki.apache.org/solr/SolrFacetingOverview.

          Suppose table T has fields a, b, c, d, and you want to do a faceted search on b, a. If you issue the query select b, a, count(*) from t group by b, a then you will end up with

          Project($1, $0, $2)
            Aggregate({0, 1}, COUNT(*))
              Scan(table=T)
          

          and as you correctly say, 0, 1 represents a, b because that is the physical order of the columns.

          Can you explain why the faceting algorithm is interested in the order of the columns? Is it because it needs to produce the output ordered or nested on those columns? If so, we can rephrase the SQL query so that we are accurately expressing in relational algebra what we need.

          joel.bernstein Joel Bernstein added a comment -

If we have two grouping fields A, B, nested facets will be gathered using the following approach:

          1) Gather the top N facets for field A.
          2) For each of the top N facets of field A, find the top N sub-facets for field B

          This avoids the exhaustive processing of all the unique combinations of A, B.

          This is very performant (sub-second) when N is a relatively small number and the cardinality of A, B is not too high.

          In high cardinality scenarios we can switch to MapReduce mode which sorts the Tuples on the GROUP BY fields and shuffles them to worker nodes. In MapReduce mode the order of the GROUP BY fields is not important.

          Having the ability to use faceting or MapReduce depending on cardinality is one of the key features of Solr's SQL implementation.

julianhyde Julian Hyde added a comment - edited

          Would it be correct to say that you have a physical operator which is a combination of Aggregate and TopN? This physical operator would have a sorted list of grouping fields and also a parameter N (which affects the cost estimate). Maybe it's a sub-class of Aggregate with some extra fields. It could be created by a planner rule that matches a Sort (with limit) on top of an Aggregate and also looks at estimated cardinality of the fields in order to sort them.

          joel.bernstein Joel Bernstein added a comment -

          The criteria for switching between facet and MapReduce would be cardinality. So a planner rule that is based on the SQL structure won't work in this scenario.

          I'm thinking the easiest approach might be to add a List of GROUP BY fields to the Aggregate class. Or possibly to add ordering information to the GroupSet BitSet.

          julianhyde Julian Hyde added a comment -

A list of GROUP BY fields would be fine. But it must be in a sub-class of Aggregate. Everyone else who is using Aggregate wants "Aggregate([x, y])" to be identical to "Aggregate([y, x])".

          joel.bernstein Joel Bernstein added a comment -

Ok that sounds good. In the meantime we'll move forward using the field list to order the GROUP BY fields. This will be a quirk of Solr's implementation that we'll document. Since Solr has a specific high performance implementation that does not fall in line with general SQL conventions, I think Solr users will be fine with this quirk.

          joel.bernstein Joel Bernstein added a comment -

Kevin Risden, Cao Manh Dat, do you guys see any issue with getting this patch into master? I can start the process of merging and getting all the tests and precommit passing. I'd like to have this in master sooner rather than later so we have a chance to look at it for a little while before back-porting it for the 6.4 release.

          Kevin Risden do we have any blockers around jar conflicts?

          risdenk Kevin Risden added a comment -

I think we are going to need Calcite 1.11.0, which was talked about being released on the calcite-dev list. Also the other dependency I'm not sure about is protobuf-java. See my comment here: https://github.com/apache/lucene-solr/pull/104/files#r85476697

          The apache snapshot stuff would have to be pulled out from lucene/default-nested-ivy-settings.xml.

joel.bernstein Joel Bernstein added a comment - edited

Are we blocked until Calcite 1.11.0 is released, or is there precedent for using a snapshot release?

With the protobuf upgrade, I think as long as all tests are passing it should be ok to upgrade.

          risdenk Kevin Risden added a comment -

I haven't seen other places where SNAPSHOT dependencies were used, so I think we would be blocked until Calcite 1.11.0 since we need the shaded Avatica 1.9.0.

joel.bernstein Joel Bernstein added a comment - edited

It wasn't clear to me that we were using Avatica yet. It seemed we were using org.apache.calcite imports. Are there dependencies within Calcite core on Avatica?

          risdenk Kevin Risden added a comment -

          The Calcite JDBC piece is Avatica under the hood. Calcite core depends on Avatica.

          joel.bernstein Joel Bernstein added a comment -

          ok got it.

          joel.bernstein Joel Bernstein added a comment -

I'm going to amend my commit and remove the HavingStream and move that to a separate ticket. Since this does not rely on Calcite at all, it can be moved into master and 6x without delay.

          joel.bernstein Joel Bernstein added a comment -

          Actually I don't think amending the commit is an option as the HavingStream was not added in the latest commit.

          I think I'll just put the classes in master. Then deal with the merge conflicts when merging this branch.

          caomanhdat Cao Manh Dat added a comment -

Kevin Risden Hi Kevin, I'm curious why we need Avatica 1.9.0, because I found that Avatica 1.8.0 works perfectly with my patch.

          risdenk Kevin Risden added a comment -

Cao Manh Dat - Did you try building Solr and running it? I hit issues with the classpath. I never hit issues when only running the tests. It has to do with how Avatica required different versions of dependencies than Solr. Avatica wasn't shaded but included every dependency. Avatica 1.9.0 doesn't include all the dependencies by default.

          caomanhdat Cao Manh Dat added a comment -

Ah, that makes sense. I've only run the SQLTest through the IDE.

          joel.bernstein Joel Bernstein added a comment -

          I saw on the Calcite dev list that the next release is coming pretty soon. So, I'm not going to move ahead creating another ticket for the HavingStream.

I'm planning to create a new branch from master locally and merge this branch into it so I can test with the latest master code. Master has some significant changes both in Lucene and Solr that affect SQL. Specifically, the /export handler has been refactored and there is a new docValues iterator API to performance test. This will likely take up the rest of the year for me. But hopefully after the new Calcite release we will be ready to merge to master.

          risdenk Kevin Risden added a comment -

          Joel Bernstein - the jira/solr-8593 branch is pretty up to date. I have merged master into it a few times.

          joel.bernstein Joel Bernstein added a comment -

          Ok, that's great. Then I'll just continue working with solr-8593.

          julianhyde Julian Hyde added a comment -

          Yes, early January. I've logged CALCITE-1547 to track the release.

joel.bernstein Joel Bernstein added a comment - edited

          Kevin Risden, I see you've added some special logic to get the cast function working. It's not clear to me though what the code is actually doing. Can you explain what's happening in the SolrRules.RexToSolrTranslator class with the cast function?

          If possible I'd like to hook up the other string and arithmetic functions.

joel.bernstein Joel Bernstein added a comment - edited

          I think I have a handle now on how the string and arithmetic functions work. I was expecting them to work automatically and Calcite would perform the functions if we returned data in the fields.

I now believe that is not the case, because we've pushed down the projection; as a result, we'll need to implement the arithmetic and string functions using the SelectStream. What Calcite provides in the project rule is access to the parse tree, so we have enough information to implement the functions.

          Since this ticket was mainly about getting parity with the current SQL functionality, I think it makes sense to tackle the string and arithmetic functions in a separate ticket. I will create that ticket.

joel.bernstein Joel Bernstein added a comment - edited

It doesn't look like we're going to get this in for Solr 6.4. So, I'm currently planning to take the HavingStream from this branch and commit it with SOLR-8530 for Solr 6.4.

          This is going to cause merge conflicts between jira/solr-8593 and master. As soon as the Calcite release is cut I'll begin the work to merge jira/solr-8593 to master and work out the conflicts. So Calcite will be in master at the very beginning of the dev cycle for Solr 6.5. Then we can work with the Calcite code in master and perform the backport to branch_6x when ready.

          risdenk Kevin Risden added a comment -

Updated jira/solr-8593 branch with Calcite 1.11.0 since that was just released.

          risdenk Kevin Risden added a comment -

Also merged master into jira/solr-8593 and fixed the merge conflicts. They weren't bad, just some minor changes.

          joel.bernstein Joel Bernstein added a comment -

          Great!

          If we had another week we could have gotten this into 6.4. But we'll have the entire 6.5 dev cycle to make sure it's ready to go.

          githubbot ASF GitHub Bot added a comment -

          Github user risdenk commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/104#discussion_r97372920

          — Diff: lucene/ivy-versions.properties —
          @@ -36,7 +34,7 @@ com.google.inject.guice.version = 3.0
/com.google.inject.extensions/guice-servlet = ${com.google.inject.guice.version}
/com.google.inject/guice = ${com.google.inject.guice.version}

          -/com.google.protobuf/protobuf-java = 2.5.0
          +/com.google.protobuf/protobuf-java = 3.1.0
          — End diff –

          @markrmiller Can this be updated with the Hadoop support? Tests seem to pass but not sure if this affects protobuf compatibility with a running cluster.

          risdenk Kevin Risden added a comment -

          Mike Drob Mark Miller Any thoughts on updating protobuf-java? Looks like it was included for the Hadoop support? The HDFS tests pass.

          Joel Bernstein Thoughts on getting this merged?

          julianhyde Julian Hyde added a comment -

          If you have any "linking" issues with protobuf, you might check out HIVE-15708, which was caused because Hive used both avatica-core (which shades protobuf) and avatica (which does not).

          joel.bernstein Joel Bernstein added a comment -

          Yep, I think it's time to put this in master. I haven't yet run the entire test suite on this branch though. Kevin Risden, how far have you gotten with tests and precommit? Any issues popping up?

          risdenk Kevin Risden added a comment -

There is one test that I need to look at, testDriverMetadata, which depends on the order of schemas returned. I haven't checked precommit yet but I can run that.

          risdenk Kevin Risden added a comment -

          There are a few precommit issues it looks like. I'll fix those.

joel.bernstein Joel Bernstein added a comment - edited

          I pulled the most recent work on this branch and precommit is passing. I'll be doing manual testing today and also running the full test suite. It seems like we are getting very close to committing this. I haven't seen the protobuf issue that Julian Hyde mentioned above. But I'll keep testing and see if anything pops up.

          Kevin Risden, any preference on who does the merge/commit to master on this?

          risdenk Kevin Risden added a comment -

We shouldn't see the issue that Julian Hyde raised since we are only using avatica-core. I want to take a look at the testDriverMetadata test, where I see it fail once in a while due to the order of the localhost and metadata schemas being returned.

          I don't have a preference on who does the merge/commit to master. We should be able to do a squash merge of the branch back to master. I updated the branch once more to get the latest master changes.

          risdenk Kevin Risden added a comment -

          I need to fix this error:

          [beaster] > Throwable #1: org.junit.ComparisonFailure: expected:<[127.0.0.1:55994/solr]> but was:<[metadata]>
          [beaster] > at __randomizedtesting.SeedInfo.seed([987917FE113810C1:3B8CFDE8F2CD36A]:0)
          [beaster] > at org.junit.Assert.assertEquals(Assert.java:125)
          [beaster] > at org.junit.Assert.assertEquals(Assert.java:147)
          [beaster] > at org.apache.solr.client.solrj.io.sql.JdbcTest.testJDBCMethods(JdbcTest.java:507)
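
          A minimal sketch of one way to make this kind of assertion order-independent, comparing the schemas as a Set instead of positionally (the helper method and expected values here are illustrative, not the actual JdbcTest code):

          import static org.junit.Assert.assertEquals;

          import java.sql.Connection;
          import java.sql.ResultSet;
          import java.util.Arrays;
          import java.util.HashSet;
          import java.util.Set;

          class SchemaAssertionSketch {
            // Hypothetical rework of the flaky assertion: collect every TABLE_SCHEM
            // value and compare as a Set, so the order in which the "metadata" schema
            // and the Solr host schema are returned no longer matters.
            static void assertSchemas(Connection con, String solrHostSchema) throws Exception {
              Set<String> actual = new HashSet<>();
              try (ResultSet rs = con.getMetaData().getSchemas()) {
                while (rs.next()) {
                  actual.add(rs.getString("TABLE_SCHEM"));
                }
              }
              assertEquals(new HashSet<>(Arrays.asList("metadata", solrHostSchema)), actual);
            }
          }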

          joel.bernstein Joel Bernstein added a comment -

          Ok Kevin Risden, it sounds like you've got a good plan for the merge and commit. Why don't you do the actual merge and commit when we're ready. I'll check back in tomorrow with results from the manual testing. I haven't seen any test failures related to the Calcite work yet.

          risdenk Kevin Risden added a comment -

          I pushed a fix for the testJDBCMethods error above.

          Joel Bernstein - Sounds good let me know when you are happy with your testing.

          risdenk Kevin Risden added a comment -

For reference, here is the latest patch.

          joel.bernstein Joel Bernstein added a comment - - edited

We've run into a bug with pushing down the SQL LIMIT clause. Currently we have code that handles the limit in the SolrSort class, which extends org.apache.calcite.rel.core.Sort.

The issue is that the SolrSort rule is only executed if an ORDER BY clause is included. So if LIMIT is used without an ORDER BY, our code does not see the LIMIT.

          So limit works in this scenario:

          select a from b order by a limit 10

          but not this scenario:

          select a from b limit 10

          Julian Hyde, any ideas on how to resolve this?

Here is the code where we create the rule:
          https://github.com/apache/lucene-solr/blob/jira/solr-8593/solr/core/src/java/org/apache/solr/handler/sql/SolrRules.java#L184

          julianhyde Julian Hyde added a comment - - edited

Not sure I understand. The query select a from b limit 10 will have a Sort whose key has zero fields but which has fetch = 10. The Sort will be translated to a SolrSort with similar attributes. The sort is trivial - that is, you don't need to do any work to sort on 0 fields - but you do need to apply the limit. If you see a SolrSort with empty keys, don't drop it, but maybe convert it into a SolrLimit if you have such a thing.

          You may be wondering why we combine sort and limit into the same operator. But remember that relational data sets are inherently unordered, so we have to do them at the same time. Sort with an empty key has reasonable semantics, just as – I hope you agree – Aggregate with an empty key (e.g. select count(*) from emp, which is equivalent to select count(*) from emp group by ()) is a reasonable generalization of Aggregate.
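
          To make that concrete, here is a minimal sketch of a conversion that keeps a trivial Sort when it carries fetch/offset. The class and the createSolrSort method are hypothetical stand-ins for whatever the adapter actually builds:

          import org.apache.calcite.rel.RelNode;
          import org.apache.calcite.rel.core.Sort;

          class SortConversionSketch {
            // Even with zero sort keys, a Sort carrying fetch/offset still encodes
            // the LIMIT/OFFSET and must be kept in the plan.
            RelNode convertSort(Sort sort) {
              boolean trivialSort = sort.getCollation().getFieldCollations().isEmpty();
              if (trivialSort && sort.fetch == null && sort.offset == null) {
                return sort.getInput(); // no keys, no limit, no offset: a true no-op
              }
              // Keep the node so fetch/offset are applied downstream.
              return createSolrSort(sort);
            }

            // Hypothetical stand-in for the real conversion to a SolrSort/SolrLimit.
            RelNode createSolrSort(Sort sort) {
              return sort;
            }
          }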

          joel.bernstein Joel Bernstein added a comment -

          I've been reviewing CALCITE-1235 which deals with pushing down the LIMIT in the Cassandra adapter.

          The problem that is reported is:

          To clarify, in the currently committed version of the adapter, pushing down sort + limit does not work because CassandraSortRule only ever seems to see a Sort without a fetch.

I'm seeing a slightly different problem, which is that the SolrSort isn't in the logical parse tree if the ORDER BY clause is removed. Without the SolrSort we don't have a way of getting the limit.

The way this is handled in CALCITE-1235 is to create a rule that fires on EnumerableLimit. It seems to have worked, as that ticket has been committed. If we don't see another approach, we'll have to go that route as well.
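
          For reference, that style of rule looks roughly like the sketch below, patterned on the Cassandra adapter's CassandraLimitRule; SolrLimit and SolrRel.CONVENTION are hypothetical stand-ins, not code from this branch:

          import org.apache.calcite.adapter.enumerable.EnumerableLimit;
          import org.apache.calcite.plan.RelOptRule;
          import org.apache.calcite.plan.RelOptRuleCall;

          // Fires on EnumerableLimit, so the LIMIT is captured even when the query
          // has no ORDER BY and the adapter's sort rule therefore never sees a Sort.
          class SolrLimitRule extends RelOptRule {
            SolrLimitRule() {
              super(operand(EnumerableLimit.class, any()), "SolrLimitRule");
            }

            @Override public void onMatch(RelOptRuleCall call) {
              final EnumerableLimit limit = call.rel(0);
              // SolrLimit and SolrRel.CONVENTION are hypothetical; the point is
              // that offset/fetch are lifted off the EnumerableLimit and pushed down.
              call.transformTo(new SolrLimit(limit.getCluster(),
                  limit.getTraitSet().replace(SolrRel.CONVENTION),
                  convert(limit.getInput(), SolrRel.CONVENTION),
                  limit.offset, limit.fetch));
            }
          }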

          joel.bernstein Joel Bernstein added a comment - - edited

          Hi Julian Hyde,

          The exact problem I'm seeing is that the SolrSort is not included in the query plan unless an ORDER BY is used in the query.

          With the ORDER BY our tree looks like this:

          org.apache.solr.handler.sql.SolrSort
          org.apache.solr.handler.sql.SolrProject
          org.apache.solr.handler.sql.SolrTableScan

          Without the ORDER BY our tree looks like this:

          org.apache.solr.handler.sql.SolrProject
          org.apache.solr.handler.sql.SolrTableScan

          julianhyde Julian Hyde added a comment -

In the case where there is LIMIT but no ORDER BY, is a LogicalSort created? (There should be.) Is a SolrSort created, and is its offset field set (there should be)? If so, how/why does the SolrSort get dropped? (Does the planner find that it is equivalent to something cheaper? It shouldn't.)

          joel.bernstein Joel Bernstein added a comment -

Yes, the LogicalSort appears to be created, because SolrSortRule.convert does get called. But the resulting query plan doesn't include the SolrSort. It does appear that the planner chooses not to include it.

          joel.bernstein Joel Bernstein added a comment -

          I was able to get the SolrSort to be pushed down with no ORDER BY, by changing the SolrSort.computeSelfCost to:

          public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) {
            return planner.getCostFactory().makeZeroCost();
          }

          It originally was:

          public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) {
            return super.computeSelfCost(planner, mq).multiplyBy(0.05);
          }

Julian Hyde, it appears that the planner was finding a lower-cost option before computeSelfCost was changed. Does it appear that this is a problem with how we were originally computing the cost? Or does this appear to be a bug in the planner? If it's a bug in the planner, I'll log a ticket with the Calcite project.

          In either case I think the problem with LIMIT being pushed down is solved for this ticket. So I will move on and continue testing in preparation to commit.

          julianhyde Julian Hyde added a comment -

          This shouldn't be due to cost differences. The plan without the sort (limit) is incorrect, so should never be chosen, regardless of cost.

          joel.bernstein Joel Bernstein added a comment -

Currently working on pushing down the HAVING expression. I looked at possibly not implementing the HAVING pushdown, but it turns out that in certain scenarios, if we don't push down HAVING, we don't produce a correct result.

          joel.bernstein Joel Bernstein added a comment - - edited

After adjusting the SolrFilterRule to allow filters on expressions, the HAVING filter now shows up above the SolrAggregate:

          SolrFilter
          SolrAggregate
          SolrFilter
          SolrTableScan

The first SolrFilter listed is the HAVING clause and the second SolrFilter is the WHERE clause.

The next step is to recognize the different types of filters and build a HavingStream for the HAVING filter; a sketch of one way to distinguish them follows. I'll be working on plugging this in.
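
          One way to tell the two apart, sketched here for illustration rather than taken from the committed code: walk the filter's input chain and see whether an Aggregate sits below it.

          import org.apache.calcite.rel.RelNode;
          import org.apache.calcite.rel.core.Aggregate;
          import org.apache.calcite.rel.core.Filter;

          class FilterKindSketch {
            // A filter with an Aggregate somewhere below it in its input chain came
            // from HAVING; a filter below the Aggregate came from WHERE. (During
            // planning the inputs may be RelSubsets, so this is a post-conversion sketch.)
            static boolean isHavingFilter(Filter filter) {
              RelNode node = filter.getInput();
              while (node != null) {
                if (node instanceof Aggregate) {
                  return true;
                }
                node = node.getInputs().isEmpty() ? null : node.getInput(0);
              }
              return false;
            }
          }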

          joel.bernstein Joel Bernstein added a comment - - edited

          It was a bit of an odyssey but I was able to push down the HAVING clause. I pushed the commits out with my latest work to:
          https://github.com/apache/lucene-solr/tree/jira/solr-8593

          joel.bernstein Joel Bernstein added a comment -

          Ok, I've pushed what I think are the final changes out to https://github.com/apache/lucene-solr/tree/jira/solr-8593.

          I believe we are ready to merge to master. The one complication with this is that when the branch is merged into master we're going to get compilation errors due to changes in SOLR-9916. These should hopefully be easy to fix.

Kevin Risden, what do you think is the best way to go about the merge?

          joel.bernstein Joel Bernstein added a comment -

          Ok, I've got the calcite branch merged into master locally. I made a few small adjustments to line things up with SOLR-9916 and the SQL tests are passing. I'll start working through any issues that come up in the full test suite and pre-commit.

          jira-bot ASF subversion and git services added a comment -

          Commit 4ea97b08de1b853b1fc2a0f306db9a202e22db67 in lucene-solr's branch refs/heads/master from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4ea97b0 ]

          SOLR-8593: Ensure the SolrSort is always pushed down

          jira-bot ASF subversion and git services added a comment -

          Commit de512d7402024acb61917cacfd98e9aaaed4a456 in lucene-solr's branch refs/heads/master from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=de512d7 ]

          SOLR-8593: Push down the HAVING clause

          jira-bot ASF subversion and git services added a comment -

          Commit ec6ee96ae6df1fdb2fffd881b45cb48670a10c5b in lucene-solr's branch refs/heads/master from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ec6ee96 ]

          SOLR-8593: Make SQL handler friendlier out of the box

          jira-bot ASF subversion and git services added a comment -

          Commit 12229b2ca04726c5db4ab381190fb247d2ec1084 in lucene-solr's branch refs/heads/master from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=12229b2 ]

          SOLR-8593: Switch to using the BooleanEvaluators

          jira-bot ASF subversion and git services added a comment -

          Commit dcf41b9a8e3334fe0038f5b7a3be4549a03c72ce in lucene-solr's branch refs/heads/master from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dcf41b9 ]

          SOLR-8593: Fix precommit

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/lucene-solr/pull/104

          risdenk Kevin Risden added a comment -

          The merge looks like it went well. Tests haven't failed from what I've seen. Thanks Joel!

          joel.bernstein Joel Bernstein added a comment -

          Yeah, I think the merge went smoothly. Awesome work on this ticket Kevin Risden and Cao Manh Dat!

          risdenk Kevin Risden added a comment -

Joel Bernstein - Thoughts on backporting this to branch_6x?

          joel.bernstein Joel Bernstein added a comment - - edited

I was thinking about merging https://github.com/apache/lucene-solr/tree/jira/solr-8593 into branch_6x rather than cherry-picking from master. There is one commit that will need to be reverted because it's only valid in master, but that should be fairly easy to do, I think.

          risdenk Kevin Risden added a comment -

          Joel Bernstein - That sounds reasonable. Would be good to get the bulk of this into 6.5.

          steve_rowe Steve Rowe added a comment -

          Policeman Jenkins found a reproducing seed for a TestSQLHandler failure https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/19059/:

            [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestSQLHandler -Dtests.method=doTest -Dtests.seed=F875A6E80D435C44 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=tr -Dtests.timezone=Africa/Lagos -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
            [junit4] ERROR   26.3s J0 | TestSQLHandler.doTest <<<
            [junit4]    > Throwable #1: java.io.IOException: --> https://127.0.0.1:37214/collection1:Failed to execute sqlQuery 'select str_s, count(*), sum(field_i), min(field_i), max(field_i), cast(avg(1.0 * field_i) as float) from collection1 where text='XXXX' group by str_s order by sum(field_i) asc limit 2' against JDBC connection 'jdbc:calcitesolr:'.
            [junit4]    > Error while executing SQL "select str_s, count(*), sum(field_i), min(field_i), max(field_i), cast(avg(1.0 * field_i) as float) from collection1 where text='XXXX' group by str_s order by sum(field_i) asc limit 2": From line 1, column 39 to line 1, column 50: No match found for function signature min(<NUMERIC>)
            [junit4]    > 	at __randomizedtesting.SeedInfo.seed([F875A6E80D435C44:5F311E4C60F84FFD]:0)
            [junit4]    > 	at org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:235)
            [junit4]    > 	at org.apache.solr.handler.TestSQLHandler.getTuples(TestSQLHandler.java:2325)
            [junit4]    > 	at org.apache.solr.handler.TestSQLHandler.testBasicGrouping(TestSQLHandler.java:651)
            [junit4]    > 	at org.apache.solr.handler.TestSQLHandler.doTest(TestSQLHandler.java:77)
            [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:985)
            [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:960)
          [...]
            [junit4]   2> NOTE: test params are: codec=Asserting(Lucene70): {n_l1=PostingsFormat(name=LuceneFixedGap), multiDefault=BlockTreeOrds(blocksize=128), Field_i=BlockTreeOrds(blocksize=128), intDefault=PostingsFormat(name=LuceneFixedGap), n_dt1=BlockTreeOrds(blocksize=128), n_td1=BlockTreeOrds(blocksize=128), Str_s=Lucene50(blocksize=128), n_d1=PostingsFormat(name=LuceneFixedGap), field_i=BlockTreeOrds(blocksize=128), str_s=Lucene50(blocksize=128), n_f1=BlockTreeOrds(blocksize=128), n_ti1=FST50, rnd_b=FST50, n_tl1=BlockTreeOrds(blocksize=128), _version_=PostingsFormat(name=LuceneFixedGap), n_tf1=PostingsFormat(name=LuceneFixedGap), n_tdt1=PostingsFormat(name=LuceneFixedGap), id=FST50, text=Lucene50(blocksize=128), Text_t=PostingsFormat(name=LuceneFixedGap), timestamp=PostingsFormat(name=LuceneFixedGap)}, docValues:{_version_=DocValuesFormat(name=Lucene70), multiDefault=DocValuesFormat(name=Asserting), Field_i=DocValuesFormat(name=Asserting), intDefault=DocValuesFormat(name=Lucene70), id=DocValuesFormat(name=Memory), Str_s=DocValuesFormat(name=Direct), field_i=DocValuesFormat(name=Asserting), str_s=DocValuesFormat(name=Direct), n_f1=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=197, maxMBSortInHeap=6.686785692870154, sim=RandomSimilarity(queryNorm=false): {}, locale=tr, timezone=Africa/Lagos
            [junit4]   2> NOTE: Linux 4.4.0-53-generic i386/Oracle Corporation 1.8.0_121 (32-bit)/cpus=12,threads=1,free=50683592,total=254599168
          
          joel.bernstein Joel Bernstein added a comment -

          OK, I'll take a look.

          joel.bernstein Joel Bernstein added a comment - - edited

          It appears that this issue is due to the Turkish locale.

          https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/

I believe this is due to how Calcite is handling locales. Kevin Risden, curious if you see the same thing, namely that this is happening in the Calcite code.

Steve Rowe, if it does turn out that this is a Calcite issue, I'll log an issue on the Calcite site and submit a patch. In the meantime, is there a way to suppress the Turkish locale in the tests?
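
          The failure mode is easy to demonstrate in isolation: with the Turkish locale, lower-casing an ASCII capital I produces the dotless ı, so a lookup keyed on the lower-cased function name no longer matches min. A small standalone demo:

          import java.util.Locale;

          public class TurkishLocaleDemo {
            public static void main(String[] args) {
              System.out.println("MIN".toLowerCase(Locale.ENGLISH));   // prints "min"
              System.out.println("MIN".toLowerCase(new Locale("tr"))); // prints "mın"
              // A case-insensitive function lookup that lower-cases with the default
              // locale would then fail to find "min", consistent with the
              // "No match found for function signature min(<NUMERIC>)" error above.
            }
          }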

          julianhyde Julian Hyde added a comment -

          Oh wow. I18n never fails to surprise. Please log a Calcite issue. We should ensure that Calcite runs correctly if user.locale=tr.

          risdenk Kevin Risden added a comment -

Joel Bernstein - I had the same suspicion about the Turkish locale. I know that Lucene uses Forbidden APIs to prevent some of the Java constructs that allow for bad locale handling.

          steve_rowe Steve Rowe added a comment -

          In the meantime is there a way to suppress the Turkish locale in the tests?

          Yes, see for example:

          solr/contrib/map-reduce/src/test/org/apache/solr/hadoop/MRUnitBase.java
            @BeforeClass
            public static void setupClass() throws Exception {
          [...]
              assumeFalse("This test fails on UNIX with Turkish default locale (https://issues.apache.org/jira/browse/SOLR-6387)",
                  new Locale("tr").getLanguage().equals(Locale.getDefault().getLanguage()));
          
          joel.bernstein Joel Bernstein added a comment - - edited

          Ok great, I will add this assumption to the SQL tests.

          joel.bernstein Joel Bernstein added a comment -

Also working through the backport of this ticket to branch_6x.

          risdenk Kevin Risden added a comment -

          Joel Bernstein & Julian Hyde - I tried to set user.locale=tr for the Calcite and Avatica proper tests and couldn't get it to fail. I haven't been able to isolate the issue. The few things I tried from the Calcite repo:

          MAVEN_OPTS="$MAVEN_OPTS -Duser.locale=tr" mvn clean test
          Add -Duser.locale=tr to the surefire argLine in pom.xml
          

          Neither of the above made tests in either Calcite or Avatica fail. Not sure where to go from here.
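
          One possible reason the flag had no effect: the JVM reads user.language/user.country at startup rather than a user.locale property, so forcing the default locale from inside the test JVM itself may be a more reliable way to reproduce. A minimal sketch (the test class is illustrative):

          import static org.junit.Assert.assertEquals;

          import java.util.Locale;

          import org.junit.AfterClass;
          import org.junit.BeforeClass;
          import org.junit.Test;

          public class TurkishLocaleReproTest {
            private static Locale saved;

            @BeforeClass public static void forceTurkishLocale() {
              saved = Locale.getDefault();
              Locale.setDefault(new Locale("tr", "TR"));
            }

            @AfterClass public static void restoreLocale() {
              Locale.setDefault(saved);
            }

            @Test public void defaultLocaleUsesDotlessI() {
              // Confirms the forced locale took effect for this JVM.
              assertEquals("mın", "MIN".toLowerCase());
              // Function-resolution tests would then run under this default locale.
            }
          }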

          joel.bernstein Joel Bernstein added a comment - - edited

The backport is turning out to be trickier than I thought.

          The reason is that jira/solr-8593 has lots of master commits in it that are not in branch_6x. So merging is not going to work.

          Also getting an isolated list of the 80+ commits done in jira/solr-8593 for the Calcite integration is tricky because the original pull request is closed. Creating a new pull request against branch_6x includes all the master commits.

          So I'm looking at ways to isolate all the commits to cherry-pick.

          risdenk Kevin Risden added a comment -

          You can generate a patch from the jira/solr-8593 branch (if you have it locally) and then just apply the patch to branch_6x. That would avoid the merge and all the other commits.

          julianhyde Julian Hyde added a comment - - edited

Kevin Risden, regarding the Turkish locale issue: we have to explicitly pass user.timezone from Maven into Surefire (see pom.xml), so I suspect we'd have to do the same with the locale. Can you log a Calcite case please? Even if we can't reproduce it, I'd rather that we tracked it.

          risdenk Kevin Risden added a comment -

Julian Hyde - I opened CALCITE-1667 for the Turkish locale. I had tried the explicit passing like user.timezone and that didn't help either.

          jira-bot ASF subversion and git services added a comment -

          Commit eb43938e0f759089ed767ab27414457f4d475588 in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eb43938 ]

          SOLR-8593: Switch to using the BooleanEvaluators

          jira-bot ASF subversion and git services added a comment -

          Commit 1b919736982fe09cfacf5f08f7f9674cfed059d1 in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1b91973 ]

          SOLR-8593: Fix precommit

          joel.bernstein Joel Bernstein added a comment - - edited

          This has now been backported for release in Solr 6.5.

The JIRA number was not on the first backport commit, so I'll link to it here:
          https://github.com/apache/lucene-solr/commit/3370dbed2e3e247a40012ab76aca059d640dfc80
This is a squash of the commits from master. Master still has the commit history from the feature branch. This squashed commit will be much easier to revert if need be.

          jira-bot ASF subversion and git services added a comment -

          Commit 5ae51d4ddf5e1a27b8f1741910a32697f952f482 in lucene-solr's branch refs/heads/master from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5ae51d4 ]

          SOLR-8593: Update CHANGES.txt

          jira-bot ASF subversion and git services added a comment -

          Commit 12e7634901e168761592f592415dcd571c15f389 in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=12e7634 ]

          SOLR-8593: Update CHANGES.txt

          steve_rowe Steve Rowe added a comment -

          After the backport, my Jenkins is now seeing the Turkish locale problem on branch_6x:

            [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestSQLHandler -Dtests.method=doTest -Dtests.seed=C8349150BF686397 -Dtests.slow=true -Dtests.locale=tr -Dtests.timezone=America/Indiana/Tell_City -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
            [junit4] ERROR   40.1s J7  | TestSQLHandler.doTest <<<
            [junit4]    > Throwable #1: java.io.IOException: --> http://127.0.0.1:52800/a_vcs/qk/collection1:Failed to execute sqlQuery 'select str_s, count(*), sum(field_i), min(field_i), max(field_i), cast(avg(1.0 * field_i) as float) from collection1 where text='XXXX' group by str_s order by sum(field_i) asc limit 2' against JDBC connection 'jdbc:calcitesolr:'.
            [junit4]    > Error while executing SQL "select str_s, count(*), sum(field_i), min(field_i), max(field_i), cast(avg(1.0 * field_i) as float) from collection1 where text='XXXX' group by str_s order by sum(field_i) asc limit 2": From line 1, column 39 to line 1, column 50: No match found for function signature min(<NUMERIC>)
            [junit4]    > 	at __randomizedtesting.SeedInfo.seed([C8349150BF686397:6F7029F4D2D3702E]:0)
            [junit4]    > 	at org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:235)
            [junit4]    > 	at org.apache.solr.handler.TestSQLHandler.getTuples(TestSQLHandler.java:2325)
            [junit4]    > 	at org.apache.solr.handler.TestSQLHandler.testBasicGrouping(TestSQLHandler.java:651)
            [junit4]    > 	at org.apache.solr.handler.TestSQLHandler.doTest(TestSQLHandler.java:77)
            [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:992)
            [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:967)
          [...]
            [junit4]   2> NOTE: test params are: codec=Lucene62, sim=RandomSimilarity(queryNorm=false,coord=no): {}, locale=tr, timezone=America/Indiana/Tell_City
            [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_77 (64-bit)/cpus=16,threads=1,free=217047872,total=524812288
          
          joel.bernstein Joel Bernstein added a comment - - edited

Ok, I left this ticket open so that I could fix this. I'll tackle it shortly.

          jira-bot ASF subversion and git services added a comment -

          Commit 6df17c8cfe72d229140fb644d067a50cd7a2b455 in lucene-solr's branch refs/heads/master from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6df17c8 ]

          SOLR-8593: in TestSQLHandler assume not run with Turkish locale

          jira-bot ASF subversion and git services added a comment -

          Commit 4b1a16361d85710289e2905c1a796dba6ac858ec in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4b1a163 ]

          SOLR-8593: in TestSQLHandler assume not run with Turkish locale

          joel.bernstein Joel Bernstein added a comment - - edited

Ok, I added the assumption for the Turkish locale. I'm planning on resolving this ticket. If other issues come up with the Apache Calcite integration, we can open a new issue.


            People

            • Assignee: Joel Bernstein
            • Reporter: Joel Bernstein
            • Votes: 2
            • Watchers: 14