Solr / SOLR-8002

Add column alias support to the Parallel SQL Interface

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.0
    • Fix Version/s: 6.0
    • Component/s: search
    • Labels:
      None

      Description

      Currently, field aliases are not supported for SQL queries against the SQL handler. For example, the SQL query below

      select id,name as product_name from techproducts limit 20

      currently fails: the returned data still contains "name" as the field/column key rather than "product_name".

      Attachments

      1. SOLR-8002.patch
        26 kB
        Joel Bernstein
      2. SOLR-8002.patch
        25 kB
        Joel Bernstein

        Issue Links

          Activity

          Joel Bernstein added a comment (edited)

          This is actually a new feature request. The SQL interface doesn't yet support field aliases.

          Also, Susheel Kumar, patches are welcome!

          I'll be happy to review any patches in this area.

          Dennis Gove added a comment

          SOLR-7669 adds a SelectStream to the streaming and expression apis. It includes support for both aliases and operations and might be a good basis for adding these features to the SQL api.

          Joel Bernstein added a comment

          This makes sense. I haven't had a chance to review SOLR-7669 yet.

          Susheel Kumar, if you're working on a patch you might want to see if you can work with the SelectStream. You'll need to apply the patch from SOLR-7669 as it's not yet in trunk.

          Susheel Kumar added a comment

          Sure, Joel. Let me start looking into the patch SOLR-7669.

          Susheel Kumar added a comment (edited)

          Hi Joel, Dennis,

          Just to clarify: utilizing the SelectStream in the SQL api (SQLHandler.java) would require transforming the SQL expression into a Solr streaming expression for SelectStream to work. So, for example, the SQL expression

          select id, field_i, str_s from collection1 where text='XXXX' order by field_i desc

          would be transformed into the Solr streaming expression

          search(collection1, q="text:XXXX", fl="id,field_i,str_s", sort="field_i desc")

          Please let me know your thoughts & if that is the correct understanding.

          Thanks,
          Susheel
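
          That transformation can be sketched with a toy helper (purely illustrative; the real SQLHandler uses a full SQL parser, and the class and method names here are hypothetical):

          ```java
          // Toy sketch: render the pieces of an already-parsed SELECT as a Solr
          // streaming expression string. Purely illustrative; not the real SQLHandler.
          public class SqlToExpressionSketch {

              public static String toSearchExpression(String collection, String query,
                                                      String fieldList, String sort) {
                  return String.format("search(%s, q=\"%s\", fl=\"%s\", sort=\"%s\")",
                          collection, query, fieldList, sort);
              }

              public static void main(String[] args) {
                  // Mirrors the example above.
                  System.out.println(toSearchExpression(
                          "collection1", "text:XXXX", "id,field_i,str_s", "field_i desc"));
                  // prints: search(collection1, q="text:XXXX", fl="id,field_i,str_s", sort="field_i desc")
              }
          }
          ```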

          Dennis Gove added a comment

          Before I give my thoughts on this I do want to just put into writing an important note that I think will help frame the conversation.

          None of the Streaming classes require the use of streaming expressions. Streaming Expressions is just a way to turn a human readable expression into valid streaming constructs. You can, if you want, just create instances of streaming classes without ever thinking about expressions. The SQL api (in SQLHandler.java) currently works this way.

          That said, I do feel that expressions are the easiest and clearest way to interact with Streams. They provide a very concrete yet expressive way to turn a streaming query into objects that do all the actual work. And they are bi-directional in that you can turn a set of stream objects into an expression as easily as you can turn an expression into a set of stream objects. From a user's perspective I find expressions easier to understand than even standard Solr queries, particularly when performing things like aggregations, top, and joins. I can look at an expression and know exactly what I should expect to receive as a result.

          I'm not averse to your suggestion that we create a pipeline of SQL Statement -> Expression -> Stream object. That said, I don't think it would be a good idea (from a performance perspective) to create a streaming expression in string format, i.e., "SELECT fieldA FROM collection1 WHERE fieldB = 1" -> search(collection1, fl="fieldA", q="fieldB:1"). I would instead suggest that we turn a SQL statement into a StreamExpression object. This would remove the need to reparse a string just to end up with a StreamExpression object. For example, the above SQL statement could be turned into a StreamExpression with the code

            StreamExpression expression = new StreamExpression("search")
                .withParameter(new StreamExpressionValue("collection1"))
                .withParameter(new StreamExpressionNamedParameter("fl","fieldA"))
                .withParameter(new StreamExpressionNamedParameter("q","fieldB:1"));
          

          which can then be turned into a Stream with

            TupleStream stream = streamFactory.constructStream(expression);
          

          Thinking about this wrt the string representation of an expression, we really end up with a rather clear pipeline of [SQL Statement | String Expression] -> StreamExpression -> TupleStream. This pipeline makes it very clear that whatever your input format is you only need to convert it into a StreamExpression object and that's it - all the other work of creating a stream from a StreamExpression object is already done.

          I think this is the correct approach.

          Joel Bernstein added a comment (edited)

          Actually the SQLHandler serializes to a StreamingExpression when the Streaming API objects are sent to worker nodes. As a general rule I think that all built-in Solr features that use the Streaming API should serialize to StreamingExpressions for parallel operations, because it's extremely efficient.

          Joel Bernstein added a comment (edited)

          Susheel Kumar, this ticket is actually pretty tricky to get started with, especially because SOLR-7903 is very close to being committed. Once SOLR-7903 is committed it will also need field alias support.

          My gut feeling is that it would be more effective to work on SOLR-7986. The other nice thing about working on the JDBC driver is that it will provide another set of overlapping tests of the SQLHandler going into the Solr 6 release.

          Dennis Gove added a comment

          I was thinking more in the initial parsing of the SQL statement. At that point perhaps create a StreamExpression object. If I'm reading the code correctly it appears to go directly from a SQL statement to a Stream object.

          Joel Bernstein added a comment

          Ah, I just read your post more closely. Building up a streaming expression first, rather than going directly to the Streaming API object, is an interesting approach that I hadn't thought of. I'll take a look at how the SQLHandler is doing things and see what it would look like to build up a streaming expression directly.

          Susheel Kumar added a comment

          Thanks, Dennis, for the explanation. I agree with the approach that either a SQL statement or a string expression would be converted to a StreamExpression object.

          Susheel Kumar added a comment

          Agreed, Joel, this is tricky to start on. I'll continue with more tests anyway and will look into SOLR-7986 as well.

          Joel Bernstein added a comment

          I've been mulling over the best way to implement alias support. Here is one possibility:

          1) During parsing, collect the field -> alias map and the alias -> field map. We'll need both directions for the implementation. Attach these maps to the SQLVisitor.
          2) Wrap the outermost TupleStream in a SelectStream that performs field -> alias translation. This takes care of outputting the column aliases in the Tuple.
          3) Add reverse alias translation to the ORDER BY, GROUP BY, and HAVING clauses. This supports using column aliases in these clauses.
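
          The three steps above might look roughly like this in plain Java (a sketch only; names like applyAliases are hypothetical, and the real work would live in SQLHandler/SQLVisitor and SelectStream):

          ```java
          import java.util.LinkedHashMap;
          import java.util.Map;

          // Sketch of the plan above, using plain maps in place of real Tuples.
          public class AliasSketch {

              // Step 1: keep the mapping in both directions.
              final Map<String, String> fieldToAlias = new LinkedHashMap<>();
              final Map<String, String> aliasToField = new LinkedHashMap<>();

              void addAlias(String field, String alias) {
                  fieldToAlias.put(field, alias);
                  aliasToField.put(alias, field);
              }

              // Step 2: what a SelectStream wrapper would do to each outgoing
              // tuple: rename keys field -> alias.
              Map<String, Object> applyAliases(Map<String, Object> tuple) {
                  Map<String, Object> out = new LinkedHashMap<>();
                  for (Map.Entry<String, Object> e : tuple.entrySet()) {
                      out.put(fieldToAlias.getOrDefault(e.getKey(), e.getKey()), e.getValue());
                  }
                  return out;
              }

              // Step 3: reverse translation for ORDER BY / GROUP BY / HAVING:
              // an alias used there must resolve back to the underlying field.
              String resolve(String name) {
                  return aliasToField.getOrDefault(name, name);
              }
          }
          ```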

          Joel Bernstein added a comment (edited)

          Patch with full alias support and tests. Aliases are supported for columns and functions and can be used in GROUP BY, ORDER BY and HAVING clauses.

          Aliases are not supported in the WHERE clause which is consistent with the SQL spec.

          Joel Bernstein added a comment

          Added a test for aliases when running aggregates without grouping.

          ASF subversion and git services added a comment

          Commit 1724319 from Joel Bernstein in branch 'dev/trunk'
          [ https://svn.apache.org/r1724319 ]

          SOLR-8002: Add column alias support to the Parallel SQL Interface

          ASF subversion and git services added a comment

          Commit 1724323 from Joel Bernstein in branch 'dev/trunk'
          [ https://svn.apache.org/r1724323 ]

          SOLR-8002: Added more test cases

          Kevin Risden added a comment

          Joel Bernstein, any reason not to always return a SelectStream instead of checking whether there are columnAliases?

          I'm thinking about the JDBC implementation of get*() by index. The SelectStream can guarantee the tuples have the fields in the same order as the SQL statement. For example: select a, b, c from table; will result in tuple.get(0) being a, tuple.get(1) being b, etc. Without the SelectStream wrapping, the fields are in an arbitrary order.

          In addition to wrapping the result in SelectStream, this also requires that columnAliases, reverseColumnAliases, and tuple.fields be LinkedHashMap instead of plain HashMap, to keep the keys in insertion order. I have a rough patch with these changes; I just wasn't sure whether it made sense to add it here or in a separate JIRA.
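
          The ordering point is plain-JDK behavior, nothing Solr-specific (a minimal sketch):

          ```java
          import java.util.ArrayList;
          import java.util.LinkedHashMap;
          import java.util.List;
          import java.util.Map;

          // LinkedHashMap preserves insertion order, which is what positional
          // get-by-index over "select a, b, c" would rely on; HashMap promises
          // no iteration order at all.
          public class TupleOrderSketch {

              static List<String> fieldOrder(Map<String, Object> tuple) {
                  return new ArrayList<>(tuple.keySet());
              }

              public static void main(String[] args) {
                  Map<String, Object> tuple = new LinkedHashMap<>();
                  tuple.put("a", 1);
                  tuple.put("b", 2);
                  tuple.put("c", 3);
                  System.out.println(fieldOrder(tuple)); // prints [a, b, c] (insertion order)
              }
          }
          ```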

          Kevin Risden added a comment

          Added my thoughts and a rough patch to SOLR-8512.

          Dennis Gove added a comment

          I think this is one of those situations where what one almost always sees in a JDBC implementation can actually be reversed. Normally in a JDBC implementation when someone asks for a value via columnName that name will be translated to a columnIdx and that index will be used to lookup the actual value. But because values are keyed by columnName in the tuple I think it could just be reversed where when asking for a value by columnIndex the index could be translated to the columnName and that name used to lookup the actual value.

          This removes the need to depend on order of the fields in the tuple.
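
          A sketch of that reversal (hypothetical names; in a real driver the column order would come from the parsed SELECT list):

          ```java
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          // Get-by-index becomes get-by-name after one translation step, so the
          // tuple itself can keep its values in any order.
          public class IndexToNameSketch {

              private final List<String> selectOrder;   // column names in SELECT order
              private final Map<String, Object> tuple;  // values keyed by column name

              IndexToNameSketch(List<String> selectOrder, Map<String, Object> tuple) {
                  this.selectOrder = selectOrder;
                  this.tuple = tuple;
              }

              // JDBC-style lookup: column indexes are 1-based.
              Object get(int columnIndex) {
                  return tuple.get(selectOrder.get(columnIndex - 1));
              }

              public static void main(String[] args) {
                  Map<String, Object> tuple = new HashMap<>(); // tuple order irrelevant here
                  tuple.put("b", 2);
                  tuple.put("a", 1);
                  IndexToNameSketch rs = new IndexToNameSketch(List.of("a", "b"), tuple);
                  System.out.println(rs.get(1)); // prints 1 (the value of column "a")
              }
          }
          ```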

          Joel Bernstein added a comment

          Yep, this makes perfect sense.

          Joel Bernstein added a comment

          Just mulling over the LinkedHashMap; I'm not sure how well that will serialize to JSON.

          Kevin Risden added a comment

          > But because values are keyed by columnName in the tuple I think it could just be reversed where when asking for a value by columnIndex the index could be translated to the columnName and that name used to lookup the actual value.

          I agree that looking up by index could be translated into a column-name lookup, but that requires the mapping of name to index (or vice versa) to be available at the driver. Currently, there is no metadata tracking positional index to column name that gets passed back from the SQLHandler. I had a working solution that added some metadata to the stream (and had to change the JSONTupleStream parsing), but that seemed awkward/wrong. Maybe there is a better way to pass the metadata back from the SQL handler?

          The least intrusive option seems to me to be for the Tuple object to keep its fields in order, not only by name. I don't know how to enforce this, though, other than maybe using the LinkedHashMap plus some tests to ensure that a new stream doesn't cause issues.

          Kevin Risden added a comment

          > Just mulling over the LinkedHashMap, not sure how well that will serialize in JSON.

          It seems to serialize just fine from the tests I was running with the patch on SOLR-8512. It serializes just like a regular Map and when it gets deserialized by the JSONTupleStream it comes out as a LinkedHashMap and gets cast as a regular Map. This would be the expected behavior from what I can tell.


            People

            • Assignee: Joel Bernstein
            • Reporter: Susheel Kumar (susheel2777@gmail.com)
            • Votes: 0
            • Watchers: 6
