Lucene - Core
  1. Lucene - Core
  2. LUCENE-5806

Extend expression grammar to allow advanced "variables"

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We currently allow dots in "variable" names in expressions, so that we can fake out object access. We should extend this to allow array access as well (both integer and string keys). This would allow faking out full object nesting through bindings.

      1. LUCENE-5806.patch
        93 kB
        Ryan Ernst
      2. LUCENE-5806.patch
        79 kB
        Ryan Ernst

        Activity

        Hide
        Ryan Ernst added a comment -

        Patch with tests.

        Show
        Ryan Ernst added a comment - Patch with tests.
        Hide
        Uwe Schindler added a comment -

        Elasticsearch scripts?

        Show
        Uwe Schindler added a comment - Elasticsearch scripts?
        Hide
        Ryan Ernst added a comment -

        Yup!

        Show
        Ryan Ernst added a comment - Yup!
        Hide
        Uwe Schindler added a comment -

        Cool! I was waiting so long!

        Show
        Uwe Schindler added a comment - Cool! I was waiting so long!
        Hide
        Uwe Schindler added a comment -

        I did not find a test where such a "fake variable" is used in a binding. How would foo.bar.strindex['abc'] look like in a binding?

        Show
        Uwe Schindler added a comment - I did not find a test where such a "fake variable" is used in a binding. How would foo.bar.strindex ['abc'] look like in a binding?
        Hide
        Ryan Ernst added a comment -

        I only added tests from the compiler side, but I can add one with bindings actually using extended variables. It would continue to show up to bindings as a string name, which the binding would have to parse.

        I thought about using the parser to break up the "variable" into components, and passing a variable context to the bindings instead of (or alternatively to) the simple string variable names. It might be a little more efficient (since the parser has to find these pieces anyways when matching), but I opted for simplicity for now (let the bindings parse as they like). In either case, the cost is still occurred once (at the beginning of an evaluate() call) and not per document.

        Show
        Ryan Ernst added a comment - I only added tests from the compiler side, but I can add one with bindings actually using extended variables. It would continue to show up to bindings as a string name, which the binding would have to parse. I thought about using the parser to break up the "variable" into components, and passing a variable context to the bindings instead of (or alternatively to) the simple string variable names. It might be a little more efficient (since the parser has to find these pieces anyways when matching), but I opted for simplicity for now (let the bindings parse as they like). In either case, the cost is still occurred once (at the beginning of an evaluate() call) and not per document.
        Hide
        Ryan Ernst added a comment -

        Note that parsing is only necessary in the bindings if you want to use these dynamically. If you know that fields are accessed through "doc" and they are "foo" and "bar", then simply adding the following to bindings should work (with ES syntax as an example):

        "doc['foo'].value"
        "doc['bar'].value"
        
        Show
        Ryan Ernst added a comment - Note that parsing is only necessary in the bindings if you want to use these dynamically. If you know that fields are accessed through "doc" and they are "foo" and "bar", then simply adding the following to bindings should work (with ES syntax as an example): "doc['foo'].value" "doc['bar'].value"
        Hide
        Uwe Schindler added a comment -

        OK, this is what I expected. I am fine with that if the "binding" is normalized:

        doc['foo'].value
        doc ["foo"].value
        doc[ 'foo' ].value
        

        is all the same so, the binding should be somehow normalized? Like:

        doc['foo'].value
        

        Otherwise its hard to hardcode bindings like you mentioned in your last post.

        Show
        Uwe Schindler added a comment - OK, this is what I expected. I am fine with that if the "binding" is normalized: doc['foo'].value doc ["foo"].value doc[ 'foo' ].value is all the same so, the binding should be somehow normalized? Like: doc['foo'].value Otherwise its hard to hardcode bindings like you mentioned in your last post.
        Hide
        Jack Conradson added a comment - - edited

        fragment
        OBJECT
        : ID (ARRAY ARRAY*)?
        ;

        Cant the above fragment just be the below fragment? (I admit it's been a while since I've used ANTLR.)

        fragment
        OBJECT
        : ID ARRAY*
        ;

        Show
        Jack Conradson added a comment - - edited fragment OBJECT : ID (ARRAY ARRAY*)? ; Cant the above fragment just be the below fragment? (I admit it's been a while since I've used ANTLR.) fragment OBJECT : ID ARRAY* ;
        Hide
        Ryan Ernst added a comment -

        Cant the above fragment just be the below fragment? (I admit it's been a while since I've used ANTLR.)

        fragment
        OBJECT
        : ID ARRAY*
        ;

        Yes of course! I'll have that fixed in my next patch (with string normalization).

        Show
        Ryan Ernst added a comment - Cant the above fragment just be the below fragment? (I admit it's been a while since I've used ANTLR.) fragment OBJECT : ID ARRAY* ; Yes of course! I'll have that fixed in my next patch (with string normalization).
        Hide
        Ryan Ernst added a comment -

        New patch with simplified grammar (thanks Jack Conradson), and string normalization. I am not doing whitespace normalization because the grammar does not currently allow it (we can change that, but I didn't see why allowing whitespace in the middle of variables would be useful?). I also added a helper class to parse variable names into their pieces (array and member access), and added demo examples for both static and dynamic use cases.

        Show
        Ryan Ernst added a comment - New patch with simplified grammar (thanks Jack Conradson ), and string normalization. I am not doing whitespace normalization because the grammar does not currently allow it (we can change that, but I didn't see why allowing whitespace in the middle of variables would be useful?). I also added a helper class to parse variable names into their pieces (array and member access), and added demo examples for both static and dynamic use cases.
        Hide
        Jack Conradson added a comment - - edited

        If you want to normalize the tokens on-the-fly you can actually write a small java snippet right into the grammar.

        An example of this would be the following which removes both the single quote characters from a string inline. The syntax should be in one of your books for this .

        STRING
        @after

        { setText(getText().substring(1, getText().length() - 1).replace("\\\\", "\\").replace("\\'", "'")); }

        : '\'' STRINGCHAR* '\''
        ;

        The other option to consider would be not skipping over whitespace but actually inserting WS!? (where the '!' removes the WS token from the parse tree tokens) throughout the grammar for finer control. This may be beyond the scope of this JIRA, though, and would potentially prevent something like " doc[ 'foo' ].value " altogether. I'm not sure whether or not this is actually allowed in JavaScript.

        Edit: Ah, because it's a fragment this is all moot.

        Show
        Jack Conradson added a comment - - edited If you want to normalize the tokens on-the-fly you can actually write a small java snippet right into the grammar. An example of this would be the following which removes both the single quote characters from a string inline. The syntax should be in one of your books for this . STRING @after { setText(getText().substring(1, getText().length() - 1).replace("\\\\", "\\").replace("\\'", "'")); } : '\'' STRINGCHAR* '\'' ; The other option to consider would be not skipping over whitespace but actually inserting WS!? (where the '!' removes the WS token from the parse tree tokens) throughout the grammar for finer control. This may be beyond the scope of this JIRA, though, and would potentially prevent something like " doc[ 'foo' ].value " altogether. I'm not sure whether or not this is actually allowed in JavaScript. Edit: Ah, because it's a fragment this is all moot.
        Hide
        Ryan Ernst added a comment -

        Edit: Ah, because it's a fragment this is all moot.

        Yup, I did try something like this, but couldn't get it working. I think it may be possible, but what I have here works, and I don't think is too bad.

        Uwe Schindler Jack Conradson I think this is ready. Do you have any more concerns?

        Show
        Ryan Ernst added a comment - Edit: Ah, because it's a fragment this is all moot. Yup, I did try something like this, but couldn't get it working. I think it may be possible, but what I have here works, and I don't think is too bad. Uwe Schindler Jack Conradson I think this is ready. Do you have any more concerns?
        Hide
        Jack Conradson added a comment -

        Current patch is fine with me. This can always be re-evaluated in the future.

        Show
        Jack Conradson added a comment - Current patch is fine with me. This can always be re-evaluated in the future.
        Hide
        ASF subversion and git services added a comment -

        Commit 1609337 from Ryan Ernst in branch 'dev/trunk'
        [ https://svn.apache.org/r1609337 ]

        LUCENE-5806: Extend expression grammar to allow advanced "variables"

        Show
        ASF subversion and git services added a comment - Commit 1609337 from Ryan Ernst in branch 'dev/trunk' [ https://svn.apache.org/r1609337 ] LUCENE-5806 : Extend expression grammar to allow advanced "variables"
        Hide
        ASF subversion and git services added a comment -

        Commit 1609338 from Ryan Ernst in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1609338 ]

        LUCENE-5806: Extend expression grammar to allow advanced "variables"

        Show
        ASF subversion and git services added a comment - Commit 1609338 from Ryan Ernst in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1609338 ] LUCENE-5806 : Extend expression grammar to allow advanced "variables"

          People

          • Assignee:
            Ryan Ernst
            Reporter:
            Ryan Ernst
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development