Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-198

Use RexExecutor to evaluate projections and filters

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Hi,

      I was asking on the mailing list if there is a way to use the existing code-generation infrastructure to evaluate projections and filters: https://groups.google.com/forum/#!topic/optiq-dev/-97ONPmDTB4.

      This pull request contains my first approach to change the Rex* components to allow that.

      My change includes the following:

      • Decoupling of code generation, compilation and execution (to allow efficient execution). I therefore introduced the `RexExecutable` class that allows to `reduce()` and `execute()` (given a DataContext)
      • I also changed the `DataContext` interface to allow direct access (using an int-index) to the data (instead of using a HashMap)
      • The most important change is that I introduced a `isExternal`-flag on the `RexInputRef` which indicates that the Ref refers to a variable accessible from the DataContext.

      I'm very happy about any feedback regarding my pull request. I know its certainly not the cleanest approach to implement this feature. But I didn't want to spend too much time working on it without any feedback. (Maybe adding something like a `RexContextRef` is cleaner? (I probably have to shuttle through replacing all InputRefs by RexContextRefs then))

      I know that the pull request is a bit messed up (4 commits and I accidentally changed some code formattings while working with (/fighting against) the checkstyle plugin. Once I got feedback, I'll clean those things up and update the PR (using force push).

      ---------------- Imported from GitHub ----------------
      Url: https://github.com/julianhyde/optiq/pull/198
      Created by: rmetzger
      Labels:
      Created at: Sun Mar 23 16:13:47 CET 2014
      State: closed

        Activity

        Hide
        github-import GitHub Import added a comment -

        [Date: Thu Apr 03 20:58:57 CEST 2014, Author: rmetzger]

        Thanks for merging my pull request. Your cleanup is fine. And yeah, the incubation is very exciting.

        Show
        github-import GitHub Import added a comment - [Date: Thu Apr 03 20:58:57 CEST 2014, Author: rmetzger ] Thanks for merging my pull request. Your cleanup is fine. And yeah, the incubation is very exciting.
        Hide
        github-import GitHub Import added a comment -

        [Date: Tue Apr 01 22:30:54 CEST 2014, Author: julianhyde]

        By the way, best of luck with the Stratosphere incubation!

        Show
        github-import GitHub Import added a comment - [Date: Tue Apr 01 22:30:54 CEST 2014, Author: julianhyde ] By the way, best of luck with the Stratosphere incubation!
        Hide
        github-import GitHub Import added a comment -

        [Date: Tue Apr 01 22:30:26 CEST 2014, Author: julianhyde]

        Looks good – I have committed to master. Glad to see that using `InputGetter` did the trick.

        Let me know whether you approve of my "tidying up" in the following commit. In particular removing RexExecutor.rexBuilder.

        Show
        github-import GitHub Import added a comment - [Date: Tue Apr 01 22:30:26 CEST 2014, Author: julianhyde ] Looks good – I have committed to master. Glad to see that using `InputGetter` did the trick. Let me know whether you approve of my "tidying up" in the following commit. In particular removing RexExecutor.rexBuilder.
        Hide
        github-import GitHub Import added a comment -

        [Date: Tue Apr 01 21:29:51 CEST 2014, Author: rmetzger]

        Hi Julian, thank you very much for your guidance!
        I updated the pull request to reflect your suggestions.

        I would prefer to do the optimizations you've suggested at a later point since I'm still in a "prototyping" phase for my implementation and I want to focus on other parts of the implementation.

        Show
        github-import GitHub Import added a comment - [Date: Tue Apr 01 21:29:51 CEST 2014, Author: rmetzger ] Hi Julian, thank you very much for your guidance! I updated the pull request to reflect your suggestions. I would prefer to do the optimizations you've suggested at a later point since I'm still in a "prototyping" phase for my implementation and I want to focus on other parts of the implementation.
        Hide
        github-import GitHub Import added a comment -

        [Date: Mon Mar 24 19:51:20 CET 2014, Author: julianhyde]

        I now think you're right to use RexInputRef. If you're implementing ProjectRel and FilterRel, then you'll want to access "fields of the incoming record", and that is precisely what RexInputRef is for.

        I don't think you should add `isExternal`. I think you should call `RexToLixTranslator.translateProjects` with a custom implementation of `InputGetter`.

        Also, this isn't such a big deal, but I think we can avoid making changes to DataContext. Rather than storing all of the input fields in separately, we can store the array, and the array only needs to be accessed once. In your code:

        ```java
        RelDataType inputRowType;
        DataContext dataContext;
        Object[] slots = new Object[inputRowType.getFieldCount()];
        dataContext.set("inputRecord", slots);
        ```

        Inside generated code:

        ```java
        final DataContext root;
        Integer x = (Integer) ((Object[]) root.get("inputRecord"))[0];
        Integer y = (Integer) ((Object[]) root.get("inputRecord"))[1];
        Integer result = x + y;
        ```

        and later we can optimize by generating

        ```java
        final Object[] slots = (Object[]) root.get("inputRecord");
        ```

        Show
        github-import GitHub Import added a comment - [Date: Mon Mar 24 19:51:20 CET 2014, Author: julianhyde ] I now think you're right to use RexInputRef. If you're implementing ProjectRel and FilterRel, then you'll want to access "fields of the incoming record", and that is precisely what RexInputRef is for. I don't think you should add `isExternal`. I think you should call `RexToLixTranslator.translateProjects` with a custom implementation of `InputGetter`. Also, this isn't such a big deal, but I think we can avoid making changes to DataContext. Rather than storing all of the input fields in separately, we can store the array, and the array only needs to be accessed once. In your code: ```java RelDataType inputRowType; DataContext dataContext; Object[] slots = new Object [inputRowType.getFieldCount()] ; dataContext.set("inputRecord", slots); ``` Inside generated code: ```java final DataContext root; Integer x = (Integer) ((Object[]) root.get("inputRecord")) [0] ; Integer y = (Integer) ((Object[]) root.get("inputRecord")) [1] ; Integer result = x + y; ``` and later we can optimize by generating ```java final Object[] slots = (Object[]) root.get("inputRecord"); ```
        Hide
        github-import GitHub Import added a comment -

        [Date: Mon Mar 24 16:05:36 CET 2014, Author: rmetzger]

        Hi Julian,
        thank you very much for the feedback!
        I'm currently using the code to evaluate the conditions in `FilterRel` and the projections in `ProjectRel` (and it's working pretty good).
        I guess the other projects (Drill, Lingual) implemented their own infrastructure to evaluate these.

        I stumbled across the `RexProgam`. If I saw it correctly, its used with the `CalcRel`.
        I'll look into the `CalcRel` at a later stage. My current goal is to get some simple TPC-H queries running (and the JDBC interface).

        For the `DataContext`: I think its impossible to achieve type-safety for generated code. Generating methods that receive their runtime values via parameters is certainly doable. I'm just not sure how difficult it would be to invoke these (dynamic) methods. I can probably use reflection for that, but I guess this approach has no advantage over my current approach.
        I think thread-safety is not a problem since I create a special DataContext for this purpose (and one for each thread).
        I added the numbered slots into the DataContext to allow very fast lookups. A hashmap is okay, but I would need one lookup to store and one lookup to retrieve the data for each value in each tuple. With an integer-index, I can use an array for that (avoiding the hashing and hash-collisions).

        Please give me a few days to rework the `RexInputRef.isExternal`, I'm not working full-time on the SQL-related stuff right now.

        Show
        github-import GitHub Import added a comment - [Date: Mon Mar 24 16:05:36 CET 2014, Author: rmetzger ] Hi Julian, thank you very much for the feedback! I'm currently using the code to evaluate the conditions in `FilterRel` and the projections in `ProjectRel` (and it's working pretty good). I guess the other projects (Drill, Lingual) implemented their own infrastructure to evaluate these. I stumbled across the `RexProgam`. If I saw it correctly, its used with the `CalcRel`. I'll look into the `CalcRel` at a later stage. My current goal is to get some simple TPC-H queries running (and the JDBC interface). For the `DataContext`: I think its impossible to achieve type-safety for generated code. Generating methods that receive their runtime values via parameters is certainly doable. I'm just not sure how difficult it would be to invoke these (dynamic) methods. I can probably use reflection for that, but I guess this approach has no advantage over my current approach. I think thread-safety is not a problem since I create a special DataContext for this purpose (and one for each thread). I added the numbered slots into the DataContext to allow very fast lookups. A hashmap is okay, but I would need one lookup to store and one lookup to retrieve the data for each value in each tuple. With an integer-index, I can use an array for that (avoiding the hashing and hash-collisions). Please give me a few days to rework the `RexInputRef.isExternal`, I'm not working full-time on the SQL-related stuff right now.
        Hide
        github-import GitHub Import added a comment -

        [Date: Mon Mar 24 00:43:15 CET 2014, Author: julianhyde]

        I like the general idea of what you are doing. It might help someone write an interpreter (i.e. implementations of `FilterRel`, `ProjectRel` etc. that don't require code generation). Maybe that's your goal too.

        A further step could be to create an interpreter for a `RexProgram`. It computes several expressions at the same time, allows common sub-expressions, short-cutting if there is a filter expression. That interpreter could be made quite efficient.

        Regarding `RexInputRef.isExternal`. Have you looked at other sub-types of `RexVariable`, e.g. `RexSlot` and `RexDynamicParam`? Their type is a big clue to the implementor that it should implement them in a different way than looking in the input row.

        Is it really what you want, to store data values in `DataContext`? It's neither type-safe nor thread-safe. Would it be better if we generated a method and you could invoke it with parameters? (The code-generator could be smart enough to convert variable references into parameters.) We can look at other means for getting values in and values out of the generated code, too.

        That said, I'm OK with the idea of adding numbered slots into `DataContext`, if it's really what you want.

        Thanks for submitting an early draft for feedback. It's generally useful feature, and I think we can get this into a shape where I'd accept it.

        Show
        github-import GitHub Import added a comment - [Date: Mon Mar 24 00:43:15 CET 2014, Author: julianhyde ] I like the general idea of what you are doing. It might help someone write an interpreter (i.e. implementations of `FilterRel`, `ProjectRel` etc. that don't require code generation). Maybe that's your goal too. A further step could be to create an interpreter for a `RexProgram`. It computes several expressions at the same time, allows common sub-expressions, short-cutting if there is a filter expression. That interpreter could be made quite efficient. Regarding `RexInputRef.isExternal`. Have you looked at other sub-types of `RexVariable`, e.g. `RexSlot` and `RexDynamicParam`? Their type is a big clue to the implementor that it should implement them in a different way than looking in the input row. Is it really what you want, to store data values in `DataContext`? It's neither type-safe nor thread-safe. Would it be better if we generated a method and you could invoke it with parameters? (The code-generator could be smart enough to convert variable references into parameters.) We can look at other means for getting values in and values out of the generated code, too. That said, I'm OK with the idea of adding numbered slots into `DataContext`, if it's really what you want. Thanks for submitting an early draft for feedback. It's generally useful feature, and I think we can get this into a shape where I'd accept it.

          People

          • Assignee:
            Unassigned
            Reporter:
            github-import GitHub Import
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development