Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-911

Add a variant of CalciteSchema that does not cache sub-objects

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: None
    • Labels:
      None

      Description

      CalciteSchema by default uses cache to store table, sub-schema, function. This would work perfectly for schema-based system, yet would create problem for Drill, which dynamically explore the schema on the fly during query execution.

      One solution is to refactor CalciteSchema and make it as an interface. The default implementation would still use the current implementation. Further, it would other system to extend the default behavior and make CalciteSchema works for Drill as well.

      Background information: The issue around CalciteSchema is one of the reasons that Drill has to use a forked version of Calcite. Hopefully, if we could resolve this issue, we are one step further to remove the forked Calcite in the near future.

        Activity

        Hide
        jnadeau Jacques Nadeau added a comment -

        It makes sense to get this up with tests against it. If some consumers of Calcite want to manage caching on their own, we shouldn't force them to have to fork Calcite.

        Show
        jnadeau Jacques Nadeau added a comment - It makes sense to get this up with tests against it. If some consumers of Calcite want to manage caching on their own, we shouldn't force them to have to fork Calcite.
        Hide
        jni Jinfeng Ni added a comment - - edited

        I submit PR at: https://github.com/apache/incubator-calcite/pull/142

        It contains two commits. The first one is the original one made by Jacques in Drill's forked Calcite. I did some refactoring work in the second commit, such that 1) CalciteSchema becomes an interface, 2) CalciteAbstractSchema contains the implementation for either caching version or non-caching version, 3) The origianl caching version extends from CalciteAbstractSchema and is put into CalciteSchemaImpl.

        I removed SimpleCalciteSchema in Jacques's commit. I intended to put it in Drill's code by extending from CalciteAbstractSchema, as I think different system should be able to provide different implementation of CalciteSchema interface itself, if it wants a different behavior than the default CalciteSchemaImpl one.

        If the above logic makes sense, then it seems reasonable for each system to add test for different implementation and Calcite should only have test for the default one : CalciteSchemaImpl.

        Jacques Nadeau and Julian Hyde, could you please take a look at the PR?

        Show
        jni Jinfeng Ni added a comment - - edited I submit PR at: https://github.com/apache/incubator-calcite/pull/142 It contains two commits. The first one is the original one made by Jacques in Drill's forked Calcite. I did some refactoring work in the second commit, such that 1) CalciteSchema becomes an interface, 2) CalciteAbstractSchema contains the implementation for either caching version or non-caching version, 3) The origianl caching version extends from CalciteAbstractSchema and is put into CalciteSchemaImpl. I removed SimpleCalciteSchema in Jacques's commit. I intended to put it in Drill's code by extending from CalciteAbstractSchema, as I think different system should be able to provide different implementation of CalciteSchema interface itself, if it wants a different behavior than the default CalciteSchemaImpl one. If the above logic makes sense, then it seems reasonable for each system to add test for different implementation and Calcite should only have test for the default one : CalciteSchemaImpl. Jacques Nadeau and Julian Hyde , could you please take a look at the PR?
        Hide
        julianhyde Julian Hyde added a comment -

        My concern about this patch is the huge increase in API surface area. There are a lot of classes and interfaces that include the word "schema" and it's frankly a rat's nest. You didn't create the rat's nest: I created Schema and SchemaFactory and SchemaPlus and SchemaPlusImpl and CalciteSchema. But the key part of that design were the words "Used internally" against CalciteSchema. And you want to make that implementation public.

        So, I think this patch is taking us in the wrong direction.

        Can we start off with your requirements? I believe your requirement is to have a representation of schema that knows its structure (name, parent schema, child schemas, child objects), you build yourself, and does not need to be re-created for every request.

        Show
        julianhyde Julian Hyde added a comment - My concern about this patch is the huge increase in API surface area. There are a lot of classes and interfaces that include the word "schema" and it's frankly a rat's nest. You didn't create the rat's nest: I created Schema and SchemaFactory and SchemaPlus and SchemaPlusImpl and CalciteSchema. But the key part of that design were the words "Used internally" against CalciteSchema. And you want to make that implementation public. So, I think this patch is taking us in the wrong direction. Can we start off with your requirements? I believe your requirement is to have a representation of schema that knows its structure (name, parent schema, child schemas, child objects), you build yourself, and does not need to be re-created for every request.
        Hide
        jnadeau Jacques Nadeau added a comment -

        Jinfeng Ni, I wonder if you can post an updated patch that includes the SimpleCalciteSchema. The goal isn't really to create a larger api surface area.

        The main issue we're hitting against is that CalciteSchema was changed around 1.0 so that it cached a huge amount of additional state internally. This was a substantial breaking change for Drill. Without major refactoring in Drill, we couldn't have the schema caching a bunch of state. As such, we needed to basically maintain the old implementation of Schema which was minimally stateful. This patch achieves that goal. To get off the fork, we need to either have two implementations, expose an api so we can have our own implementation or remove the caching. Any of the options seems fine. I probably the last since this can then be left to framework implementers (and keeps code as simple as possible).

        I think it is very important for interoperability (e.g. the Phoenix integration) so let's figure out the best we to solve this.

        Show
        jnadeau Jacques Nadeau added a comment - Jinfeng Ni , I wonder if you can post an updated patch that includes the SimpleCalciteSchema. The goal isn't really to create a larger api surface area. The main issue we're hitting against is that CalciteSchema was changed around 1.0 so that it cached a huge amount of additional state internally. This was a substantial breaking change for Drill. Without major refactoring in Drill, we couldn't have the schema caching a bunch of state. As such, we needed to basically maintain the old implementation of Schema which was minimally stateful. This patch achieves that goal. To get off the fork, we need to either have two implementations, expose an api so we can have our own implementation or remove the caching. Any of the options seems fine. I probably the last since this can then be left to framework implementers (and keeps code as simple as possible). I think it is very important for interoperability (e.g. the Phoenix integration) so let's figure out the best we to solve this.
        Hide
        jni Jinfeng Ni added a comment - - edited

        I'll post a patch which includes the SimpleCalciteSchema.

        As Jacques Nadeau said, the caching code in CalciteSchema breaks almost every query in Drill. If we could have a different implementation without the caching, then it probably will solve our problem.

        That's why I move code around interface / abstract class, to make CalciteSchema to have different implementation. I'm not quite follow about Julian Hyde api concern : the method defined in the new interface are mostly public methods in CalciteSchema, which is public class. Why is such change regarded as increase api change ( public methods in public class vs. methods in interface). I assume user of Calcite would see those methods in either cases.

        Show
        jni Jinfeng Ni added a comment - - edited I'll post a patch which includes the SimpleCalciteSchema. As Jacques Nadeau said, the caching code in CalciteSchema breaks almost every query in Drill. If we could have a different implementation without the caching, then it probably will solve our problem. That's why I move code around interface / abstract class, to make CalciteSchema to have different implementation. I'm not quite follow about Julian Hyde api concern : the method defined in the new interface are mostly public methods in CalciteSchema, which is public class. Why is such change regarded as increase api change ( public methods in public class vs. methods in interface). I assume user of Calcite would see those methods in either cases.
        Hide
        jni Jinfeng Ni added a comment -

        I added back the SimpleCalciteSchema in the pull request. This is the implementation of CalciteSchema for Drill only; Drill currently does not use features like lattice, therefore some functions imply throw unsupported exception or return an empty list.

        Show
        jni Jinfeng Ni added a comment - I added back the SimpleCalciteSchema in the pull request. This is the implementation of CalciteSchema for Drill only; Drill currently does not use features like lattice, therefore some functions imply throw unsupported exception or return an empty list.
        Hide
        jnadeau Jacques Nadeau added a comment -

        Let's get this resolved for 1.5.

        Julian Hyde, do you have preference for which approach from above to take? It seems like removing the caching behavior from the existing calcite schema class is the least intrusive/impactful.

        Show
        jnadeau Jacques Nadeau added a comment - Let's get this resolved for 1.5. Julian Hyde , do you have preference for which approach from above to take? It seems like removing the caching behavior from the existing calcite schema class is the least intrusive/impactful.
        Hide
        julianhyde Julian Hyde added a comment -

        Jinfeng Ni, Read the comment at the top of CalciteSchema:

        Wrapper around user-defined schema used internally.

        In Java many things need to be public so they can be used by other packages. That does NOT make them a public API!!

        Show
        julianhyde Julian Hyde added a comment - Jinfeng Ni , Read the comment at the top of CalciteSchema: Wrapper around user-defined schema used internally. In Java many things need to be public so they can be used by other packages. That does NOT make them a public API!!
        Hide
        jni Jinfeng Ni added a comment - - edited

        Whether we put it in a public API is not the really purpose of this patch (I guess that's a different topic). All we want is make sure Drill can continue to use Calcite, so that we do not have to keep our forked version. As Jacques Nadeau said, Calcite's code caused breaking change for Drill around 0.9 release.

        We proposed two solutions to solve this :1) either Calcite provide two implementations of CalciteSchema, 2) Calcite allows people to implement according to the public API (which I guess you do not want to put in public API).

        What's your recommendation? Personally, I do not care which solution we use, as long as Calcite does not break Drill.

        Show
        jni Jinfeng Ni added a comment - - edited Whether we put it in a public API is not the really purpose of this patch (I guess that's a different topic). All we want is make sure Drill can continue to use Calcite, so that we do not have to keep our forked version. As Jacques Nadeau said, Calcite's code caused breaking change for Drill around 0.9 release. We proposed two solutions to solve this :1) either Calcite provide two implementations of CalciteSchema, 2) Calcite allows people to implement according to the public API (which I guess you do not want to put in public API). What's your recommendation? Personally, I do not care which solution we use, as long as Calcite does not break Drill.
        Hide
        jni Jinfeng Ni added a comment -

        Let me try another approach in Drill side to see if we can go without using CalciteSchema. Will update the result shortly.

        Show
        jni Jinfeng Ni added a comment - Let me try another approach in Drill side to see if we can go without using CalciteSchema. Will update the result shortly.
        Hide
        julianhyde Julian Hyde added a comment -

        I think the approach of having variants of CalciteSchema that do and do not cache will work. Especially if other projects are not allowed to sub-class them.

        I'm still reviewing, and trying to figure out the relationship among the various commits. Would we need all of the commits? My gut tells me this should be a fairly small change. And by the way we should obsolete CalciteRootSchema and just 'assert schema.isRoot()' in the places that formerly took a CalciteRootSchema.

        I don't yet see a good reason for changing the return type of getTable and other methods from TableEntry to Pair<String, Table>.

        Show
        julianhyde Julian Hyde added a comment - I think the approach of having variants of CalciteSchema that do and do not cache will work. Especially if other projects are not allowed to sub-class them. I'm still reviewing, and trying to figure out the relationship among the various commits. Would we need all of the commits? My gut tells me this should be a fairly small change. And by the way we should obsolete CalciteRootSchema and just 'assert schema.isRoot()' in the places that formerly took a CalciteRootSchema. I don't yet see a good reason for changing the return type of getTable and other methods from TableEntry to Pair<String, Table>.
        Hide
        jni Jinfeng Ni added a comment -

        As I said yesterday, I tried a different approach. Turns out it did not work.

        Basically, I want to avoid using CalciteSchema in Drill (Suppose CalciteSchema is not part of public API, and should not be used in Adapter like Drill??). The reason Drill originally has to use CalciteSchema is to get an instance of SchemaPlus, and pass that SchemaPlus to FrameworkConfig as the defaultSchema. With this idea, I implemented DrillSchemaPlus, an implementation of interface SchemaPlus, and maintain the sub-schema / tables etc in DrillSchemaPlus.

        However, I hit ClassCastException, as couple of places in Calcite core code expects instance of CalciteSchema$SchemaPlusImpl, in stead of any implementation of SchemaPlus. [1] [2]

         public static CalciteSchema from(SchemaPlus plus) {
            return ((SchemaPlusImpl) plus).calciteSchema();
          }
        

        I'm not sure whether SchemaPlus is part of public API, or each different project could use different implementation of SchemaPlus. But seems the other part of Calcite closes such possibility, as people has to use CalciteSchema to get an instance of SchemaPlus (Does it make CalciteSchema not only used in Calcite internally, but also in other projects, thus violates the comments made in CalciteSchema??)

        Anyway, for now, I do not see a way to avoid using CalciteSchema in order to use Calcite's Frameworks code.

        [1] https://github.com/apache/incubator-calcite/blob/master/core/src/main/java/org/apache/calcite/tools/Frameworks.java#L111
        [2] https://github.com/apache/incubator-calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L923

        Show
        jni Jinfeng Ni added a comment - As I said yesterday, I tried a different approach. Turns out it did not work. Basically, I want to avoid using CalciteSchema in Drill (Suppose CalciteSchema is not part of public API, and should not be used in Adapter like Drill??). The reason Drill originally has to use CalciteSchema is to get an instance of SchemaPlus, and pass that SchemaPlus to FrameworkConfig as the defaultSchema. With this idea, I implemented DrillSchemaPlus, an implementation of interface SchemaPlus, and maintain the sub-schema / tables etc in DrillSchemaPlus. However, I hit ClassCastException, as couple of places in Calcite core code expects instance of CalciteSchema$SchemaPlusImpl, in stead of any implementation of SchemaPlus. [1] [2] public static CalciteSchema from(SchemaPlus plus) { return ((SchemaPlusImpl) plus).calciteSchema(); } I'm not sure whether SchemaPlus is part of public API, or each different project could use different implementation of SchemaPlus. But seems the other part of Calcite closes such possibility, as people has to use CalciteSchema to get an instance of SchemaPlus (Does it make CalciteSchema not only used in Calcite internally, but also in other projects, thus violates the comments made in CalciteSchema??) Anyway, for now, I do not see a way to avoid using CalciteSchema in order to use Calcite's Frameworks code. [1] https://github.com/apache/incubator-calcite/blob/master/core/src/main/java/org/apache/calcite/tools/Frameworks.java#L111 [2] https://github.com/apache/incubator-calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L923
        Hide
        jni Jinfeng Ni added a comment -

        The first commit, Jacques Nadeau changed CalciteSchema to abstract class, and provided two version of extensions: caching and noncaching. The second commit is change CalciteSchema to interface, and move code around abstract class and Impl. The 3rd commit is adding a non-caching implementation, based on the interface defined in the second commit.

        The comment about return type of getTable(). I guess you are referring to Jacque's commit, which happened before Calcite changes the return type from Pair<String, Table> to TableEntry [1]. The second commit actually change it to the TableEntry.

        [1]. https://github.com/apache/incubator-calcite/commit/8b53dc49f49113daf12a355c5db18ccc248f4924

        Show
        jni Jinfeng Ni added a comment - The first commit, Jacques Nadeau changed CalciteSchema to abstract class, and provided two version of extensions: caching and noncaching. The second commit is change CalciteSchema to interface, and move code around abstract class and Impl. The 3rd commit is adding a non-caching implementation, based on the interface defined in the second commit. The comment about return type of getTable(). I guess you are referring to Jacque's commit, which happened before Calcite changes the return type from Pair<String, Table> to TableEntry [1] . The second commit actually change it to the TableEntry. [1] . https://github.com/apache/incubator-calcite/commit/8b53dc49f49113daf12a355c5db18ccc248f4924
        Hide
        julianhyde Julian Hyde added a comment -

        Regarding SchemaPlus and SchemaPlusImpl. SchemaPlus is intended to be used by users – it appears as a parameter in several SPI methods – but not instantiated by them. Users should only use the SchemaPlus they are given by the system. In fact the purpose of SchemaPlus was to expose to user code, in a read only manner, some of the information in CalciteSchema.

        It is analogous to java.sql.Connection.rollback(Savepoint), which can only be called with a Savepoint created by the same Connection.

        Show
        julianhyde Julian Hyde added a comment - Regarding SchemaPlus and SchemaPlusImpl. SchemaPlus is intended to be used by users – it appears as a parameter in several SPI methods – but not instantiated by them. Users should only use the SchemaPlus they are given by the system. In fact the purpose of SchemaPlus was to expose to user code, in a read only manner, some of the information in CalciteSchema. It is analogous to java.sql.Connection.rollback(Savepoint), which can only be called with a Savepoint created by the same Connection.
        Hide
        julianhyde Julian Hyde added a comment -

        I am leaning towards approach number 1, where CalciteSchema is an abstract class with two concrete sub-classes. Thanks for clarifying about getTable - we should definitely keep the return type as TableEntry.

        Would this approach work for you guys?

        Show
        julianhyde Julian Hyde added a comment - I am leaning towards approach number 1, where CalciteSchema is an abstract class with two concrete sub-classes. Thanks for clarifying about getTable - we should definitely keep the return type as TableEntry. Would this approach work for you guys?
        Hide
        jni Jinfeng Ni added a comment -

        I think approach # 1 should work for Drill. Thanks!

        Show
        jni Jinfeng Ni added a comment - I think approach # 1 should work for Drill. Thanks!
        Hide
        jnadeau Jacques Nadeau added a comment -

        Jinfeng Ni, do you think there will be an update patch soon so we can get this into v1.5?

        Show
        jnadeau Jacques Nadeau added a comment - Jinfeng Ni , do you think there will be an update patch soon so we can get this into v1.5?
        Hide
        jni Jinfeng Ni added a comment -

        Julian Hyde and Jacques Nadeau, I revised the patch based on the comments.

        The new patch changed CalciteSchema to be an abstract class. The default one is CachingCalciteSchema, which uses cache for implicit subschema, tables and functions. The new one, SimpleCalciteSchema, shares the same datas structure/code for the explicit subschema, tables and functions (previous patch used a different structure / code), but does not use cache for implicit subschema, tables and functions. (I could refactor the code such that the code accesses the explicit subschema etc is moved to abstract class. But I want to have you take a look to see if the CachingCalciteSchema/SimpleCalciteSchema looks fine or not first, as it is easier to see the changes)

        The reason that Drill could not use cache is that Drill dynamically explores the list of tables. For example, for a file system schema, Drill will not list all the files/ tables under that schema initially. Only when the table is queried with, it's added into the table list for the file system schema. That's different from the normal CalciteSchema, where the table list is pre-determined.

        Also, per Julian's suggestion, I depredated CalciteRootSchema and added checking code at places where the rootSchema is used.

        Please take another look at this patch and see if there is any problem.

        Link to the PR:

        https://github.com/apache/calcite/pull/142

        BTW: I run Drill's regression with the patch, and did not see any regression.

        Show
        jni Jinfeng Ni added a comment - Julian Hyde and Jacques Nadeau , I revised the patch based on the comments. The new patch changed CalciteSchema to be an abstract class. The default one is CachingCalciteSchema, which uses cache for implicit subschema, tables and functions. The new one, SimpleCalciteSchema, shares the same datas structure/code for the explicit subschema, tables and functions (previous patch used a different structure / code), but does not use cache for implicit subschema, tables and functions. (I could refactor the code such that the code accesses the explicit subschema etc is moved to abstract class. But I want to have you take a look to see if the CachingCalciteSchema/SimpleCalciteSchema looks fine or not first, as it is easier to see the changes) The reason that Drill could not use cache is that Drill dynamically explores the list of tables. For example, for a file system schema, Drill will not list all the files/ tables under that schema initially. Only when the table is queried with, it's added into the table list for the file system schema. That's different from the normal CalciteSchema, where the table list is pre-determined. Also, per Julian's suggestion, I depredated CalciteRootSchema and added checking code at places where the rootSchema is used. Please take another look at this patch and see if there is any problem. Link to the PR: https://github.com/apache/calcite/pull/142 BTW: I run Drill's regression with the patch, and did not see any regression.
        Hide
        julianhyde Julian Hyde added a comment -

        SimpleCalciteSchema and CachingCalciteSchema are the only two implementations of CalciteSchema we need, right? It's not in Calcite's interests to support this as a public API or SPI. Even most Calcite code can use the base class. So let's lock this down. Make SimpleCalciteSchema and CachingCalciteSchema and their constructors package-protected. You'll have to move the create methods to CalciteSchema, maybe with a boolean caching parameter.

        1. Need to modify howto.md (references CalciteRootSchema)
        2. In CalciteRootSchema say when will be removed
        3. Add a test for SimpleCalciteSchema. It is currently not instantiated anywhere in Calcite.
        4. Would there ever be a hybrid tree, say with a simple root and caching sub-schema? If so the CalciteSchema.add method could have an optional boolean caching parameter.
        5. A few methods have javadoc even though they implement/override a method with the same javadoc - please remove
        6. There's a lot of common code between SimpleCalciteSchema and CachingCalciteSchema. Could you eliminate it all? Say create an instance of CachingCalciteSchema where implicitSubSchemaCache, implicitTableCache and implicitFunctionCache always return an empty cache.
        Show
        julianhyde Julian Hyde added a comment - SimpleCalciteSchema and CachingCalciteSchema are the only two implementations of CalciteSchema we need, right? It's not in Calcite's interests to support this as a public API or SPI. Even most Calcite code can use the base class. So let's lock this down. Make SimpleCalciteSchema and CachingCalciteSchema and their constructors package-protected. You'll have to move the create methods to CalciteSchema, maybe with a boolean caching parameter. Need to modify howto.md (references CalciteRootSchema) In CalciteRootSchema say when will be removed Add a test for SimpleCalciteSchema. It is currently not instantiated anywhere in Calcite. Would there ever be a hybrid tree, say with a simple root and caching sub-schema? If so the CalciteSchema.add method could have an optional boolean caching parameter. A few methods have javadoc even though they implement/override a method with the same javadoc - please remove There's a lot of common code between SimpleCalciteSchema and CachingCalciteSchema. Could you eliminate it all? Say create an instance of CachingCalciteSchema where implicitSubSchemaCache, implicitTableCache and implicitFunctionCache always return an empty cache.
        Hide
        julianhyde Julian Hyde added a comment -

        Jinfeng Ni, Can you get something done today? Say, responses to the review comments and a rough cut of the code. I doesn't matter if the code isn't fully working. I can take it from there, and we can start a release tomorrow.

        Show
        julianhyde Julian Hyde added a comment - Jinfeng Ni , Can you get something done today? Say, responses to the review comments and a rough cut of the code. I doesn't matter if the code isn't fully working. I can take it from there, and we can start a release tomorrow.
        Hide
        jni Jinfeng Ni added a comment -

        I'm working on it right now. I'll post a patch this afternoon or tonight.

        Show
        jni Jinfeng Ni added a comment - I'm working on it right now. I'll post a patch this afternoon or tonight.
        Hide
        jni Jinfeng Ni added a comment -

        Julian Hyde, I uploaded the revised patch based on your comments.

        1. Make SimpleCalciteSchema / CachingCalciteSchema package-protected. Move CreateRootSchema() to CalciteSchema. Add a boolean flag to indicate which version is to be created. By default, it will create the original one.
        2. Modify howto.md for CalciteRootSchema. Also specify when CalciteRootSchema is to be removed
        3. Eliminate the common code in SimpleCalciteSchema and CachingCalciteSchema. The common code is moved to abstract class. Declare couple of abstract methods to handle the implicit table/subschema/functions. Implement those functions in two sub-classes.
        4. In stead of returning empty implicitSubSchemaCache, implicitTableCache and implicitFunctionCache for SimpleCalciteSchema, the implemented functions in SimpleCalciteSchema do not reply on the caches; it get things required directly from 'schema'.
        5. The original CalciteSchema code seems to have a bug in getSubSchemaMap() and getTablesBasedOnNullaryFunctions(), there are naming conflicts between explicit / implicit subschemas. Unlike ImmutableSortedSet.Builder, ImmutableSortedMap.Builder would not handle the duplicate keys and just throw Exception.
        (/**

        • Associates {@code key}

          with

          {@code value}

          in the built map. Duplicate

        • keys, according to the comparator (which might be the keys' natural
        • order), are not allowed, and will cause {@link #build}

          to fail.
          */
          @Override public Builder<K, V> put(K key, V value)

          { entries.add(entryOf(key, value)); return this; }

          )

        I modify the code to handle the duplicate names, and add a unit test case for naming conflicts.
        6. For now, I feel the case of hybrid tree of schemas might be not needed. The normal Calcite code would use the caching version, while Drill would use the non-caching version. Therefore, I did not add boolean flag to CalciteSchema.add() method. If we late find it's necessary, we can add that.
        7. Add a unit test for SimpleCalciteSchema.

        I run some Drill testcase with the new patch. Seems to be fine, after resolving one issue in Drill's JdbcSchema. I'm going to run the whole test suite, and update the results shortly.

        Please let me know if you see any problem in the modified patch. Thanks!

        Show
        jni Jinfeng Ni added a comment - Julian Hyde , I uploaded the revised patch based on your comments. 1. Make SimpleCalciteSchema / CachingCalciteSchema package-protected. Move CreateRootSchema() to CalciteSchema. Add a boolean flag to indicate which version is to be created. By default, it will create the original one. 2. Modify howto.md for CalciteRootSchema. Also specify when CalciteRootSchema is to be removed 3. Eliminate the common code in SimpleCalciteSchema and CachingCalciteSchema. The common code is moved to abstract class. Declare couple of abstract methods to handle the implicit table/subschema/functions. Implement those functions in two sub-classes. 4. In stead of returning empty implicitSubSchemaCache, implicitTableCache and implicitFunctionCache for SimpleCalciteSchema, the implemented functions in SimpleCalciteSchema do not reply on the caches; it get things required directly from 'schema'. 5. The original CalciteSchema code seems to have a bug in getSubSchemaMap() and getTablesBasedOnNullaryFunctions(), there are naming conflicts between explicit / implicit subschemas. Unlike ImmutableSortedSet.Builder, ImmutableSortedMap.Builder would not handle the duplicate keys and just throw Exception. (/** Associates {@code key} with {@code value} in the built map. Duplicate keys, according to the comparator (which might be the keys' natural order), are not allowed, and will cause {@link #build} to fail. */ @Override public Builder<K, V> put(K key, V value) { entries.add(entryOf(key, value)); return this; } ) I modify the code to handle the duplicate names, and add a unit test case for naming conflicts. 6. For now, I feel the case of hybrid tree of schemas might be not needed. The normal Calcite code would use the caching version, while Drill would use the non-caching version. Therefore, I did not add boolean flag to CalciteSchema.add() method. If we late find it's necessary, we can add that. 7. Add a unit test for SimpleCalciteSchema. I run some Drill testcase with the new patch. Seems to be fine, after resolving one issue in Drill's JdbcSchema. I'm going to run the whole test suite, and update the results shortly. Please let me know if you see any problem in the modified patch. Thanks!
        Hide
        julianhyde Julian Hyde added a comment -

        Thanks; reviewing and testing now.

        Show
        julianhyde Julian Hyde added a comment - Thanks; reviewing and testing now.
        Hide
        julianhyde Julian Hyde added a comment -
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/ac8d04ed . Thanks for the PR, Jinfeng Ni !
        Hide
        jni Jinfeng Ni added a comment -

        Julian Hyde, thanks for reviewing and merging to Calcite!

        I'm really glad to see we finally got rid of one of the factors for Drill to use a forked version. We will continue to work on the rest two as listed in DRILL-3993, so that Drill does not use forked version.

        Show
        jni Jinfeng Ni added a comment - Julian Hyde , thanks for reviewing and merging to Calcite! I'm really glad to see we finally got rid of one of the factors for Drill to use a forked version. We will continue to work on the rest two as listed in DRILL-3993 , so that Drill does not use forked version.
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        Resolved in release 1.5.0 (2015-11-10)

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.5.0 (2015-11-10)

          People

          • Assignee:
            jni Jinfeng Ni
            Reporter:
            jni Jinfeng Ni
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development