  1. Calcite
  2. CALCITE-5995

Add cache to the dejsonize functions (JSON_VALUE, JSON_EXISTS, JSON_QUERY)


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.35.0
    • Fix Version/s: 1.36.0
    • Component/s: core

    Description

      I use the json_value function to parse JSON values, and I found that Calcite's json_value does not cache the dejsonized objects. This can cause performance problems in queries like the one below, where the same column is dejsonized repeatedly and unnecessarily:

       

      select 
      json_value(A, 'xxx'),
      json_value(A, 'yyy'),
      json_value(A, 'zzz'),...
      from some_table;
      
      

       

       

      Because projects such as Flink use Calcite's json_value when generating code for their own json_value function, I think this can lead to poor performance for their users as well. So I suggest introducing a cache in

       

      org.apache.calcite.runtime.JsonFunctions#dejsonize

       

      This kind of caching is common in other projects, for example Hive:

      https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java

       

      Of course, the cache could be enabled only when a specific configuration option is set. If this is acceptable, I can take the ticket. Thanks.
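
To make the proposal concrete, here is a minimal sketch of the kind of cache I have in mind. It is not the actual Calcite code: the class name `DejsonizeCache`, the `maxEntries` parameter, and the pluggable `parser` function are all hypothetical and stand in for whatever `JsonFunctions#dejsonize` does internally. It uses a small bounded LRU map keyed by the raw JSON string, so repeated calls on the same input reuse the parsed object.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical bounded LRU cache in front of an expensive JSON parse.
 * The parser is passed in so the sketch does not depend on any JSON library.
 */
class DejsonizeCache {
  private final Function<String, Object> parser;
  private final Map<String, Object> cache;
  private int parseCount = 0;

  DejsonizeCache(int maxEntries, Function<String, Object> parser) {
    this.parser = parser;
    // accessOrder = true keeps entries ordered from least to most recently used.
    this.cache = new LinkedHashMap<String, Object>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
        // Evict the least recently used entry once the bound is exceeded.
        return size() > maxEntries;
      }
    };
  }

  /** Returns the cached parse result for this input, parsing only on a miss. */
  synchronized Object dejsonize(String json) {
    Object parsed = cache.get(json);
    if (parsed == null) {
      // Miss (or a cached null result, which this sketch simply re-parses).
      parsed = parser.apply(json);
      parseCount++;
      cache.put(json, parsed);
    }
    return parsed;
  }

  /** Number of times the underlying parser was actually invoked. */
  synchronized int parseCount() {
    return parseCount;
  }
}
```

With a cache like this, the three json_value calls on column A in the query above would parse each row's JSON once instead of three times. In this sketch a `maxEntries` of 0 effectively disables caching, because every entry is evicted right after it is inserted, which is one possible way to tie the feature to a config option.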

       

          People

            Assignee: Unassigned
            Reporter: xiaogang zhou (zhoujira86)
            Votes: 0
            Watchers: 3
