Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-1430

Improve SQLAnalyzer by session-based parsing-result caching

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: SQL Parser
    • Labels:
      None

      Description

      There are wide tables with many many columns. Moveover, BI tools generate very complex queries whose size is several MB. Although Tajo executes those queries very fast in a few seconds, the total time of UX is slow.
      To become a fastest Hadoop DW, we need this following feature.

      tsql -f long_2times.sql
      ...
      (0 rows, 30.641 sec, 0 B selected)
      ...
      (0 rows, 1.707 sec, 0 B selected)
      
      1. long_2times.sql
        232 kB
        Dongjoon Hyun
      2. TAJO-1430.Hyun.150407.0.patch.txt
        15 kB
        Dongjoon Hyun
      3. TAJO-1430.Hyun.150407.1.patch.txt
        15 kB
        Dongjoon Hyun
      4. TAJO-1430.Hyun.150413.0.patch.txt
        15 kB
        Dongjoon Hyun
      5. TAJO-1430.Hyun.150413.1.patch.txt
        15 kB
        Dongjoon Hyun
      6. TAJO-1430.patch
        5 kB
        Dongjoon Hyun
      7. wide_table.sql
        175 kB
        Dongjoon Hyun

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user dongjoon-hyun opened a pull request:

          https://github.com/apache/tajo/pull/442

          TAJO-1430: Implement Query Parsing Result Caching

          Please see the issue TAJO-1430 for the effect.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/dongjoon-hyun/tajo TAJO-1430

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tajo/pull/442.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #442


          commit 7401afa43d0d2d61d82f172f6a613173f1c39f65
          Author: Dongjoon Hyun <dongjoon@apache.org>
          Date: 2015-03-20T07:51:49Z

          TAJO-1430: Implement Query Parsing Result Caching


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/tajo/pull/442 TAJO-1430 : Implement Query Parsing Result Caching Please see the issue TAJO-1430 for the effect. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/tajo TAJO-1430 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/442.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #442 commit 7401afa43d0d2d61d82f172f6a613173f1c39f65 Author: Dongjoon Hyun <dongjoon@apache.org> Date: 2015-03-20T07:51:49Z TAJO-1430 : Implement Query Parsing Result Caching
          Hide
          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12705862/TAJO-1430.patch
          against master revision release-0.9.0-rc0-212-g154f5b9.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

          +1 checkstyle. The patch generated 0 code style errors.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in tajo-core.

          Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/627//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/627//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html
          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/627//console

          This message is automatically generated.

          Show
          tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705862/TAJO-1430.patch against master revision release-0.9.0-rc0-212-g154f5b9. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The applied patch does not increase the total number of javadoc warnings. +1 checkstyle. The patch generated 0 code style errors. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in tajo-core. Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/627//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/627//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/627//console This message is automatically generated.
          Hide
          hyunsik Hyunsik Choi added a comment - - edited

          The problem definition looks good. But, the solution is naive due to the following reasons:
          1) the same queries with different parameters should be parsed every time (low cache hit ratio).
          2) the same queries with different parameters will be all kept in the cache.
          3) there is no cache invalidation method

          The better solution may be an approach like PreparedStatement.
          http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html

          This approach allows DB system to prepare many thing before execution. Also, it separates parameters represented by '?' from SQL statements. So, we will have more opportunities to increase the cache hit ratio.

          In addition, this parsed statements should be kept for each session. It would be better in terms of security and cache management.

          Show
          hyunsik Hyunsik Choi added a comment - - edited The problem definition looks good. But, the solution is naive due to the following reasons: 1) the same queries with different parameters should be parsed every time (low cache hit ratio). 2) the same queries with different parameters will be all kept in the cache. 3) there is no cache invalidation method The better solution may be an approach like PreparedStatement. http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html This approach allows DB system to prepare many thing before execution. Also, it separates parameters represented by '?' from SQL statements. So, we will have more opportunities to increase the cache hit ratio. In addition, this parsed statements should be kept for each session. It would be better in terms of security and cache management.
          Hide
          dongjoon Dongjoon Hyun added a comment -

          Thank you for your reminding, Hyunsik Choi. I agree with you in that point. I love your kind guide, always. I suggest that we create another issue for that because, as you told, it is heavier issue that can not be included in 0.10.1.

          As you know, this issue is only dependent on only SQLAnalyzer. Actually, you can think this as a faster SQLAnalyzer using caching.

          Finally, let me allow to introduce my two main reasons for that.
          First, this issue is based on global semiconductor's use cases. There exist Statement(not PreparedStatement) use cases with 20~30MB real queries. This patch just caches queries whose parsing time is greater than 30 seconds, not for all general queries. As you see in the example, in real sites, Tajo customer experience is not good. TajoCli shows a few seconds, but the clock wall shows over 30 seconds.

          Second, current TajoPreparedStatement class just makes 'SQL String' as a result from the prepared statements. I'm sure that you want me to modify this class only. In fact, I desired to improve both Statement and PreparedStatement cases. If I patches TajoPreparedStatement, the effect is very limited and cannot fullfill my customers.

          Last, but not least, you can remove later this feature very easily.

          Show
          dongjoon Dongjoon Hyun added a comment - Thank you for your reminding, Hyunsik Choi . I agree with you in that point. I love your kind guide, always. I suggest that we create another issue for that because, as you told, it is heavier issue that can not be included in 0.10.1. As you know, this issue is only dependent on only SQLAnalyzer. Actually, you can think this as a faster SQLAnalyzer using caching. Finally, let me allow to introduce my two main reasons for that. First, this issue is based on global semiconductor's use cases. There exist Statement(not PreparedStatement) use cases with 20~30MB real queries. This patch just caches queries whose parsing time is greater than 30 seconds, not for all general queries. As you see in the example, in real sites, Tajo customer experience is not good. TajoCli shows a few seconds, but the clock wall shows over 30 seconds. Second, current TajoPreparedStatement class just makes 'SQL String' as a result from the prepared statements. I'm sure that you want me to modify this class only. In fact, I desired to improve both Statement and PreparedStatement cases. If I patches TajoPreparedStatement, the effect is very limited and cannot fullfill my customers. Last, but not least, you can remove later this feature very easily.
          Hide
          dongjoon Dongjoon Hyun added a comment -

          Hi, Hyunsik Choi.

          According to your kind guide, I created another PreparedStatement issue. I will improve Tajo PreparedStatement officially. But, please note its limitation also. TajoCli and normal Statement will not be improved there.

          For this issue, I will add the cache size limitation too. We can clear the cache regularly or handle in a LRU pattern. Please see the positive effect for Tajo. In fact, I'm working on `Hi-speed Tajo` project. We want to make Tajo 5 times faster.

          If you allow me, I will create an Umbrella issue for that and maintain those efforts under that issue.

          Show
          dongjoon Dongjoon Hyun added a comment - Hi, Hyunsik Choi . According to your kind guide, I created another PreparedStatement issue. I will improve Tajo PreparedStatement officially. But, please note its limitation also. TajoCli and normal Statement will not be improved there. For this issue, I will add the cache size limitation too. We can clear the cache regularly or handle in a LRU pattern. Please see the positive effect for Tajo. In fact, I'm working on `Hi-speed Tajo` project. We want to make Tajo 5 times faster. If you allow me, I will create an Umbrella issue for that and maintain those efforts under that issue.
          Hide
          hyunsik Hyunsik Choi added a comment - - edited

          Hi Dongjoon Hyun,

          Thank you for creating the jira issues. Actually, that work is very necessary. I also think the change scope of improved PreparedStatement is somewhat vast. So, they should be done in another jira, I think.

          Anyway, this work with some cache invalidation method looks good. I also think that Session would be a good place to cache the parsed result. it would be more safe in many aspects.

          Are umbrella issues about PreparedStatement or your project? If you meant latter, you don't need to mention your project later. It would be easy to be misunderstood. If the works are reasonable in Tajo, please just do it.

          Show
          hyunsik Hyunsik Choi added a comment - - edited Hi Dongjoon Hyun , Thank you for creating the jira issues. Actually, that work is very necessary. I also think the change scope of improved PreparedStatement is somewhat vast. So, they should be done in another jira, I think. Anyway, this work with some cache invalidation method looks good. I also think that Session would be a good place to cache the parsed result. it would be more safe in many aspects. Are umbrella issues about PreparedStatement or your project? If you meant latter, you don't need to mention your project later. It would be easy to be misunderstood. If the works are reasonable in Tajo, please just do it.
          Hide
          dongjoon Dongjoon Hyun added a comment -

          Thank you for your thoughtful guidances. I will follow!

          Show
          dongjoon Dongjoon Hyun added a comment - Thank you for your thoughtful guidances. I will follow!
          Hide
          dongjoon Dongjoon Hyun added a comment -

          Change use case because cache moves into session.

          Show
          dongjoon Dongjoon Hyun added a comment - Change use case because cache moves into session.
          Hide
          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12706861/long_2times.sql
          against master revision release-0.9.0-rc0-217-g3e9a2dd.

          -1 patch. The patch command could not apply the patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/649//console

          This message is automatically generated.

          Show
          tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706861/long_2times.sql against master revision release-0.9.0-rc0-217-g3e9a2dd. -1 patch. The patch command could not apply the patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/649//console This message is automatically generated.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-85409177

          The cache is maintained by approximately LRU manner with 200 size and also cached item will be expired after 1 hour since last access.
          '''
          session.setQueryCache(CacheBuilder.newBuilder()
          .maximumSize(200)
          .expireAfterAccess(1, TimeUnit.HOURS)
          .build(new CacheLoader<String, Expr>() {
          public Expr load(String sql) throws SQLSyntaxError

          { return analyzer.parse(sql); }

          })
          );
          '''

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-85409177 The cache is maintained by approximately LRU manner with 200 size and also cached item will be expired after 1 hour since last access. ''' session.setQueryCache(CacheBuilder.newBuilder() .maximumSize(200) .expireAfterAccess(1, TimeUnit.HOURS) .build(new CacheLoader<String, Expr>() { public Expr load(String sql) throws SQLSyntaxError { return analyzer.parse(sql); } }) ); '''
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-85789878

          Rebased.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-85789878 Rebased.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-89540327

          Rebased.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-89540327 Rebased.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-89710956

          Hmm. I attached the test pass result of the following command.
          ```
          mvn clean install -Pparallel-test,hcatalog-0.12.0 -DLOG_LEVEL=ERROR -Dmaven.fork.count=2 > TAJO-1430.travis.log.txt
          ```
          https://app.box.com/s/rx85b6pv26sgo400e5dxf30bepyr86nl

          Currently, Travis CI seems to be unstable and the builds fail on other reasons like the following.
          ```
          ERROR: org.apache.tajo.master.rm.TajoWorkerResourceManager (run(346)) - java.lang.InterruptedException
          2015-04-04 10:12:39,174 ERROR: org.apache.tajo.util.history.HistoryWriter (writeHistory(318)) - Error while saving query history: q_1428141094535_0543:Filesystem closed
          ```

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-89710956 Hmm. I attached the test pass result of the following command. ``` mvn clean install -Pparallel-test,hcatalog-0.12.0 -DLOG_LEVEL=ERROR -Dmaven.fork.count=2 > TAJO-1430 .travis.log.txt ``` https://app.box.com/s/rx85b6pv26sgo400e5dxf30bepyr86nl Currently, Travis CI seems to be unstable and the builds fail on other reasons like the following. ``` ERROR: org.apache.tajo.master.rm.TajoWorkerResourceManager (run(346)) - java.lang.InterruptedException 2015-04-04 10:12:39,174 ERROR: org.apache.tajo.util.history.HistoryWriter (writeHistory(318)) - Error while saving query history: q_1428141094535_0543:Filesystem closed ```
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r27778357

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java —
          @@ -155,7 +174,7 @@ public SubmitQueryResponse executeQuery(Session session, String query, boolean i
          if (isJson) {
          planningContext = buildExpressionFromJson(query);
          — End diff –

          It would be better if the cache is supported for json queries.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r27778357 — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java — @@ -155,7 +174,7 @@ public SubmitQueryResponse executeQuery(Session session, String query, boolean i if (isJson) { planningContext = buildExpressionFromJson(query); — End diff – It would be better if the cache is supported for json queries.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r27778444

          — Diff: tajo-core/src/main/java/org/apache/tajo/session/Session.java —
          @@ -121,6 +124,10 @@ public synchronized String getCurrentDatabase()

          { return currentDatabase; }

          + public synchronized void setQueryCache(LoadingCache<String, Expr> cache)

          { this.cache = cache; }

          — End diff –

          The cache instance is always updated instead of adding a cache item to a singleton cache instance. I don't think that this is a valid cache operation.

          In addition, please follow our coding convention.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r27778444 — Diff: tajo-core/src/main/java/org/apache/tajo/session/Session.java — @@ -121,6 +124,10 @@ public synchronized String getCurrentDatabase() { return currentDatabase; } + public synchronized void setQueryCache(LoadingCache<String, Expr> cache) { this.cache = cache; } — End diff – The cache instance is always updated instead of adding a cache item to a singleton cache instance. I don't think that this is a valid cache operation. In addition, please follow our coding convention.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r27778458

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java —
          @@ -143,6 +148,20 @@ private QueryContext createQueryContext(Session session)

          { newQueryContext.putAll(CommonTestingUtil.getSessionVarsForTest()); }

          + // Set queryCache in session
          + synchronized (session) {
          — End diff –

          Please remove the redundant synchronized block.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r27778458 — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java — @@ -143,6 +148,20 @@ private QueryContext createQueryContext(Session session) { newQueryContext.putAll(CommonTestingUtil.getSessionVarsForTest()); } + // Set queryCache in session + synchronized (session) { — End diff – Please remove the redundant synchronized block.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-89800723

          Thanks for the contribution.
          I left some comments.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-89800723 Thanks for the contribution. I left some comments.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r27780810

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java —
          @@ -155,7 +174,7 @@ public SubmitQueryResponse executeQuery(Session session, String query, boolean i
          if (isJson) {
          planningContext = buildExpressionFromJson(query);
          — End diff –

          Is it slow?

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r27780810 — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java — @@ -155,7 +174,7 @@ public SubmitQueryResponse executeQuery(Session session, String query, boolean i if (isJson) { planningContext = buildExpressionFromJson(query); — End diff – Is it slow?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r27780815

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java —
          @@ -143,6 +148,20 @@ private QueryContext createQueryContext(Session session)

          { newQueryContext.putAll(CommonTestingUtil.getSessionVarsForTest()); }

          + // Set queryCache in session
          + synchronized (session) {
          — End diff –

          No problem.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r27780815 — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java — @@ -143,6 +148,20 @@ private QueryContext createQueryContext(Session session) { newQueryContext.putAll(CommonTestingUtil.getSessionVarsForTest()); } + // Set queryCache in session + synchronized (session) { — End diff – No problem.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r27780957

          — Diff: tajo-core/src/main/java/org/apache/tajo/session/Session.java —
          @@ -121,6 +124,10 @@ public synchronized String getCurrentDatabase()

          { return currentDatabase; }

          + public synchronized void setQueryCache(LoadingCache<String, Expr> cache)

          { this.cache = cache; }

          — End diff –

          That's not true. According to @hyunsik 's advice,it's a sesstion based cache. It is set once for each sesstion.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r27780957 — Diff: tajo-core/src/main/java/org/apache/tajo/session/Session.java — @@ -121,6 +124,10 @@ public synchronized String getCurrentDatabase() { return currentDatabase; } + public synchronized void setQueryCache(LoadingCache<String, Expr> cache) { this.cache = cache; } — End diff – That's not true. According to @hyunsik 's advice,it's a sesstion based cache. It is set once for each sesstion.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-89867484

          Thank you, @jihoonson . By the way, could you see the Jira, too? There was some discussion about this.

          https://issues.apache.org/jira/browse/TAJO-1430

          It's just I want to make sure what I understand.
          By the way, sorry for the outdated title of this pull request outdate.
          I'll update now according to the Jira title.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-89867484 Thank you, @jihoonson . By the way, could you see the Jira, too? There was some discussion about this. https://issues.apache.org/jira/browse/TAJO-1430 It's just I want to make sure what I understand. By the way, sorry for the outdated title of this pull request outdate. I'll update now according to the Jira title.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r27781793

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java —
          @@ -155,7 +174,7 @@ public SubmitQueryResponse executeQuery(Session session, String query, boolean i
          if (isJson) {
          planningContext = buildExpressionFromJson(query);
          — End diff –

          Honestly, I didn't estimate the exact time. But, using cache will be much faster than parsing huge json queries, I think.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r27781793 — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java — @@ -155,7 +174,7 @@ public SubmitQueryResponse executeQuery(Session session, String query, boolean i if (isJson) { planningContext = buildExpressionFromJson(query); — End diff – Honestly, I didn't estimate the exact time. But, using cache will be much faster than parsing huge json queries, I think.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r27781831

          — Diff: tajo-core/src/main/java/org/apache/tajo/session/Session.java —
          @@ -121,6 +124,10 @@ public synchronized String getCurrentDatabase()

          { return currentDatabase; }

          + public synchronized void setQueryCache(LoadingCache<String, Expr> cache)

          { this.cache = cache; }

          — End diff –

          I've already read Jira. I concerned about whether the cache is maintained within a session. But it was my misunderstanding. I'm not familiar with LoadingCache. Sorry.

          Anyway, it would be better if this cache initialization is moved to Session.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r27781831 — Diff: tajo-core/src/main/java/org/apache/tajo/session/Session.java — @@ -121,6 +124,10 @@ public synchronized String getCurrentDatabase() { return currentDatabase; } + public synchronized void setQueryCache(LoadingCache<String, Expr> cache) { this.cache = cache; } — End diff – I've already read Jira. I concerned about whether the cache is maintained within a session. But it was my misunderstanding. I'm not familiar with LoadingCache. Sorry. Anyway, it would be better if this cache initialization is moved to Session.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r27782120

          — Diff: tajo-core/src/main/java/org/apache/tajo/session/Session.java —
          @@ -121,6 +124,10 @@ public synchronized String getCurrentDatabase()

          { return currentDatabase; }

          + public synchronized void setQueryCache(LoadingCache<String, Expr> cache)

          { this.cache = cache; }

          — End diff –

          Currently, in this patch, I try to minimize dependency change in order to keep Tajo's good modularity. So, I put only Expr dependency into Session class. (It's inevitable.) If we put the cache initialization there, we need "import org.apache.tajo.engine.parser.SQLAnalyzer" there. I am not sure that Tajo allows this kind of dependency. If you think this is okay, I prefer to move that into Session. I need your guide.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r27782120 — Diff: tajo-core/src/main/java/org/apache/tajo/session/Session.java — @@ -121,6 +124,10 @@ public synchronized String getCurrentDatabase() { return currentDatabase; } + public synchronized void setQueryCache(LoadingCache<String, Expr> cache) { this.cache = cache; } — End diff – Currently, in this patch, I try to minimize dependency change in order to keep Tajo's good modularity. So, I put only Expr dependency into Session class. (It's inevitable.) If we put the cache initialization there, we need "import org.apache.tajo.engine.parser.SQLAnalyzer" there. I am not sure that Tajo allows this kind of dependency. If you think this is okay, I prefer to move that into Session. I need your guide.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r27782248

          — Diff: tajo-core/src/main/java/org/apache/tajo/session/Session.java —
          @@ -121,6 +124,10 @@ public synchronized String getCurrentDatabase()

          { return currentDatabase; }

          + public synchronized void setQueryCache(LoadingCache<String, Expr> cache)

          { this.cache = cache; }

          — End diff –

          Ok. Let's keep it. We can refactor later.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r27782248 — Diff: tajo-core/src/main/java/org/apache/tajo/session/Session.java — @@ -121,6 +124,10 @@ public synchronized String getCurrentDatabase() { return currentDatabase; } + public synchronized void setQueryCache(LoadingCache<String, Expr> cache) { this.cache = cache; } — End diff – Ok. Let's keep it. We can refactor later.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-89954302

          @jihoonson , I changed this patch according to your advice except JSON.
          I'm still not sure if JSON parser has performance issue. I think it's a little bit beyond this issue's scope.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-89954302 @jihoonson , I changed this patch according to your advice except JSON. I'm still not sure if JSON parser has performance issue. I think it's a little bit beyond this issue's scope.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r27803484

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java —
          @@ -143,6 +148,18 @@ private QueryContext createQueryContext(Session session)

          { newQueryContext.putAll(CommonTestingUtil.getSessionVarsForTest()); }

          + // Set queryCache in session
          + if (session.getQueryCache() == null) {
          + session.setQueryCache(CacheBuilder.newBuilder()
          + .maximumSize(200)
          — End diff –

          Here, the size means the number of items contained in the cache. So, the actual cache size depends on the size of cached queries. This will have a problem of potentially exhausting memory. So, it would be better to set the maximum weight rather than the maximum size.

          In addition, I think that the maximum weight and the expiration period should be configurable. ```ConfVar``` would be a good place because we don't need to maintain different configurations for each user.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r27803484 — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java — @@ -143,6 +148,18 @@ private QueryContext createQueryContext(Session session) { newQueryContext.putAll(CommonTestingUtil.getSessionVarsForTest()); } + // Set queryCache in session + if (session.getQueryCache() == null) { + session.setQueryCache(CacheBuilder.newBuilder() + .maximumSize(200) — End diff – Here, the size means the number of items contained in the cache. So, the actual cache size depends on the size of cached queries. This will have a problem of potentially exhausting memory. So, it would be better to set the maximum weight rather than the maximum size. In addition, I think that the maximum weight and the expiration period should be configurable. ```ConfVar``` would be a good place because we don't need to maintain different configurations for each user.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-90102240

          Ok. I'll investigate the json parsing performance. If you have any references for that, please share with me.

          I left another comment.
          Thanks.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-90102240 Ok. I'll investigate the json parsing performance. If you have any references for that, please share with me. I left another comment. Thanks.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r27835860

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java —
          @@ -143,6 +148,18 @@ private QueryContext createQueryContext(Session session)

          { newQueryContext.putAll(CommonTestingUtil.getSessionVarsForTest()); }

          + // Set queryCache in session
          + if (session.getQueryCache() == null) {
          + session.setQueryCache(CacheBuilder.newBuilder()
          + .maximumSize(200)
          — End diff –

          That's a good idea. No problem. By the way, the weight should be estimated by SQL query string length, right? Or, do you have some Util to measure Expr memory consumption? Up to now, I can not find that kind of code in Expr. Do you think SQL query string length is okay?

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r27835860 — Diff: tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java — @@ -143,6 +148,18 @@ private QueryContext createQueryContext(Session session) { newQueryContext.putAll(CommonTestingUtil.getSessionVarsForTest()); } + // Set queryCache in session + if (session.getQueryCache() == null) { + session.setQueryCache(CacheBuilder.newBuilder() + .maximumSize(200) — End diff – That's a good idea. No problem. By the way, the weight should be estimated by SQL query string length, right? Or, do you have some Util to measure Expr memory consumption? Up to now, I can not find that kind of code in Expr. Do you think SQL query string length is okay?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-90262071

          In fact, I don't know any use cases of JSON in the real fields. Is it popular to use JSON query?

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-90262071 In fact, I don't know any use cases of JSON in the real fields. Is it popular to use JSON query?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-90298334

          We support json queries to address some requests to integrate other olap tools. As you know, generating json is much easier than generating sql.

          On the weight, the query length looks good.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-90298334 We support json queries to address some requests to integrate other olap tools. As you know, generating json is much easier than generating sql. On the weight, the query length looks good.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-90318774

          According to your advice, I updated and finished testing the code. In these days, Travis CI is prone to fail. If you need, I will upload my tested log.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-90318774 According to your advice, I updated and finished testing the code. In these days, Travis CI is prone to fail. If you need, I will upload my tested log.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-90353965

          Travis CI fails. Here is my succeeded test log.
          ```
          mvn clean install -Pparallel-test,hcatalog-0.12.0 -DLOG_LEVEL=WARN -Dmaven.fork.count=2 > TAJO-1430.150407.log.txt
          ```
          https://app.box.com/s/s7ivihn6myvpr2sbr0t6cfd8fb55zawj

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-90353965 Travis CI fails. Here is my succeeded test log. ``` mvn clean install -Pparallel-test,hcatalog-0.12.0 -DLOG_LEVEL=WARN -Dmaven.fork.count=2 > TAJO-1430 .150407.log.txt ``` https://app.box.com/s/s7ivihn6myvpr2sbr0t6cfd8fb55zawj
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-90359698

          Thanks. Would you mind triggering Jenkins by putting your patch on Jira?
          Jenkins is better than Travis because it also includes the Findbugs test.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-90359698 Thanks. Would you mind triggering Jenkins by putting your patch on Jira? Jenkins is better than Travis because it also includes the Findbugs test.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-90369902

          Sure!

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-90369902 Sure!
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-90393802

          Hi, @jihoonson . Jenkins fails to start the test.

          ```
          Compiling /home/jenkins/jenkins-slave/workspace/PreCommit-TAJO-Build/incubator-tajo
          /home/jenkins/tools/maven/latest/bin/mvn clean test -DskipTests -Phcatalog-0.12.0 > /home/jenkins/jenkins-slave/workspace/PreCommit-TAJO-Build/patchprocess/trunkJavacWarnings.txt 2>&1
          Trunk compilation is broken?
          ```

          What can I do now? Please visit the following link for detail info.

          https://builds.apache.org/job/PreCommit-TAJO-Build/719/console

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-90393802 Hi, @jihoonson . Jenkins fails to start the test. ``` Compiling /home/jenkins/jenkins-slave/workspace/PreCommit-TAJO-Build/incubator-tajo /home/jenkins/tools/maven/latest/bin/mvn clean test -DskipTests -Phcatalog-0.12.0 > /home/jenkins/jenkins-slave/workspace/PreCommit-TAJO-Build/patchprocess/trunkJavacWarnings.txt 2>&1 Trunk compilation is broken? ``` What can I do now? Please visit the following link for detail info. https://builds.apache.org/job/PreCommit-TAJO-Build/719/console
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-90395916

          Hmm. Let me check for a second. I found the following Jenkins logs.
          ```
          [ERROR] COMPILATION ERROR :
          [INFO] -------------------------------------------------------------
          [ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-TAJO-Build/incubator-tajo/tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java:[160,13] setQueryCache(com.google.common.cache.LoadingCache<java.lang.String,org.apache.tajo.algebra.Expr>) in org.apache.tajo.session.Session cannot be applied to (com.google.common.cache.LoadingCache<java.lang.Object,org.apache.tajo.algebra.Expr>)
          [INFO] 1 error
          ```

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-90395916 Hmm. Let me check for a second. I found the following Jenkins logs. ``` [ERROR] COMPILATION ERROR : [INFO] ------------------------------------------------------------- [ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-TAJO-Build/incubator-tajo/tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java: [160,13] setQueryCache(com.google.common.cache.LoadingCache<java.lang.String,org.apache.tajo.algebra.Expr>) in org.apache.tajo.session.Session cannot be applied to (com.google.common.cache.LoadingCache<java.lang.Object,org.apache.tajo.algebra.Expr>) [INFO] 1 error ```
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-90414919

          For JDK6, I split one function call line into two lines.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-90414919 For JDK6, I split one function call line into two lines.
          Hide
          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12723563/TAJO-1430.Hyun.150407.1.patch.txt
          against master revision release-0.9.0-rc0-244-gd160f6e.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

          +1 checkstyle. The patch generated 0 code style errors.

          -1 findbugs. The patch appears to introduce 18 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in tajo-algebra tajo-common tajo-core.

          Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/720//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/720//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/720//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-common.html
          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/720//console

          This message is automatically generated.

          Show
          tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723563/TAJO-1430.Hyun.150407.1.patch.txt against master revision release-0.9.0-rc0-244-gd160f6e. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The applied patch does not increase the total number of javadoc warnings. +1 checkstyle. The patch generated 0 code style errors. -1 findbugs. The patch appears to introduce 18 new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in tajo-algebra tajo-common tajo-core. Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/720//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/720//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/720//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-common.html Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/720//console This message is automatically generated.
          Hide
          dongjoon Dongjoon Hyun added a comment - - edited

          I checked findbugs but all of them(1 tajo-common findbugs and 17 tajo-core findbugs) are not related to my code.

          Show
          dongjoon Dongjoon Hyun added a comment - - edited I checked findbugs but all of them(1 tajo-common findbugs and 17 tajo-core findbugs) are not related to my code.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r28202274

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) {

          // Query Configuration
          QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")),
          + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")),
          — End diff –

          I have a couple of comments on this line.

          • The cache size seems to be specified in bytes. This will be difficult for humans to specify the exact cache size. How about using the size unit of KB?
          • The session variable name looks too general even though this cache is only for parsed queries. It would be better if users can figure out what will be configured from its name. Also, the size unit should be included in the name like ```tajo.task.size-mb```.
          • The minimum cache size is 1 GB. Do you have any reasons?
          • In addition to the size configuration, it would be great if users can turn off/on this cache feature. This is because the cache may be useless in some workloads such as ad-hoc analysis.
          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r28202274 — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java — @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) { // Query Configuration QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")), + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")), — End diff – I have a couple of comments on this line. The cache size seems to be specified in bytes. This will be difficult for humans to specify the exact cache size. How about using the size unit of KB? The session variable name looks too general even though this cache is only for parsed queries. It would be better if users can figure out what will be configured from its name. Also, the size unit should be included in the name like ```tajo.task.size-mb```. The minimum cache size is 1 GB. Do you have any reasons? In addition to the size configuration, it would be great if users can turn off/on this cache feature. This is because the cache may be useless in some workloads such as ad-hoc analysis.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r28202863

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) {

          // Query Configuration
          QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")),
          + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")),
          — End diff –

          Hi, @jihoonson . Thank you for kind advice.

          • What about tajo.query.session.query-cache-size-mb, then? I think it would be more proper according to your advice.
          • The minimum cache size is 1MB in terms of SQL String length. As you see in TAJO-1430 example, 100K-length query takes over 30 seconds in SQL parsing, so the cache will hold up to 10 x 100K queries accessed in last 1 hour. In real situation, 10 or more MB is needed.
          • For the cache on/off feautre, is it okay by using the condition tajo.query.session.query-cache-size-mb=0?
          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r28202863 — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java — @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) { // Query Configuration QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")), + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")), — End diff – Hi, @jihoonson . Thank you for kind advice. What about tajo.query.session.query-cache-size-mb , then? I think it would be more proper according to your advice. The minimum cache size is 1MB in terms of SQL String length. As you see in TAJO-1430 example, 100K-length query takes over 30 seconds in SQL parsing, so the cache will hold up to 10 x 100K queries accessed in last 1 hour. In real situation, 10 or more MB is needed. For the cache on/off feautre, is it okay by using the condition tajo.query.session.query-cache-size-mb=0 ?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r28203732

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) {

          // Query Configuration
          QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")),
          + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")),
          — End diff –

          Oh, right. I meant 1MB not 1GB. According to your comment, it takes quite long time to parse queries even though their size is a few KBs. So, I think that KB is appropriate for the unit of size.

          On minimum cache size, ```Validators.min("1000000")``` means that the cache size under 1MB is an invalid value for this configuration. I think that you didn't intend that, but if so, please tell me the reason.

          On the cache on/off, your suggestion looks good. In addition to your suggestion, it will be good if we can avoid checking the cache when the configured cache size is 0. I think that this cache will not be used in many cases because the cached data can be used only when the exactly same queries are submitted repeatedly.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r28203732 — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java — @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) { // Query Configuration QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")), + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")), — End diff – Oh, right. I meant 1MB not 1GB. According to your comment, it takes quite long time to parse queries even though their size is a few KBs. So, I think that KB is appropriate for the unit of size. On minimum cache size, ```Validators.min("1000000")``` means that the cache size under 1MB is an invalid value for this configuration. I think that you didn't intend that, but if so, please tell me the reason. On the cache on/off, your suggestion looks good. In addition to your suggestion, it will be good if we can avoid checking the cache when the configured cache size is 0. I think that this cache will not be used in many cases because the cached data can be used only when the exactly same queries are submitted repeatedly.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r28203962

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) {

          // Query Configuration
          QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")),
          + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")),
          — End diff –

          1. tajo.query.session.query-cache-size-kb, then? No problem.
          2. On minimum cache size, let say 1K-size cache holding 10 x 100Byte SQL queries. The effect is small. In this case, we had better off the cache. In that sense, we need to guide the effective cache size practically.
          3. If cache is off, the cache will be null and Tajo will avoid checking the cache. I agree that it's important.
          4. This query cache will be important especially for PreparedStatement of TAJO-1435. (I'm working on this too.) PreparedStatement is popular in real enterprise environments. I did add PLACE_HOLDER in SQL syntax for '?' of PreparedStatement and am trying to replace them after cloning Expr. (Anyway, this is beyond the scope of this issue).

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r28203962 — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java — @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) { // Query Configuration QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")), + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")), — End diff – 1. tajo.query.session.query-cache-size-kb , then? No problem. 2. On minimum cache size, let say 1K-size cache holding 10 x 100Byte SQL queries. The effect is small. In this case, we had better off the cache. In that sense, we need to guide the effective cache size practically. 3. If cache is off, the cache will be null and Tajo will avoid checking the cache. I agree that it's important. 4. This query cache will be important especially for PreparedStatement of TAJO-1435 . (I'm working on this too.) PreparedStatement is popular in real enterprise environments. I did add PLACE_HOLDER in SQL syntax for '?' of PreparedStatement and am trying to replace them after cloning Expr. (Anyway, this is beyond the scope of this issue).
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r28204175

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) {

          // Query Configuration
          QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")),
          + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")),
          — End diff –

          Thanks for your rapid reply. Here are my answers.
          1. Thanks for understanding.
          2. I mean, with ```Validators.min("1000000")```, Tajo won't accept tajo.query.session.query-cache-size-mb=0. Validators should be used to validate the configuration. So, IMO, ```Validators.min(0)``` is better to prohibit negative values.
          3. Thanks for understanding.
          4. I also think that query caching is important. I mean, the current implementation will not be popularly used because the cached data can be used only when the exactly same queries are submitted repeatedly.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r28204175 — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java — @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) { // Query Configuration QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")), + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")), — End diff – Thanks for your rapid reply. Here are my answers. 1. Thanks for understanding. 2. I mean, with ```Validators.min("1000000")```, Tajo won't accept tajo.query.session.query-cache-size-mb=0 . Validators should be used to validate the configuration. So, IMO, ```Validators.min(0)``` is better to prohibit negative values. 3. Thanks for understanding. 4. I also think that query caching is important. I mean, the current implementation will not be popularly used because the cached data can be used only when the exactly same queries are submitted repeatedly.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r28204208

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) {

          // Query Configuration
          QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")),
          + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")),
          — End diff –

          2. You're right. Validators.min(0) is proper now because we use 0 for cache off option, too.
          4. I see. Current implementation does.
          Now, I will update the code soon. Thank you for reviewing, always!

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r28204208 — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java — @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) { // Query Configuration QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")), + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")), — End diff – 2. You're right. Validators.min(0) is proper now because we use 0 for cache off option, too. 4. I see. Current implementation does. Now, I will update the code soon. Thank you for reviewing, always!
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/442#discussion_r28204212

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) {

          // Query Configuration
          QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")),
          + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")),
          — End diff –

          Thanks for your understanding!

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/442#discussion_r28204212 — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java — @@ -203,6 +203,7 @@ public static int setDateOrder(int dateOrder) { // Query Configuration QUERY_SESSION_TIMEOUT("tajo.query.session.timeout-sec", 60, Validators.min("0")), + QUERY_SESSION_CACHE_SIZE("tajo.query.session.cache-size", 1000000, Validators.min("1000000")), — End diff – Thanks for your understanding!
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-92083131

          I changed the code and finished testing on my laptop.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-92083131 I changed the code and finished testing on my laptop.
          Hide
          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12724828/TAJO-1430.Hyun.150413.0.patch.txt
          against master revision release-0.9.0-rc0-250-g6136142.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

          +1 checkstyle. The patch generated 0 code style errors.

          -1 findbugs. The patch appears to introduce 18 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in tajo-algebra tajo-common tajo-core.

          Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/727//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/727//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/727//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-common.html
          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/727//console

          This message is automatically generated.

          Show
          tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724828/TAJO-1430.Hyun.150413.0.patch.txt against master revision release-0.9.0-rc0-250-g6136142. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The applied patch does not increase the total number of javadoc warnings. +1 checkstyle. The patch generated 0 code style errors. -1 findbugs. The patch appears to introduce 18 new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in tajo-algebra tajo-common tajo-core. Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/727//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/727//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/727//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-common.html Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/727//console This message is automatically generated.
          Hide
          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12724829/TAJO-1430.Hyun.150413.1.patch.txt
          against master revision release-0.9.0-rc0-250-g6136142.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

          +1 checkstyle. The patch generated 0 code style errors.

          -1 findbugs. The patch appears to introduce 18 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in tajo-algebra tajo-common tajo-core.

          Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/728//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/728//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/728//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html
          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/728//console

          This message is automatically generated.

          Show
          tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724829/TAJO-1430.Hyun.150413.1.patch.txt against master revision release-0.9.0-rc0-250-g6136142. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The applied patch does not increase the total number of javadoc warnings. +1 checkstyle. The patch generated 0 code style errors. -1 findbugs. The patch appears to introduce 18 new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in tajo-algebra tajo-common tajo-core. Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/728//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/728//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/728//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/728//console This message is automatically generated.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-92150527

          Thanks. I think that this work is almost done with the latest patch.
          I'll commit tonight after some tests.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-92150527 Thanks. I think that this work is almost done with the latest patch. I'll commit tonight after some tests.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-92155866

          Thank you, @jihoonson .

          Show
          githubbot ASF GitHub Bot added a comment - Github user dongjoon-hyun commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-92155866 Thank you, @jihoonson .
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/442#issuecomment-92353389

          +1 LGTM!
          Thanks for your contribution.
          I think that it would be great if there are any test cases, but it can be added at https://issues.apache.org/jira/browse/TAJO-1435.
          I'll commit shortly.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/442#issuecomment-92353389 +1 LGTM! Thanks for your contribution. I think that it would be great if there are any test cases, but it can be added at https://issues.apache.org/jira/browse/TAJO-1435 . I'll commit shortly.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tajo/pull/442

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/442
          Hide
          jihoonson Jihoon Son added a comment -

          Committed to the master branch.

          Show
          jihoonson Jihoon Son added a comment - Committed to the master branch.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Tajo-master-CODEGEN-build #304 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/304/)
          TAJO-1430: Improve SQLAnalyzer by session-based parsing-result caching. (jihoonson: rev 7d720888fd02b44ed18738ac4da36866fac46fa7)

          • tajo-core/src/main/java/org/apache/tajo/session/Session.java
          • tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/Insert.java
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/BetweenPredicate.java
          • CHANGES
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/ValueListExpr.java
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/UnaryOperator.java
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/CreateTable.java
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/AlterTable.java
          • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/Aggregation.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Tajo-master-CODEGEN-build #304 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/304/ ) TAJO-1430 : Improve SQLAnalyzer by session-based parsing-result caching. (jihoonson: rev 7d720888fd02b44ed18738ac4da36866fac46fa7) tajo-core/src/main/java/org/apache/tajo/session/Session.java tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java tajo-algebra/src/main/java/org/apache/tajo/algebra/Insert.java tajo-algebra/src/main/java/org/apache/tajo/algebra/BetweenPredicate.java CHANGES tajo-algebra/src/main/java/org/apache/tajo/algebra/ValueListExpr.java tajo-algebra/src/main/java/org/apache/tajo/algebra/UnaryOperator.java tajo-algebra/src/main/java/org/apache/tajo/algebra/CreateTable.java tajo-algebra/src/main/java/org/apache/tajo/algebra/AlterTable.java tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java tajo-algebra/src/main/java/org/apache/tajo/algebra/Aggregation.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-master-build #666 (See https://builds.apache.org/job/Tajo-master-build/666/)
          TAJO-1430: Improve SQLAnalyzer by session-based parsing-result caching. (jihoonson: rev 7d720888fd02b44ed18738ac4da36866fac46fa7)

          • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/UnaryOperator.java
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/AlterTable.java
          • tajo-core/src/main/java/org/apache/tajo/session/Session.java
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/ValueListExpr.java
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/BetweenPredicate.java
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/Aggregation.java
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/CreateTable.java
          • CHANGES
          • tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java
          • tajo-algebra/src/main/java/org/apache/tajo/algebra/Insert.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #666 (See https://builds.apache.org/job/Tajo-master-build/666/ ) TAJO-1430 : Improve SQLAnalyzer by session-based parsing-result caching. (jihoonson: rev 7d720888fd02b44ed18738ac4da36866fac46fa7) tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java tajo-algebra/src/main/java/org/apache/tajo/algebra/UnaryOperator.java tajo-algebra/src/main/java/org/apache/tajo/algebra/AlterTable.java tajo-core/src/main/java/org/apache/tajo/session/Session.java tajo-algebra/src/main/java/org/apache/tajo/algebra/ValueListExpr.java tajo-algebra/src/main/java/org/apache/tajo/algebra/BetweenPredicate.java tajo-algebra/src/main/java/org/apache/tajo/algebra/Aggregation.java tajo-algebra/src/main/java/org/apache/tajo/algebra/CreateTable.java CHANGES tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java tajo-algebra/src/main/java/org/apache/tajo/algebra/Insert.java
          Hide
          dongjoon Dongjoon Hyun added a comment -

          Thank you! ^^

          Show
          dongjoon Dongjoon Hyun added a comment - Thank you! ^^

            People

            • Assignee:
              dongjoon Dongjoon Hyun
              Reporter:
              dongjoon Dongjoon Hyun
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development