TAJO-1093: DateTimeFormat.to_char() is slower than SimpleDateFormat.format()

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: Data Type
    • Labels:

      Description

      I benchmarked the DateTimeFormat class with the code below.

      // Build a TimeMeta for the current time; every to_char call reuses it.
      TimeMeta tm = new TimeMeta();
      DateTimeUtil.toJulianTimeMeta(DateTimeUtil.javaTimeToJulianTime(System.currentTimeMillis()), tm);
      
      int iteration = 1000000;

      // 1. Formatting with Tajo's DateTimeFormat
      long startTime = System.currentTimeMillis();
      for (int i = 0; i < iteration; i++) {
        DateTimeFormat.to_char(tm, "YYYY-MM-DD HH24:MI:SS");
      }
      long endTime = System.currentTimeMillis();
      System.out.println("DateTimeFormat.to_char: " + (endTime - startTime) + " ms");
      
      // 2. Formatting with the JDK's SimpleDateFormat
      SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
      Date date = new Date();
      startTime = System.currentTimeMillis();
      for (int i = 0; i < iteration; i++) {
        df.format(date);
      }
      endTime = System.currentTimeMillis();
      System.out.println("SimpleDateFormat.format: " + (endTime - startTime) + " ms");
      
      // 3. Parsing with Tajo's DateTimeFormat
      startTime = System.currentTimeMillis();
      for (int i = 0; i < iteration; i++) {
        DateTimeFormat.parseDateTime("2014-01-01 12:11:11", "YYYY-MM-DD HH24:MI:SS");
      }
      endTime = System.currentTimeMillis();
      System.out.println("DateTimeFormat.parseDateTime: " + (endTime - startTime) + " ms");
      
      // 4. Parsing with the JDK's SimpleDateFormat (parse() throws ParseException)
      startTime = System.currentTimeMillis();
      for (int i = 0; i < iteration; i++) {
        df.parse("2014-01-01 12:11:11");
      }
      endTime = System.currentTimeMillis();
      System.out.println("SimpleDateFormat.parse: " + (endTime - startTime) + " ms");
      

      The test results follow. DateTimeFormat.to_char is about 19 times slower than SimpleDateFormat.format (6993 ms vs. 373 ms).

      DateTimeFormat.to_char: 6993 ms
      SimpleDateFormat.format: 373 ms
      DateTimeFormat.parseDateTime: 798 ms
      SimpleDateFormat.parse: 1400 ms
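
The benchmark code in the description times each loop once with System.currentTimeMillis() and has no warm-up pass, so JIT compilation time is part of every figure and the ratios are best read as rough. The following is a minimal sketch of a timing helper that runs an untimed warm-up first. FormatBenchmark and timeMillis are hypothetical names used for illustration only; they are not part of Tajo.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

/** Hypothetical timing helper for illustration; not part of Tajo. */
class FormatBenchmark {

  /**
   * Runs the task `warmup` times without timing it, then returns the
   * elapsed wall-clock time in milliseconds for `iterations` timed runs.
   */
  static long timeMillis(Runnable task, int warmup, int iterations) {
    for (int i = 0; i < warmup; i++) {
      task.run();
    }
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      task.run();
    }
    return (System.nanoTime() - start) / 1_000_000L;
  }

  public static void main(String[] args) {
    SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    Date date = new Date();
    // Same workload as the SimpleDateFormat.format loop in the description.
    long ms = timeMillis(() -> df.format(date), 100000, 1000000);
    System.out.println("SimpleDateFormat.format: " + ms + " ms");
  }
}
```

The other three loops can be passed to the same helper as lambdas. For publishable numbers a dedicated harness such as JMH would be a more reliable choice than a hand-written loop.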
      

        Activity

        ykrips Jihun Kang added a comment - edited

        I quickly reviewed the DateTimeFormat class and found the main cause of the slowdown. DateTimeFormat uses String.format to render numeric fields, and String.format is expensive to call. I modified the DateTimeFormat class and got a significant improvement in to_char. I will post a patch for this issue after running several tests on the class.

        DateTimeFormat.to_char: 995 ms
        SimpleDateFormat.format: 854 ms
        DateTimeFormat.parseDateTime: 846 ms
        SimpleDateFormat.parse: 1550 ms
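
As a rough illustration of the kind of change described above, the sketch below zero-pads a number by appending characters to a StringBuilder instead of calling String.format. It is not the code from the patch; PadSketch and appendZeroPadded are hypothetical names, and the helper assumes non-negative values.

```java
/** Hypothetical helper for illustration; not the code from the Tajo patch. */
class PadSketch {

  /**
   * Appends `value` to `sb`, left-padded with zeros to at least `width`
   * digits. Assumes value >= 0; wider values are appended unchanged.
   */
  static void appendZeroPadded(StringBuilder sb, int value, int width) {
    String digits = Integer.toString(value);
    for (int i = digits.length(); i < width; i++) {
      sb.append('0');
    }
    sb.append(digits);
  }

  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder();
    appendZeroPadded(sb, 2014, 4);
    sb.append('-');
    appendZeroPadded(sb, 1, 2);
    sb.append('-');
    appendZeroPadded(sb, 1, 2);
    // Builds the same text as String.format("%04d-%02d-%02d", 2014, 1, 1),
    // without parsing a format string on each call.
    System.out.println(sb);
  }
}
```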

        ykrips Jihun Kang added a comment -

        After patching the DateTimeFormat class, I got the following results in my tests.

        DateTimeFormat.to_char: 956 ms
        SimpleDateFormat.format: 1044 ms
        DateTimeFormat.parseDateTime: 840 ms
        SimpleDateFormat.parse: 1572 ms

        githubbot ASF GitHub Bot added a comment -

        GitHub user ykrips opened a pull request:

        https://github.com/apache/tajo/pull/177

        TAJO-1093: DateTimeFormat.to_char() is slower than SimpleDateFormat.form...

        String.format uses a regular expression internally, which makes it expensive to call.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/ykrips/tajo TAJO-1093

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/177.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #177


        commit 127b2026b8b3cc010cce738706543f606dc69d18
        Author: Jihun Kang <ykrips@gmail.com>
        Date: 2014-10-04T06:54:11Z

        TAJO-1093: DateTimeFormat.to_char() is slower than SimpleDateFormat.format()


        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/177#issuecomment-57924371

        Hi @ykrips,

        I read the patch in detail. It's very nice work, and it significantly reduces the processing time. The following is my result for the test described in the Jira issue.
        ```
        DateTimeFormat.to_char: 7044 ms (before)
        DateTimeFormat.to_char: 1139 ms (after)
        ```

        Here is my +1 for this patch. If there are no objections by tomorrow, I'll commit the patch to the master branch.

        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/177#issuecomment-57924612

        I verified the patch with 'mvn clean install'. All unit tests passed.

        Aside from this issue, we need to dig into the occasional test failure problem in TravisCI. I'll fix it soon.

        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/177

        hyunsik Hyunsik Choi added a comment -

        Committed to the master branch. Thank you, Jihun, for your contribution!

        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #393 (See https://builds.apache.org/job/Tajo-master-build/393/)
        TAJO-1093: DateTimeFormat.to_char() is slower than SimpleDateFormat.format(). (Jihun Kang via hyunsik) (hyunsik: rev d0f9ebc1c501721ecee5422534b1740e38105996)

        • CHANGES
        • tajo-common/src/main/java/org/apache/tajo/util/datetime/DateTimeFormat.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-CODEGEN-build #35 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/35/)
        TAJO-1093: DateTimeFormat.to_char() is slower than SimpleDateFormat.format(). (Jihun Kang via hyunsik) (hyunsik: rev d0f9ebc1c501721ecee5422534b1740e38105996)

        • CHANGES
        • tajo-common/src/main/java/org/apache/tajo/util/datetime/DateTimeFormat.java
        githubbot ASF GitHub Bot added a comment -

        GitHub user ykrips opened a pull request:

        https://github.com/apache/tajo/pull/194

        TAJO-1093: DateTimeFormat.to_char() is slower than SimpleDateFormat.form...

        Details can be found in this line note:
        https://github.com/ykrips/tajo/commit/127b2026b8b3cc010cce738706543f606dc69d18#tajo-common-src-main-java-org-apache-tajo-util-datetime-datetimeformat-java-P30
        In addition, I found a defect in my patch: a minimum width passed to the formatString function was not applied.
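
The thread does not show formatString itself, so the following is only a sketch of what honoring a minimum width could look like. It assumes the width behaves like the width in a %Ns format specifier (left-pad with spaces, never truncate). WidthSketch and padToMinWidth are hypothetical names, not Tajo code.

```java
/** Hypothetical helper for illustration; not the Tajo formatString function. */
class WidthSketch {

  /**
   * Returns `text` left-padded with spaces to at least `minWidth`
   * characters. Text that is already long enough is returned unchanged.
   */
  static String padToMinWidth(String text, int minWidth) {
    if (text.length() >= minWidth) {
      return text;
    }
    StringBuilder sb = new StringBuilder(minWidth);
    for (int i = text.length(); i < minWidth; i++) {
      sb.append(' ');
    }
    return sb.append(text).toString();
  }

  public static void main(String[] args) {
    // Brackets make the padding visible in the printed output.
    System.out.println("[" + padToMinWidth("ab", 5) + "]");
    System.out.println("[" + padToMinWidth("abcdef", 3) + "]");
  }
}
```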

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/ykrips/tajo TAJO-1093

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/194.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #194


        commit 49e021d9fba48dea73f333b865500e6bfac00e08
        Author: Jihun Kang <ykrips@gmail.com>
        Date: 2014-10-08T15:25:53Z

        TAJO-1093: DateTimeFormat.to_char() is slower than SimpleDateFormat.format()


        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/194#issuecomment-58389750

        +1

        It's a nice find and a quick fix. I'll commit it shortly.

        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/194

        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #403 (See https://builds.apache.org/job/Tajo-master-build/403/)
        TAJO-1093: DateTimeFormat.to_char() is slower than SimpleDateFormat.format(). (bug fix) (hyunsik: rev bbd7a768a161be0e529ef41dea325fae196a4c1d)

        • tajo-common/src/main/java/org/apache/tajo/util/datetime/DateTimeFormat.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-CODEGEN-build #45 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/45/)
        TAJO-1093: DateTimeFormat.to_char() is slower than SimpleDateFormat.format(). (bug fix) (hyunsik: rev bbd7a768a161be0e529ef41dea325fae196a4c1d)

        • tajo-common/src/main/java/org/apache/tajo/util/datetime/DateTimeFormat.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-block_iteration-branch-build #15 (See https://builds.apache.org/job/Tajo-block_iteration-branch-build/15/)
        TAJO-1093: DateTimeFormat.to_char() is slower than SimpleDateFormat.format(). (Jihun Kang via hyunsik) (hyunsik: rev d0f9ebc1c501721ecee5422534b1740e38105996)

        • tajo-common/src/main/java/org/apache/tajo/util/datetime/DateTimeFormat.java
        • CHANGES

        TAJO-1093: DateTimeFormat.to_char() is slower than SimpleDateFormat.format(). (bug fix) (hyunsik: rev bbd7a768a161be0e529ef41dea325fae196a4c1d)

        • tajo-common/src/main/java/org/apache/tajo/util/datetime/DateTimeFormat.java

          People

          • Assignee: ykrips Jihun Kang
          • Reporter: hjkim Hyoungjun Kim
          • Votes: 0
          • Watchers: 4
