Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-1121 Improve tracing in Phoenix
  3. PHOENIX-2178

Tracing - total time listed for a certain trace does not correlate with query wall clock time

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.5.0
    • Fix Version/s: None
    • Labels:

      Description

      Wall clock for a count * takes over a large table takes 30000+ms however total sum(end_time - start_time) is less than 250ms for trace_id generated for this count * query.

      Output of trace table:
      
      select sum(end_time  - start_time),count(*), description from SYSTEM.TRACING_STATS WHERE TRACE_ID=X group by description;
      +------------------------------------------+------------------------------------------+------------------------------------------+
      |       SUM((END_TIME - START_TIME))       |                 COUNT(1)                 |               DESCRIPTION                |
      +------------------------------------------+------------------------------------------+------------------------------------------+
      | 0                                        | 3                                        | ClientService.Scan                       |
      | 240                                      | 253879                                   | HFileReaderV2.readBlock                  |
      | 1                                        | 1                                        | Scanner opened on server                 |
      +------------------------------------------+------------------------------------------+------------------------------------------+
      
      

        Issue Links

          Activity

          Hide
          ashishtiwari88 Ashish Tiwari added a comment -

          Mujtaba Chohan James Taylor

          Hi All,

          My name is Ashish and I am very interested in working on this project. First of all, I would like to thank you for mentoring the project. I have started working on the proposal and would like some inputs on it:

          1. I would like to reproduce the inconsistencies in SQL trace results. I am setting up my local dev environments by following (https://phoenix.apache.org/develop.html). Are there any performance tests that I can use to reproduce this?

          2. Is there a doc on how the tracing works? What modules are responsible to report stats and how these stats are collected?

          3. How should I go about dividing the tasks for this project?

          About me:
          I have an undergraduate degree in computer science and starting my master's degree in computer science from Arizona State University. I also have 5 years of professional experience developing enterprise web applications in J2EE.

          Thanks,
          Ashish

          Show
          ashishtiwari88 Ashish Tiwari added a comment - Mujtaba Chohan James Taylor Hi All, My name is Ashish and I am very interested in working on this project. First of all, I would like to thank you for mentoring the project. I have started working on the proposal and would like some inputs on it: 1. I would like to reproduce the inconsistencies in SQL trace results. I am setting up my local dev environments by following ( https://phoenix.apache.org/develop.html ). Are there any performance tests that I can use to reproduce this? 2. Is there a doc on how the tracing works? What modules are responsible to report stats and how these stats are collected? 3. How should I go about dividing the tasks for this project? About me: I have an undergraduate degree in computer science and starting my master's degree in computer science from Arizona State University. I also have 5 years of professional experience developing enterprise web applications in J2EE. Thanks, Ashish
          Hide
          ashishtiwari88 Ashish Tiwari added a comment -
          Show
          ashishtiwari88 Ashish Tiwari added a comment - James Taylor
          Hide
          jamestaylor James Taylor added a comment -

          Thanks, Ashish Tiwari - we appreciate the proposal. A good way to find things in Phoenix is to use the search box on our home page: http://phoenix.apache.org/. When you enter "tracing" there, the first hit that comes up is our documentation for this feature. Another good way is to search in JIRA for any issue containing the word "tracing" in the summary. This will point you to the git commits that were made to build this feature. HTH.

          Mujtaba Chohan - would you mind advising Ashish Tiwari and reviewing the proposal?

          Show
          jamestaylor James Taylor added a comment - Thanks, Ashish Tiwari - we appreciate the proposal. A good way to find things in Phoenix is to use the search box on our home page: http://phoenix.apache.org/ . When you enter "tracing" there, the first hit that comes up is our documentation for this feature. Another good way is to search in JIRA for any issue containing the word "tracing" in the summary. This will point you to the git commits that were made to build this feature. HTH. Mujtaba Chohan - would you mind advising Ashish Tiwari and reviewing the proposal?
          Hide
          pranavanc5e Pranavan added a comment -

          Hi James and Mujtaba Chohan. I found the solution for this.

          The difference is huge because the wall clock is calculating the actual time where as sum(endtime - starttime) is calculating the sum of the differences. When it comes to difference, most of the readBlock traces will take nano seconds. There is a small fraction of readBlock traces which end up with the gap of 1 millisecond.

          This the code segment which assigns start and end time.
          builder.addCounter(Interns.info(START.traceName, EMPTY_STRING), span.getStartTimeMillis());
          builder.addCounter(Interns.info(END.traceName, EMPTY_STRING), span.getStopTimeMillis());

          The solution is to store the nanotime. This will eventually lead to get rid of round of errors which we are actually facing now. It will not change the DDL of Tracing table, but the content will be nanotime.

          Show
          pranavanc5e Pranavan added a comment - Hi James and Mujtaba Chohan. I found the solution for this. The difference is huge because the wall clock is calculating the actual time where as sum(endtime - starttime) is calculating the sum of the differences. When it comes to difference, most of the readBlock traces will take nano seconds. There is a small fraction of readBlock traces which end up with the gap of 1 millisecond. This the code segment which assigns start and end time. builder.addCounter(Interns.info(START.traceName, EMPTY_STRING), span.getStartTimeMillis()); builder.addCounter(Interns.info(END.traceName, EMPTY_STRING), span.getStopTimeMillis()); The solution is to store the nanotime. This will eventually lead to get rid of round of errors which we are actually facing now. It will not change the DDL of Tracing table, but the content will be nanotime.
          Hide
          pranavanc5e Pranavan added a comment -

          I have created a pull, but it needs some HTrace level changes. I am working on that. We can merge, once we got the updated version of HTrace.

          PR - https://github.com/apache/phoenix/pull/173

          Show
          pranavanc5e Pranavan added a comment - I have created a pull, but it needs some HTrace level changes. I am working on that. We can merge, once we got the updated version of HTrace. PR - https://github.com/apache/phoenix/pull/173
          Hide
          mujtabachohan Mujtaba Chohan added a comment -

          Pranavan great find! BTW sorry I was away for a while and missed reading your initial proposal.

          Show
          mujtabachohan Mujtaba Chohan added a comment - Pranavan great find! BTW sorry I was away for a while and missed reading your initial proposal.
          Hide
          pranavanc5e Pranavan added a comment -

          Colin has opened a jira for the nano time granularity. I am working on it. The jira link in HTrace - https://issues.apache.org/jira/browse/HTRACE-376

          Show
          pranavanc5e Pranavan added a comment - Colin has opened a jira for the nano time granularity. I am working on it. The jira link in HTrace - https://issues.apache.org/jira/browse/HTRACE-376
          Hide
          pranavanc5e Pranavan added a comment -

          I made a PR to HR - https://github.com/apache/incubator-htrace/pull/10/files. We can merge the PR - https://github.com/apache/phoenix/pull/173 after it gets merged.

          Show
          pranavanc5e Pranavan added a comment - I made a PR to HR - https://github.com/apache/incubator-htrace/pull/10/files . We can merge the PR - https://github.com/apache/phoenix/pull/173 after it gets merged.

            People

            • Assignee:
              Unassigned
              Reporter:
              mujtabachohan Mujtaba Chohan
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:

                Development