Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: Statistics
    • Labels: None

      Description

      Recently, support for stats gathering via counter was added. Although it's useful, it has the following issues:

      1. HIVE-6500.patch
        66 kB
        Ashutosh Chauhan
      2. HIVE-6500.2.patch
        66 kB
        Ashutosh Chauhan
      3. HIVE-6500.3.patch
        66 kB
        Ashutosh Chauhan

          Activity

          Damien Carol added a comment -

          Plz ignore my last comment

          Damien Carol added a comment -

          Plz ignore my last comment

          Damien Carol added a comment -

          Ashutosh Chauhan, did you miss the property hive.stats.tmp.loc in common/src/java/org/apache/hadoop/hive/conf/HiveConf.java?

          Damien Carol added a comment -

          Lefty Leverenz This JIRA added a new property, hive.stats.tmp.loc, that is NOT documented.
          Also, this property has not been added to "hive-default.xml".

          Lefty Leverenz added a comment -

          Helps a lot, thanks. Dunno why I didn't think of removing the parentheses (d'oh!). Fixed now:

          Configuration Properties: hive.stats.dbclass

          Szehon Ho added a comment -

          Oh, I see where you got it from: it is a regular expression. Then I think even the parens are not needed, as they're part of the regex. You can say jdbc:<database> and explain that <database> is derby, mysql, etc.; it should be OK as an example. Even I don't know the whole list. From what I can tell, the code uses <database> for some special logic if it's derby, but not for anything else. Hope that helps.
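
          As a sketch of why the dot and parentheses belong to the regular expression rather than to the value itself: assuming each supported value listed in this thread is treated as a pattern that must match the whole setting, the check could look roughly like this. The names below are hypothetical, and this uses Python's re module rather than Hive's PatternValidator.

```python
import re

# Supported values as quoted in this thread. Assumption: each entry is a
# regular expression that has to match the entire configured value.
SUPPORTED_PATTERNS = ["fs", "jdbc(:.*)", "hbase", "counter", "custom"]


def is_valid_dbclass(value: str) -> bool:
    """Return True if value fully matches at least one supported pattern."""
    return any(re.fullmatch(p, value) is not None for p in SUPPORTED_PATTERNS)


if __name__ == "__main__":
    for candidate in ["fs", "jdbc:derby", "jdbc:mysql", "jdbc", "jdbc.derby"]:
        print(candidate, is_valid_dbclass(candidate))
```

          Under that assumption, "jdbc:derby" and "jdbc:mysql" are accepted because ".*" stands for any run of characters after the colon, while a literal "jdbc.derby" is not.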

          Lefty Leverenz added a comment -

          Super! Thanks, Szehon Ho. I've removed that dot, which came from the PatternValidator which I never understood: "jdbc(:.*)" – but I knew the original default was jdbc:derby without any dot.

          Szehon Ho added a comment -

          Lefty Leverenz This looks good to me, although my knowledge of stats is limited. My only comment is that there seems to be an unneeded dot on the configuration wiki page: jdbc(:.<database>)

          I think it makes sense to fix that as you suggested in HIVE-6586. Thanks!

          P.S. Yes, I think so. On https://cwiki.apache.org/confluence/display/Hive/Home, it's not listed in the 'children' list on the left. I had mistakenly assumed that all supported pages are listed there.

          Lefty Leverenz added a comment -

          I made some changes in the docs, please review:

          StatsDev: Newly Created Tables
          Configuration Properties: hive.stats.dbclass

          Lefty Leverenz added a comment -

          Good catch, Szehon Ho. Yes, the "Newly Created Tables" section of the StatsDev wikidoc needs to be updated, keeping in mind that releases 0.7 through 0.12 have "jdbc:derby" as the default for hive.stats.dbclass, so we can't just swap in the new default value. Linking to/from hive.stats.dbclass in the Configuration Properties doc will help with future maintenance.
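
          To illustrate why the docs have to state both defaults, here is a small sketch of a version-dependent lookup. The "jdbc:derby" default for 0.7 through 0.12 is stated above; the "fs" default for 0.13 and later is an assumption based on the change made in this issue, and the function name is hypothetical.

```python
from typing import Optional, Tuple


def default_stats_dbclass(version: Tuple[int, int]) -> Optional[str]:
    """Return the documented default of hive.stats.dbclass for a (major, minor) release."""
    if version < (0, 7):
        return None  # assumption: no documented default before release 0.7
    if version <= (0, 12):
        return "jdbc:derby"  # releases 0.7 through 0.12, as stated in this thread
    return "fs"  # assumption: the new default that arrives with 0.13
```

          A wiki entry that only listed the newest value would be wrong for anyone still running 0.12 or earlier, which is the maintenance concern raised here.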

          Also, the HiveConf.java description of hive.stats.dbclass omits the "fs" value. I can correct that in the next patch for HIVE-6586, perhaps using the wiki description or a variant of it:

          The storage that stores temporary Hive statistics. In FS based statistics collection, each task writes statistics it has collected in a file on the filesystem, which will be aggregated after the job has finished. Supported values are fs (filesystem), jdbc(:.*), hbase, counter and custom (HIVE-6500).

          Suggested changes to that description: (1) change "FS" to "filesystem (fs)", (2) remove or move "(HIVE-6500)" so it doesn't imply that HIVE-6500 added "custom", (3) change "jdbc(:.*)" to "jdbc:<database>" and explain that <database> can be derby, mysql, ... and what others – is there a complete list anywhere?

          P.S. What do you mean by "It is actually not linked from the top"? Top of what? Maybe you mean it belongs on the Home page. Currently it's listed on the LanguageManual page, but that's easy to change – we can even list it both places.

          Szehon Ho added a comment -

          Hi Lefty Leverenz, I have a question about the docs. I came across an outdated wiki page that still mentions db as the only option. Should that page be maintained now that FS is supported? https://cwiki.apache.org/confluence/display/Hive/StatsDev It is actually not linked from the top, but it does seem useful. I'm not sure what the policy is for these pages.

          Lefty Leverenz added a comment - edited

          Unfortunately my review board advice not to patch hive-default.xml.template led to release 0.13.0 having the obsolete default value for hive.stats.dbclass in the template file. But it's updated in the most recent patch for HIVE-6037, so presumably it will be corrected by release 0.14.0.

          Sorry about that.

          Edit: The updated parameter description didn't make it into the new version of HiveConf.java, so it needs to be fixed in another patch. (I suggest HIVE-6586.)

          Lefty Leverenz added a comment -

          The part I'm not sure of is "jdbc(:.*)" but plain "jdbc" didn't seem sufficient. So how about "jdbc:<database>"? What other values can it have for <database> besides "derby" and "mysql"?

          Lefty Leverenz added a comment -

          I updated the wiki for hive.stats.dbclass – please review:

          Hive 0.13 and later: The storage that stores temporary Hive statistics. In FS based statistics collection, each task writes statistics it has collected in a file on the filesystem, which will be aggregated after the job has finished. Supported values are fs (filesystem), jdbc(:.*), hbase, counter and custom (HIVE-6500).

          Ashutosh Chauhan added a comment -

          Committed to trunk.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12631766/HIVE-6500.3.patch

          ERROR: -1 due to 2 failed/errored test(s), 5158 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1563/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1563/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 2 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12631766

          Ashutosh Chauhan added a comment -

          Reuploading same patch for Hive QA to pick up.

          Navis added a comment -

          Sad to hear that counter stat is disabled. +1

          Gunther Hagleitner added a comment -

          LGTM +1

          Gunther Hagleitner added a comment -

          Small comments/question on rb.

          Ashutosh Chauhan added a comment -

          Navis, since you have context in this area, would you like to take a look? RB request: https://reviews.apache.org/r/18459/

          Ashutosh Chauhan added a comment -

          In FS-based stats collection, the idea is that each task writes the stats it has collected to a file on the FS; these files are then aggregated after the job has finished.
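
          A minimal sketch of that idea, not Hive's implementation: each hypothetical task writes its counters to its own file under a shared temporary directory, and a single aggregation step sums them once all tasks are done. The file layout, the JSON format, the stat names, and the function names are all assumptions made for illustration.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def write_task_stats(stats_dir: Path, task_id: str, stats: dict) -> Path:
    """Each task writes the stats it collected to its own file (hypothetical layout)."""
    path = stats_dir / f"{task_id}.json"
    path.write_text(json.dumps(stats))
    return path


def aggregate_stats(stats_dir: Path) -> dict:
    """After the job has finished, sum the per-task files into one result."""
    totals = Counter()
    for path in sorted(stats_dir.glob("*.json")):
        totals.update(json.loads(path.read_text()))
    return dict(totals)


def run_example() -> dict:
    """Simulate two tasks writing stats, then aggregate them."""
    with tempfile.TemporaryDirectory() as tmp:
        stats_dir = Path(tmp)
        write_task_stats(stats_dir, "task_0", {"numRows": 100, "rawDataSize": 2048})
        write_task_stats(stats_dir, "task_1", {"numRows": 50, "rawDataSize": 1024})
        return aggregate_stats(stats_dir)


if __name__ == "__main__":
    print(run_example())
```

          In this sketch every task writes to a separate file, so tasks share nothing while running; the only coordination point is the aggregation pass at the end.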


            People

            • Assignee:
              Ashutosh Chauhan
              Reporter:
              Ashutosh Chauhan
            • Votes:
              0
              Watchers:
              6
