HBase
  1. HBase
  2. HBASE-3929

Add option to HFile tool to produce basic stats

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.92.0
    • Fix Version/s: 0.92.0
    • Component/s: io
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In looking at HBASE-3421 I wrote a small tool to scan an HFile and produce some basic statistics about it:

      • min/mean/max key size, value size (uncompressed)
      • min/mean/max number of columns per row (uncompressed)
      • min/mean/max number of bytes per row (uncompressed)
      • the key of the largest row
      1. hbase-3929-draft.patch
        4 kB
        Matteo Bertozzi
      2. hbase-3929-draft.txt
        5 kB
        Todd Lipcon
      3. HBASE-3929-v2.patch
        4 kB
        Matteo Bertozzi
      4. HBASE-3929-v3.patch
        4 kB
        Matteo Bertozzi

        Issue Links

          Activity

          Hide
          Todd Lipcon added a comment -

          here's the draft of the code.

          before commit, I think we should refactor all of the HFile "Tool" stuff out of HFile into a new class.

          Show
          Todd Lipcon added a comment - here's the draft of the code. before commit, I think we should refactor all of the HFile "Tool" stuff out of HFile into a new class.
          Hide
          stack added a comment -

          This looks great. +1 on commit whether or not we wait on factoring out the cmd-line hfile stuff.

          Show
          stack added a comment - This looks great. +1 on commit whether or not we wait on factoring out the cmd-line hfile stuff.
          Hide
          stack added a comment -

          Moving out of 0.92.0. Pull it back in if you think different.

          Show
          stack added a comment - Moving out of 0.92.0. Pull it back in if you think different.
          Hide
          stack added a comment -

          before commit, I think we should refactor all of the HFile "Tool" stuff out of HFile into a new class.

          How about doing this some other time in another issue and commit what we have here?

          If we change the door into the HFile, we'll need to update docs, lars will have to change his book, the world will end, don't you think?

          Show
          stack added a comment - before commit, I think we should refactor all of the HFile "Tool" stuff out of HFile into a new class. How about doing this some other time in another issue and commit what we have here? If we change the door into the HFile, we'll need to update docs, lars will have to change his book, the world will end, don't you think?
          Hide
          Todd Lipcon added a comment -

          We can always leave a 'main()' for one version which just calls through to the new tool after printing a deprecation method.

          Show
          Todd Lipcon added a comment - We can always leave a 'main()' for one version which just calls through to the new tool after printing a deprecation method.
          Hide
          stack added a comment -

          Yeah, you could, in a new JIRA? (smile)

          Show
          stack added a comment - Yeah, you could, in a new JIRA? (smile)
          Hide
          stack added a comment -

          @Todd Patch as is adds value. The refactor adds value, but its minuscule compared. You are busy. I don't want the minuscule getting in the way of our getting value. Thats my rationale.

          Show
          stack added a comment - @Todd Patch as is adds value. The refactor adds value, but its minuscule compared. You are busy. I don't want the minuscule getting in the way of our getting value. Thats my rationale.
          Hide
          Todd Lipcon added a comment -

          My anti-commit rationale is that, if we commit this, then there won't be any motivation to refactor later. If we make this commit conditional on cleaning it up first, we're more likely to actually clean up!

          Show
          Todd Lipcon added a comment - My anti-commit rationale is that, if we commit this, then there won't be any motivation to refactor later. If we make this commit conditional on cleaning it up first, we're more likely to actually clean up!
          Hide
          stack added a comment -

          Makes sense. np.

          Show
          stack added a comment - Makes sense. np.
          Hide
          Matteo Bertozzi added a comment -

          patch updated for hbase trunk

          Show
          Matteo Bertozzi added a comment - patch updated for hbase trunk
          Hide
          Ted Yu added a comment -

          @Matteo:
          Are you going to perform the refactoring Todd mentioned ?

          Thanks

          Show
          Ted Yu added a comment - @Matteo: Are you going to perform the refactoring Todd mentioned ? Thanks
          Hide
          Matteo Bertozzi added a comment -

          From 0.92 the HFile.main() contains just a call to HFilePrettyPrinter.run()
          So there's no more the "Tool" code inside the HFile.java

          Probably was not the refactor that todd has in mind, but it solve the first todd's thought:
          'we should refactor all of the HFile "Tool" stuff out of HFile into a new class.'

          Show
          Matteo Bertozzi added a comment - From 0.92 the HFile.main() contains just a call to HFilePrettyPrinter.run() So there's no more the "Tool" code inside the HFile.java Probably was not the refactor that todd has in mind, but it solve the first todd's thought: 'we should refactor all of the HFile "Tool" stuff out of HFile into a new class.'
          Hide
          Todd Lipcon added a comment -

          Thanks for updating the patch to trunk. A couple of comments (fun to look back over my own code from a few months back):

          • let's rename pkv to prevKV
          • in the case of an empty HFile, we would currently throw a divide-by-zero. In LongStats.toString, we should check for count == 0 and return "no data" or something
          Show
          Todd Lipcon added a comment - Thanks for updating the patch to trunk. A couple of comments (fun to look back over my own code from a few months back): let's rename pkv to prevKV in the case of an empty HFile, we would currently throw a divide-by-zero. In LongStats.toString, we should check for count == 0 and return "no data" or something
          Hide
          Matteo Bertozzi added a comment -

          Currently HFilePrettyPrinter raise a couple of exceptions if the HFile is Empty, just because it doesn't check if seekTo() returns true or false, and the first call after seekTo() is a scanner.getKeyValue() so you get a NPE...

          I've added a v2 patch with the pkv rename, count == 0 handled, and seekTo checked to fix the NPE.

          Show
          Matteo Bertozzi added a comment - Currently HFilePrettyPrinter raise a couple of exceptions if the HFile is Empty, just because it doesn't check if seekTo() returns true or false, and the first call after seekTo() is a scanner.getKeyValue() so you get a NPE... I've added a v2 patch with the pkv rename, count == 0 handled, and seekTo checked to fix the NPE.
          Hide
          Todd Lipcon added a comment -

          +1, looks good to me. I'll commit to 92 and trunk since this won't affect stability in any way.

          Show
          Todd Lipcon added a comment - +1, looks good to me. I'll commit to 92 and trunk since this won't affect stability in any way.
          Hide
          Todd Lipcon added a comment -

          actually, looks like HBASE-4595 conflicts with this. Would you mind updating one last time for trunk?

          Show
          Todd Lipcon added a comment - actually, looks like HBASE-4595 conflicts with this. Would you mind updating one last time for trunk?
          Hide
          Matteo Bertozzi added a comment -

          Added v3 that applies to trunk after HBASE-4595 integration.

          Show
          Matteo Bertozzi added a comment - Added v3 that applies to trunk after HBASE-4595 integration.
          Hide
          Todd Lipcon added a comment -

          Committed to trunk and 92. Thanks Matteo

          Show
          Todd Lipcon added a comment - Committed to trunk and 92. Thanks Matteo
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2352 (See https://builds.apache.org/job/HBase-TRUNK/2352/)
          HBASE-3929 Add option to HFile tool to produce basic stats

          todd :
          Files :

          • /hbase/trunk/CHANGES.txt
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2352 (See https://builds.apache.org/job/HBase-TRUNK/2352/ ) HBASE-3929 Add option to HFile tool to produce basic stats todd : Files : /hbase/trunk/CHANGES.txt /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.92 #79 (See https://builds.apache.org/job/HBase-0.92/79/)
          HBASE-3929 Add option to HFile tool to produce basic stats

          todd :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          Show
          Hudson added a comment - Integrated in HBase-0.92 #79 (See https://builds.apache.org/job/HBase-0.92/79/ ) HBASE-3929 Add option to HFile tool to produce basic stats todd : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java

            People

            • Assignee:
              Matteo Bertozzi
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development