HBASE-2889: Tool to look at HLogs -- parse and tail -f

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: wal
    • Labels: None

      Description

      We need a tool for looking at WALs. A tail -f would be sweet, though it might be ugly having to re-open the file each time it wants to dump the next lot of edits out of the WAL.

      From Kannan:

      20:45 <kannan> WALEdit already has a toString().
      20:46 <kannan> And append/sync support allows you to read concurrently a file that is being written to... (like JD's replication).
      20:46 <kannan> So it would be good if I could do a bin/hbase hlogprint -f hdfs://...../.logs/<filename> -v -p
      20:46 <kannan> much like the HFile pretty printer... except the HLog one is effectively a tail -f.
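
      For illustration, a rough sketch of the polling approach this implies (re-open the log on each pass and seek to the last printed position). The Reader/Entry types below merely stand in for HLog.Reader and HLog.Entry; none of this is a committed API.

      import java.io.IOException;

      public class WalTailSketch {
        interface Entry { }                      // stand-in for HLog.Entry
        interface Reader {                       // stand-in for HLog.Reader
          Entry next() throws IOException;
          long getPosition() throws IOException;
          void seek(long pos) throws IOException;
          void close() throws IOException;
        }
        interface ReaderFactory { Reader open() throws IOException; }

        static void tail(ReaderFactory logs, long pollMillis) throws Exception {
          long lastPos = 0;
          while (true) {
            Reader reader = logs.open();         // re-open so newly appended edits are visible
            try {
              if (lastPos > 0) {
                reader.seek(lastPos);            // resume where the previous pass stopped
              }
              Entry entry;
              while ((entry = reader.next()) != null) {
                System.out.println("pos=" + reader.getPosition() + " " + entry);
                lastPos = reader.getPosition();
              }
            } finally {
              reader.close();
            }
            Thread.sleep(pollMillis);            // wait before polling again
          }
        }
      }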
      
      Attachments

      1. 2889.txt (7 kB) - stack
      2. HBASE-2889_2.patch (2 kB) - Nicolas Spiegelberg

          Activity

          stack added a comment -

          There is already

          $ hbase/bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.com:9000/hbase/10.10.21.18%3A60020.1283907407862
          

          Need to make it tail...

          Also, need to add this to the dump command:

          Index: src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          ===================================================================
          --- src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java    (revision 993076)
          +++ src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java    (working copy)
          @@ -1830,9 +1830,12 @@
                   }
                   Reader log = getReader(fs, logPath, conf);
                   try {
          +          int count = 0;
                     HLog.Entry entry;
                     while ((entry = log.next()) != null) {
          -            System.out.println(entry.toString());
          +            System.out.println("#" + count + ", pos=" + log.getPosition() +
          +              " " + entry.toString());
          +            count++;
                     }
                   } finally {
                     log.close();
          

          ... so can see where emitted edit is in file.

          stack added a comment -

          Add to hlog main:

          + Emission of filename, position, and edit count if there is an issue. We used to do this:

          $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump ~/Downloads/10.10.21.18%3A60020.1283907407862  > /dev/null 
          java.io.IOException: File is corrupt!
                  at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1907)
                  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1932)
                  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1837)
                  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1883)
                  at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:121)
                  at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:113)
                  at org.apache.hadoop.hbase.regionserver.wal.HLog.dump(HLog.java:1817)
                  at org.apache.hadoop.hbase.regionserver.wal.HLog.main(HLog.java:1867)
          

          ... now we do this:

          $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump ~/Downloads/10.10.21.18%3A60020.1283907407862  > /dev/null
          java.io.IOException: /Users/Stack/Downloads/10.10.21.18%3A60020.1283907407862, pos=20076465, edit=6928
                  at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:165)
                  at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:137)
                  at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:122)
                  at org.apache.hadoop.hbase.regionserver.wal.HLog.dump(HLog.java:1817)
                  at org.apache.hadoop.hbase.regionserver.wal.HLog.main(HLog.java:1867)
          Caused by: java.io.IOException: File is corrupt!
                  at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1907)
                  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1932)
                  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1837)
                  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1883)
                  at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:135)
          

          + Made it so that on an error, the program returns a non-zero exit value.
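
          A minimal sketch of that exit-code behaviour (the dumpWal() helper below is hypothetical, not the actual HLog entry point):

          import java.io.IOException;

          public class DumpExitSketch {
            // Hypothetical stand-in for the real dump logic in HLog.
            static void dumpWal(String path) throws IOException {
              // ... open the reader and print each entry; throws IOException on a corrupt file ...
            }

            public static void main(String[] args) {
              try {
                dumpWal(args[0]);
              } catch (Exception e) {
                e.printStackTrace();
                System.exit(-1);   // non-zero exit value lets scripts detect the failure
              }
            }
          }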

          stack added a comment -

          I committed the 2889.txt patch for now to branch and trunk. There is still tailing of hlogs to add before we can close this issue.

          Kannan Muthukkaruppan added a comment -

          Thanks for the update on this Stack.

          Jeff Hammerbacher added a comment -

          FWIW, if the WAL were serialized as Avro records, tools in many languages would be available for looking at the logs. There may be other drawbacks, of course.

          Andrew Purtell added a comment -

          @Jeff: HBASE-2055 is open for that. It's been a while, perhaps we can circle back to it soon.

          Nicolas Spiegelberg added a comment -

          FYI: the current patch downgrades all derived exception classes to a plain IOE, e.g. any code that detects EOFExceptions will not be triggered. The general problem we seem to have is that we need to differentiate between a network IOE and a file-format IOE.

          Network = we need to fail and let another server try to take over.
          FileFormat = our file was written or parsed incorrectly; retrying won't fix anything. We need to just open what we have and store the original file away for later analysis.
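
          A hedged sketch of that distinction: keep the concrete exception type so callers can treat transport failures and parse failures differently. CorruptWalException below is a hypothetical marker, not an HBase class.

          import java.io.EOFException;
          import java.io.IOException;

          class WalErrorHandling {
            // Hypothetical marker for "file format" problems.
            static class CorruptWalException extends IOException {
              CorruptWalException(String msg, Throwable cause) { super(msg, cause); }
            }

            static void handle(IOException e) {
              if (e instanceof CorruptWalException) {
                // File was written or parsed incorrectly; retrying won't help.
                // Read what we can and set the original file aside for later analysis.
                System.err.println("corrupt WAL, salvaging readable edits: " + e);
              } else if (e instanceof EOFException) {
                // Expected at the tail of a log that is still being written; not fatal.
                System.err.println("hit EOF on a live log: " + e);
              } else {
                // Likely a network/HDFS error: fail so another server can take over.
                System.err.println("transport error, giving up: " + e);
              }
            }
          }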

          stack added a comment -

          @Nicolas Are you referring to this method: addFileInfoToException? Should I open an issue to fix it?

          Nicolas Spiegelberg added a comment -

          Looks like the EOF bug was fixed in HBASE-2961.

          Nicolas Spiegelberg added a comment -

          Generalized the fix on trunk, using reflection. Also added a couple of variables that I've found helpful while dealing with EOF problems. Should be able to apply this against trunk (r1000684).
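
          A rough sketch of the reflection idea (not the patch itself): rebuild the same exception subclass with the file info prepended to its message, so that, for example, EOFException handling still triggers. The helper name and signature here are illustrative.

          import java.io.IOException;

          class ExceptionInfoSketch {
            static IOException addFileInfo(IOException original, String path, long pos, long edit) {
              String msg = path + ", pos=" + pos + ", edit=" + edit + ": " + original.getMessage();
              try {
                // Rebuild the same concrete subclass via its (String) constructor, if it has one.
                IOException enriched = original.getClass()
                    .getConstructor(String.class)
                    .newInstance(msg);
                enriched.initCause(original);
                return enriched;
              } catch (ReflectiveOperationException e) {
                // Fall back to the original exception rather than downgrading its type.
                return original;
              }
            }
          }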

          stack added a comment -

          @Nicolas Committed your HBASE-2889_2.patch. Thanks for fixing my booboo.

          stack added a comment -

          Moving out. Don't need to finish this 'feature' for 0.90.

          Nicolas Spiegelberg added a comment -

          Is this not already committed in 0.90? We have these patches applied internally and have been running the tool without problem.

          stack added a comment -

          There is another piece to do, the bit Kannan wanted where you could tail a live WAL and see the edits stream by. That's the piece that is keeping this issue open and that won't be done for 0.90 (sorry, this issue got a bit messy with two commits against it).

          Andrew Purtell added a comment -

          Resurrect or close?

          stack added a comment -

          Resolving. Old. When HDFS has a tail, we'll make an issue to implement tail of the WAL.

          Lars Hofhansl added a comment -

          We could do the equivalent of what replication is doing.

          stack added a comment -

          We could do the equivalent of what replication is doing.

          We could, only it is very ugly.


            People

            • Assignee: stack
            • Reporter: stack
            • Votes: 0
            • Watchers: 9
