HBASE-3014

Change UnknownScannerException log level to WARN

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.20.6
    • Fix Version/s: None
    • Component/s: regionserver
    • Labels:
      None

      Description

      I see a lot of UnknownScannerException messages in the log at ERROR level when I'm running a MapReduce job that scans an HBase table. These messages are logged under normal conditions, and according to Jean-Daniel Cryans, should probably be logged at a less severe log level like WARN.

      Example error message:

      2010-09-16 09:20:52,398 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: 
      org.apache.hadoop.hbase.UnknownScannerException: Name: -8711007779313115048
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1880)
      	at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
      	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
      

      Reference to the HBase users mailing list thread where this was originally discussed:
      http://markmail.org/thread/ttzbi6c7et6mrq6o

      This is a simple change, so I didn't include a formal patch. If one is required, I will gladly create and attach one.
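For illustration only, the requested change might look like the following minimal sketch. It is not the HRegionServer code: the class `ScannerLogSketch`, the nested `UnknownScannerException` stand-in, and the method `logExpected` are hypothetical names, and `java.util.logging` is used here only so the example is self-contained (an assumption; the region server's own logging setup is not shown in this issue). The idea is to log the expected exception at a warning level with its message only.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ScannerLogSketch {
    /** Hypothetical stand-in for org.apache.hadoop.hbase.UnknownScannerException. */
    static class UnknownScannerException extends IOException {
        UnknownScannerException(String msg) {
            super(msg);
        }
    }

    private static final Logger LOG = Logger.getLogger("ScannerLogSketch");

    /** Logs an expected scanner failure at WARNING, message only, without a stack trace. */
    static void logExpected(UnknownScannerException e) {
        // Passing only the message (not the Throwable) keeps the stack trace out of the log.
        LOG.log(Level.WARNING, "Unknown scanner: {0}", e.getMessage());
    }

    public static void main(String[] args) {
        logExpected(new UnknownScannerException("Name: -8711007779313115048"));
    }
}
```

Logging the message rather than the exception object is a design choice in this sketch: it keeps the scanner name visible while keeping routine occurrences short in the log.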

        Activity

        stack made changes -
        Status: Open [ 1 ] → Resolved [ 5 ]
        Resolution: set to Cannot Reproduce [ 5 ]
        stack added a comment -

        Marking as 'can not repro'. After doing a survey, I think this issue is actually fixed. Nowhere do we log this exception explicitly at the ERROR level (not any more, at least). It is all INFO-level as far as I can see.

        stack added a comment -

        @Ted I took a look at your patch. It seems to address an issue other than what Ken describes. Do you want to open a new issue for your patch? (Add a bit of justification for why you think the server name is needed on the exception. Is it not decodable otherwise? Maybe not?) Thanks, Ted.

        Ted Yu added a comment -

        I attached a patch which includes region server name in exception messages.
        I didn't change log level.

        Ted Yu made changes -
        Attachment hbase-3014.patch [ 12455315 ]
        Ted Yu added a comment -

        Adding region server name to exception message

        stack made changes -
        Comment [ From mailing list:

        {code}
        I agree it needs some clarification, since that stuff evolved in
        disparate ways. Historically UnknownScannerException has been fatal
        and wasn't recovered from. Right now, the client will recover only if
        the timeout hasn't expired (so you get this only when the region moves
        or it took more than 60 seconds to call next). On top of that,
        TableRecordReaderImpl will recover even if there's a timeout by
        restarting a new scanner. The DoNotRetryIOException is only a way for
        HBase to tell the HBase client that it shouldn't retry in the normal
        retry code inside HConnectionManager, it's not a way to tell the
        actual user that he shouldn't create a new scanner and retry.

        Thus, the way I understand it, the fact that TRRI recovers from USE is
        a design choice the same way someone using Scan in his code could
        decide to retry scanning with a new scanner upon getting that error. I
        like the way it currently works because if USE comes out of the
        ResultScanner, it means that it took more than 60 seconds to process
        one next() invocation so something is wrong (but the user can ignore
        it like TRRI does).

        That said, the exception should be printed as a WARN in the region
        server log and probably shouldn't care printing a stack trace.

        J-D
        {code}

        ]
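The recovery pattern described in the mailing list excerpt above (open a new scanner and continue when the old one is no longer known to the server) can be sketched roughly as follows. This is a simulation under stated assumptions, not the TableRecordReaderImpl source and not the HBase client API: `RestartScanSketch`, `RowScanner`, `ScannerFactory`, `openAfter`, `scanAll`, and the nested `UnknownScannerException` are all hypothetical stand-ins, and the "table" is an in-memory list.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RestartScanSketch {
    /** Hypothetical stand-in for org.apache.hadoop.hbase.UnknownScannerException. */
    static class UnknownScannerException extends IOException {
        UnknownScannerException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical minimal scanner: returns the next row key, or null when exhausted. */
    interface RowScanner {
        String next() throws IOException;
    }

    /** Hypothetical factory: opens a scanner just after the given row (null means table start). */
    interface ScannerFactory {
        RowScanner openAfter(String lastRow);
    }

    /** Reads all rows, opening a fresh scanner after the last good row when the old one is unknown. */
    static List<String> scanAll(ScannerFactory factory, int maxRestarts) throws IOException {
        List<String> rows = new ArrayList<>();
        String lastRow = null;
        int restarts = 0;
        RowScanner scanner = factory.openAfter(null);
        while (true) {
            try {
                String row = scanner.next();
                if (row == null) {
                    return rows;
                }
                rows.add(row);
                lastRow = row;
            } catch (UnknownScannerException e) {
                if (++restarts > maxRestarts) {
                    throw e; // give up and surface the error to the caller
                }
                // Resume after the last row that was read successfully.
                scanner = factory.openAfter(lastRow);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> table = Arrays.asList("a", "b", "c", "d");
        boolean[] failedOnce = {false};
        ScannerFactory factory = lastRow -> {
            int[] pos = {lastRow == null ? 0 : table.indexOf(lastRow) + 1};
            return () -> {
                // Simulate the server forgetting the scanner once, after two rows.
                if (pos[0] == 2 && !failedOnce[0]) {
                    failedOnce[0] = true;
                    throw new UnknownScannerException("Name: simulated");
                }
                return pos[0] < table.size() ? table.get(pos[0]++) : null;
            };
        };
        System.out.println(scanAll(factory, 3));
    }
}
```

The `maxRestarts` bound is an addition of this sketch so that a scanner that keeps failing eventually surfaces the error, which matches the point in the excerpt that the caller decides whether to retry or to treat the exception as a sign that something is wrong.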
        Ken Weiner made changes -
        Priority: Major [ 3 ] → Trivial [ 5 ]
        Ken Weiner created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Ken Weiner
          • Votes:
            0
            Watchers:
            2
