[KNOX-1093] KNOX Not Handling safemode state of one of the NameNode In HA state - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.10.0
Fix Version/s: 1.2.0
Component/s: Server
Labels:
None

Description

Per the code in WebHdfsHaDispatch.java, when a SafeModeException occurs the dispatch calls retryRequest(). Like failoverRequest(), this calls executeRequest(), but the NameNode the thread targets does not change across any of its iterations, up to maxRetryAttempts=300 with retrySleep=1000 (1 second).
After at most 5 minutes (300 × 1 s) the request gives up, and the client's own retry should then reach the right NameNode, at least on its next attempt.
However, if we need to copy a set of files within a fixed time window, the X% of connections that land on this NameNode fail. Can we handle this better?

      try {
         inboundResponse = executeOutboundRequest(outboundRequest);
         writeOutboundResponse(outboundRequest, inboundRequest, outboundResponse, inboundResponse);
      } catch (StandbyException e) {
         // Standby NameNode answered: fail over to the other NameNode
         LOG.errorReceivedFromStandbyNode(e);
         failoverRequest(outboundRequest, inboundRequest, outboundResponse, inboundResponse, e);
      } catch (SafeModeException e) {
         // NameNode in safe mode: retry, but against the same NameNode (no failover)
         LOG.errorReceivedFromSafeModeNode(e);
         retryRequest(outboundRequest, inboundRequest, outboundResponse, inboundResponse, e);
      } catch (IOException e) {
         // Connection or other I/O error: fail over to the other NameNode
         LOG.errorConnectingToServer(outboundRequest.getURI().toString(), e);
         failoverRequest(outboundRequest, inboundRequest, outboundResponse, inboundResponse, e);
      }
   }
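The SafeModeException branch can be modeled as follows. This is a simplified sketch, not Knox source: it assumes, as the description states, that retryRequest() re-executes the request against the same NameNode, with one retrySleep pause before each retry. SameNodeRetryModel, NameNodeCall and the stand-in SafeModeException are hypothetical names, not Knox or Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model (not Knox source): every SafeModeException retry targets the same NameNode. */
public class SameNodeRetryModel {

    /** Hypothetical stand-in for Hadoop's SafeModeException. */
    static class SafeModeException extends Exception {
        SafeModeException(String message) { super(message); }
    }

    /** Hypothetical NameNode call that throws while the node is in safe mode. */
    interface NameNodeCall {
        String execute(String url) throws SafeModeException;
    }

    /** One initial attempt, then up to maxRetryAttempts retries, each preceded by retrySleepMs of sleep. */
    static String retryOnSameNode(String url, NameNodeCall call, int maxRetryAttempts,
                                  long retrySleepMs, List<String> targetsTried)
            throws SafeModeException, InterruptedException {
        int retries = 0;
        while (true) {
            targetsTried.add(url); // the target URL never changes between attempts
            try {
                return call.execute(url);
            } catch (SafeModeException e) {
                if (retries >= maxRetryAttempts) {
                    throw e; // retry budget exhausted: the client sees the failure
                }
                retries++;
                Thread.sleep(retrySleepMs);
            }
        }
    }

    /** Total time spent sleeping before giving up. */
    static long worstCaseSleepMs(int maxRetryAttempts, long retrySleepMs) {
        return maxRetryAttempts * retrySleepMs;
    }

    public static void main(String[] args) throws InterruptedException {
        // Values from the description: 300 retries * 1000 ms = 300000 ms (5 minutes).
        System.out.println(worstCaseSleepMs(300, 1000)); // prints 300000

        // Tiny budget, no sleep: nn1 never leaves safe mode, so every attempt goes to nn1.
        List<String> tried = new ArrayList<>();
        try {
            retryOnSameNode("nn1", url -> { throw new SafeModeException(url + " is in safe mode"); },
                    3, 0, tried);
        } catch (SafeModeException e) {
            System.out.println(tried); // prints [nn1, nn1, nn1, nn1]
        }
    }
}
```

The demo makes the problem visible: however many retries are allowed, none of them is sent to the other NameNode.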

We need to change the SafeModeException handling in the Knox HA dispatch code to flag the NameNode that is stuck in safe mode, keep it on a "don't try" list, and redirect all further connections only to the healthy active NameNode. This way we could avoid that X% of failures. What do you think?
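A minimal sketch of the proposed "don't try" list, assuming a hypothetical SafeModeAwareSelector class (not part of Knox) that the dispatch would consult before each request. A NameNode that returns SafeModeException is quarantined for quarantineMs, and requests go to the first NameNode not on the list. The class name, the quarantine period and the injectable clock are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch (not Knox API): skip NameNodes recently seen in safe mode. */
public class SafeModeAwareSelector {
    private final List<String> nameNodeUrls;   // HA NameNode URLs in preference order
    private final long quarantineMs;           // how long a safe-mode node stays on the don't-try list
    private final LongSupplier clock;          // injectable clock in milliseconds (eases testing)
    // Shared by all dispatch threads, hence a concurrent map: url -> time until which it is skipped
    private final Map<String, Long> dontTryUntil = new ConcurrentHashMap<>();

    public SafeModeAwareSelector(List<String> nameNodeUrls, long quarantineMs, LongSupplier clock) {
        this.nameNodeUrls = List.copyOf(nameNodeUrls);
        this.quarantineMs = quarantineMs;
        this.clock = clock;
    }

    /** Called from the SafeModeException handler instead of retrying the same node. */
    public void markInSafeMode(String url) {
        dontTryUntil.put(url, clock.getAsLong() + quarantineMs);
    }

    /** First NameNode not currently on the don't-try list; empty if all are quarantined. */
    public Optional<String> pick() {
        long now = clock.getAsLong();
        for (String url : nameNodeUrls) {
            Long until = dontTryUntil.get(url);
            if (until == null) {
                return Optional.of(url);
            }
            if (now >= until) {
                dontTryUntil.remove(url, until); // quarantine expired: give the node another chance
                return Optional.of(url);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        long[] now = {0};
        SafeModeAwareSelector selector =
                new SafeModeAwareSelector(List.of("nn1", "nn2"), 60_000, () -> now[0]);
        selector.markInSafeMode("nn1");       // nn1 threw SafeModeException
        System.out.println(selector.pick());  // prints Optional[nn2]
        now[0] = 60_000;                      // quarantine expired
        System.out.println(selector.pick());  // prints Optional[nn1]
    }
}
```

When every NameNode is quarantined, pick() returns an empty Optional, and the caller could fall back to the existing retryRequest() behavior. Expiring entries, rather than flagging a node permanently, lets a NameNode rejoin once it has left safe mode.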

Attachments

KNOX-1093.patch
27/Sep/18 14:28
4 kB
Matthew Sharp

Issue Links

is depended upon by

KNOX-1551 Updated documentation for KNOX-1093 and KNOX-1433 (Closed)

People

Assignee: Matthew Sharp

Reporter: Rajesh Chandramohan

Votes: 0

Watchers: 7

Dates

Created: 31/Oct/17 23:29

Updated: 28/Mar/19 13:56

Resolved: 27/Sep/18 17:20