Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
any
None
Description
In development environments, developers often use local_hadoop for unit and developer regression testing, frequently on workstations shared among many developers. During overnight regression runs on a heavily loaded workstation, the HMaster process often dies from timeouts. This occasionally surfaces as HBase errors in the tests, but more often it causes the tests to hang. It would be useful to have a tool that monitors HMaster and tries to restart it when it goes away: restarting HMaster has often been observed to resolve the hangs and let the regression run continue.
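A minimal sketch of such a watchdog, written in Python for illustration. It assumes HMaster is launched as a foreground child process, so that `wait()` returns only when the master actually exits. The `watchdog` function and its `max_restarts` and `restart_delay` parameters are hypothetical names, not part of HBase. Note that it only detects a master that has exited; a master that is alive but hung would need a separate liveness check.

```python
import subprocess
import sys
import time
from typing import Callable, List


def watchdog(
    start_cmd: List[str],
    max_restarts: int = 5,
    restart_delay: float = 2.0,
    log: Callable[[str], None] = lambda msg: print(msg, file=sys.stderr),
) -> int:
    """Run start_cmd and restart it each time the process exits.

    Hypothetical helper, not part of HBase. Assumes start_cmd runs the
    monitored process (e.g. HMaster) in the foreground, so wait() returns
    only when that process goes away. Returns the number of restarts
    performed once max_restarts is exhausted.
    """
    restarts = 0
    proc = subprocess.Popen(start_cmd)
    while True:
        code = proc.wait()  # block until the monitored process exits
        if restarts >= max_restarts:
            log(f"exited with code {code}; restart limit ({max_restarts}) reached, giving up")
            return restarts
        restarts += 1
        log(f"exited with code {code}; restart {restarts}/{max_restarts} in {restart_delay}s")
        # Short pause so a busy workstation has a chance to settle
        # before the process is launched again.
        time.sleep(restart_delay)
        proc = subprocess.Popen(start_cmd)
```

For example, `watchdog(["bin/hbase", "master", "start"])` would restart a foreground master up to five times; whether that is the right launch command for a particular local_hadoop setup is an assumption. Capping the number of restarts keeps a master that fails immediately on every start from looping forever during an unattended overnight run.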