Kudu / KUDU-1959

Hard to tell when a cluster is done starting up


Details

    Description

      After restarting a cluster that holds a good amount of data, it's hard to tell when it's "done". Right now, these are the things I do:

      • Run ksck and wait until most tablets are no longer in the "unavailable" or "bootstrapping" state.
      • Watch the metrics and see when the data under management is close to where it was before restarting (it grows as tablets are getting bootstrapped).
      • Look at the tablet server web UIs for tablets and compare how many are done bootstrapping vs. still bootstrapping vs. not started.
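
      The first workaround can be scripted. Below is a minimal sketch, assuming that `kudu cluster ksck <master_addresses>` exits with code 0 only when it finds no problems. That makes it stricter than "most tablets" above: it waits for a fully clean check. The helper names, timeout, and polling interval are hypothetical choices for illustration.

```python
import subprocess
import time
from typing import Callable


def ksck_is_healthy(master_addresses: str) -> bool:
    """Run `kudu cluster ksck` once.

    Assumption: exit code 0 means ksck found no problems.
    """
    result = subprocess.run(
        ["kudu", "cluster", "ksck", master_addresses],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_healthy(
    check: Callable[[], bool],
    timeout_s: float = 1800.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses.

    Returns True if the check passed, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested
    without a real cluster.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

      It could be called as `wait_until_healthy(lambda: ksck_is_healthy("master-1:7051"))`, where the master address is a placeholder.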

      Ideas on how to improve this:

      • In the master's web UI for tablet servers, show how many tablets are running vs. not running (I wouldn't add anything about tombstoned tablets).
      • Add metrics for tablets in different states.
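
      To illustrate the kind of summary both ideas would surface, here is a minimal sketch that reduces a list of per-tablet state strings to running vs. not-running counts plus a per-state breakdown. The function name and the example state names ("RUNNING", "BOOTSTRAPPING", "NOT_STARTED") are assumptions for illustration, not the metrics Kudu exposes.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_tablet_states(states: Iterable[str]) -> Dict[str, object]:
    """Count tablets per state and split them into running vs. not running.

    `states` is any iterable of state names, one per tablet. Names are
    normalized to upper case, so "running" and "RUNNING" count together.
    """
    counts = Counter(s.strip().upper() for s in states)
    total = sum(counts.values())
    running = counts.get("RUNNING", 0)
    return {
        "total": total,
        "running": running,
        "not_running": total - running,
        "by_state": dict(counts),
    }
```

      A server could be considered "done" once `not_running` reaches zero, with tombstoned tablets filtered out by the caller beforehand.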


            People

              achennaka@cloudera.com Abhishek Chennaka
              jdcryans Jean-Daniel Cryans