Apache Whirr (retired) / WHIRR-414

whirr can have a non-zero return code and unterminated (orphaned) host instances

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.0
    • Component/s: core
    • Labels: None
    • Environment: EC2, commandline whirr

    Description

      Whirr can fail to start a cluster completely, and it signals this with a non-zero return code. In many partial-failure scenarios (intermittent so far), some resources remain active and are not cleaned up; in my experience these are EC2 machine instances.

      Whenever I have seen orphaned nodes, the log contained "IOException: Too many instance failed while bootstrapping!".

      A non-zero return code should guarantee that all resources have been cleaned up. Without this post-condition, each failure requires manual inspection and cleanup to stop needless expense. That is why I marked this bug critical: it must be addressed before Whirr can safely be triggered from a cron job.
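
To make the requested post-condition concrete, here is a minimal sketch in Java. It is not Whirr's code and not the attached patch: `Provider`, `FakeProvider`, and `launchCluster` are hypothetical names made up for illustration. The idea is that every instance is recorded as soon as it is launched, so that a later bootstrap failure can terminate all of them before the non-zero code is returned.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CleanupOnFailureSketch {

    /** Hypothetical stand-in for a cloud provider; not part of Whirr's API. */
    interface Provider {
        String launch();                              // starts a billable instance
        void bootstrap(String id) throws IOException; // may fail after launch
        void terminate(String id);
    }

    /** Returns 0 on success; on failure terminates everything launched and returns 1. */
    static int launchCluster(Provider provider, int size) {
        List<String> launched = new ArrayList<>();
        try {
            for (int i = 0; i < size; i++) {
                String id = provider.launch();
                launched.add(id);       // record before bootstrapping, so a failed node is tracked too
                provider.bootstrap(id);
            }
            return 0;
        } catch (IOException e) {
            // Post-condition: a non-zero return code implies no instance is left running.
            for (String id : launched) {
                provider.terminate(id);
            }
            return 1;
        }
    }

    /** Simulated provider whose Nth bootstrap call fails (assumption, for the demo only). */
    static class FakeProvider implements Provider {
        final Set<String> running = new LinkedHashSet<>();
        final int failOn;
        int launches = 0;
        int bootstraps = 0;

        FakeProvider(int failOn) {
            this.failOn = failOn;
        }

        public String launch() {
            String id = "i-" + (++launches);
            running.add(id);
            return id;
        }

        public void bootstrap(String id) throws IOException {
            if (++bootstraps == failOn) {
                throw new IOException("simulated bootstrap failure on " + id);
            }
        }

        public void terminate(String id) {
            running.remove(id);
        }
    }

    public static void main(String[] args) {
        FakeProvider provider = new FakeProvider(2); // second node fails to bootstrap
        int rc = launchCluster(provider, 3);
        System.out.println("exit code " + rc + ", instances still running: " + provider.running.size());
    }
}
```

In this sketch the failed node itself is also terminated, because it was recorded before its bootstrap step ran. A real fix would additionally need to handle a failing terminate call, which is left out here.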

      Attachments

        1. WHIRR-414-ignore-missing-instances-file.patch
          3 kB
          Andrei Savu
        2. WHIRR-414-ignore-missing-instances-file.patch
          3 kB
          Andrei Savu
        3. WHIRR-414.patch
          6 kB
          Andrei Savu
        4. WHIRR-414.patch
          8 kB
          Andrei Savu
        5. WHIRR-414.patch
          7 kB
          David Alves
        6. WHIRR-414.patch
          7 kB
          David Alves
        7. WHIRR-414.patch
          2 kB
          Andrei Savu

          People

            Assignee: Andrei Savu (savu.andrei)
            Reporter: Paul Baclace (pbaclace)
