Whirr
  WHIRR-517

Add a retry loop around apt-get and yum commands to overcome transient errors

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.1
    • Component/s: core
    • Labels:
      None

      Description

      Installation on one or more nodes often fails because of transient errors
      (mostly failed connection attempts).
      We should therefore use the commands' built-in retry options and/or add our own retry loops.

      1. WHIRR-517-for-0.7.1.patch
        38 kB
        Andrei Savu
      2. WHIRR-517-for-0.7.1.patch
        32 kB
        Andrei Savu
      3. WHIRR-517-evaluation.patch
        6 kB
        Karel Vervaeke

          Activity

          Karel Vervaeke added a comment -

          Attached a patch with proposed implementation.
          It adds 3 functions: a generic 'retry' function, 'retry_apt_get' and 'retry_yum'.
          For now, I've only changed the zookeeper-cdh implementation for evaluation. If no one foresees major problems, I'll start replacing all apt-get and yum calls.

          I had to change the <filtering> in the pom because it would mess up expressions like $((tries-1)). (This also explains why JAVA_HOME wasn't being set properly).
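
A minimal sketch of what the three functions described above could look like. This is not the code from the attached patch: the argument order, the `RETRY_TRIES` and `RETRY_DELAY` variables, their defaults, and the `-y` flag are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; NOT the code from WHIRR-517-evaluation.patch.

# retry TRIES DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds or TRIES attempts have been made,
# sleeping DELAY seconds between failed attempts.
retry() {
  local tries=$1 delay=$2
  shift 2
  local status=1
  while [ "$tries" -gt 0 ]; do
    "$@" && return 0
    status=$?                 # exit status of the failed command
    tries=$((tries-1))        # the kind of expression Maven filtering would mangle
    if [ "$tries" -gt 0 ]; then
      echo "Command failed (exit $status), $tries attempt(s) left: $*" >&2
      sleep "$delay"
    fi
  done
  return "$status"
}

# Thin wrappers; RETRY_TRIES and RETRY_DELAY are assumed names, not Whirr settings.
retry_apt_get() {
  retry "${RETRY_TRIES:-5}" "${RETRY_DELAY:-10}" apt-get -y "$@"
}

retry_yum() {
  retry "${RETRY_TRIES:-5}" "${RETRY_DELAY:-10}" yum -y "$@"
}
```

With something like this in place, a call such as `apt-get -y install <package>` in a service script would become `retry_apt_get install <package>`, and the script only fails once every attempt has failed.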

          Andrei Savu added a comment -

          +1 looks good to me. Good to commit as soon as we replace all apt-get / yum calls with this.

          Andrei Savu added a comment -

          I will finish this one because I know Karel is busy this week.

          Andrei Savu added a comment -

          Also adding this on the roadmap for 0.7.1

          Andrei Savu added a comment -

          I am attaching a version of this patch that updates all the services for inclusion in 0.7.1.

          For trunk we need to refactor things to avoid duplication (find a way to make a set of core functions, with a proper namespace, available to all services).

          Andrei Savu added a comment -

          I'm starting to run tests for all services with branch-0.7 + this patch. If everything works fine I will commit and cut an RC (after also fixing WHIRR-526 - should not affect the code in any way).

          Andrei Savu added a comment -

          Here is the test matrix for this release: https://docs.google.com/spreadsheet/ccc?key=0AvGPW01Ku6xTdGtBUVU1VWFXRk9FWm5IcGlSSkc0bFE

          Andrei Savu added a comment -

          I am attaching an updated version of the patch. All integration tests are passing on aws-ec2 (see the updated test matrix for more details). I will commit as soon as I do a bit more testing on cloudservers-us.

          Andrei Savu added a comment -

          OK, I'm done with testing on cloudservers for today (the UK deployment is returning a large number of internal failures, and the US deployment is extremely slow). I will assume that if the core tests pass and everything works on aws-ec2, everything should also work on cloudservers. I will look into WHIRR-526 and prepare RC0 for 0.7.1. We can do more testing while voting.

          Andrei Savu added a comment -

          Committed to branch 0.7. I will mark this as fixed for 0.7.1 and create a new issue for 0.8.0 (just to keep the release notes clean). Thanks Karel!


            People

            • Assignee:
              Andrei Savu
              Reporter:
              Karel Vervaeke
            • Votes:
              0
              Watchers:
              0
