Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels:
      None

      Description

      I have had that happen before and now again. We need to handle this better:

      + for i in '`seq 1 3`'
      + curl --retry 3 --silent --show-error --fail -O http://archive.apache.org/dist/hbase/hbase-0.20.6/hbase-0.20.6.tar.gz
      curl: (18) transfer closed with 12646997 bytes remaining to read
      
      1. WHIRR-207.patch
        7 kB
        Andrei Savu
      2. WHIRR-207.patch
        8 kB
        Andrei Savu
      3. WHIRR-207.patch
        10 kB
        Tom White
      4. WHIRR-207.patch
        13 kB
        Andrei Savu

        Activity

        Andrei Savu added a comment -

        I've just committed this. Thanks Tom for reviewing.

        Tom White added a comment -

        +1 looks good.

        Andrei Savu added a comment -

        I've fixed the patch. Tested with HBase (which should also cover Hadoop and ZooKeeper) and Cassandra.

        Andrei Savu added a comment -

        Great! I will fix the patch as soon as possible.

        Tom White added a comment -

        Unfortunately WHIRR-225 broke this patch completely, so I've generated an equivalent. The nice thing is that we only need one copy of the install_tarball function with the WHIRR-225 approach.

        I've tested with ZooKeeper, but haven't done HBase yet, since it has some different semantics for resolving the tar name from the URL (this was a problem with the original patch too). Can we generalize the function to take an optional tar name too?
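The optional tar name asked about above could be handled by a small name-resolution step. This is a minimal sketch, not the code from any WHIRR-207 patch: `resolve_tar_name` is a hypothetical helper, and it assumes the default name is the last path component of the URL. The thread does not say how HBase resolves its tar name, so the second argument simply overrides the default.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the patch): resolve_tar_name URL [TAR_NAME]
# Prints the file name to use for the downloaded tarball.
resolve_tar_name() {
  local url=$1
  local tar_name=${2:-}

  if [ -z "$tar_name" ]; then
    # Default: last path component of the URL, with any query string dropped.
    tar_name=${url##*/}
    tar_name=${tar_name%%\?*}
  fi

  printf '%s\n' "$tar_name"
}
```

A caller such as install_tarball could pass its own optional second argument straight through, so services whose file name matches the URL need no extra configuration.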

        Andrei Savu added a comment -

        I've updated the patch and it should work for all the services (only tested with Hadoop and ZooKeeper). Let me know if it works for you.

        Andrei Savu added a comment -

        I've tested the install_tar function on the development machine by shutting down and restarting the connection. I haven't run the integration tests yet. I'm planning to do that tomorrow.

        Andrei Savu added a comment -

        From the set manpage:

        -e   errexit
                  Exit immediately if a simple command exits with a non-zero
                  status, unless the command that fails is part of an until or
                  while loop, part of an if statement, part of a && or || list,
                  or if the command's return status is being inverted using !. 
        

        I will replace the for loop with a while and add a wait before a retry
        as Lars suggested and move everything inside a function that can be reused.
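The plan described above (a while loop, a wait before each retry, and a reusable function) might look roughly like the following. It is a sketch under stated assumptions rather than the committed patch: the function name `download_with_retry`, its argument order, and the defaults of 3 attempts and 10 seconds are all illustrative. The curl flags are the ones already used in the scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the patch):
#   download_with_retry URL [MAX_ATTEMPTS] [WAIT_SECONDS]
download_with_retry() {
  local url=$1
  local max_attempts=${2:-3}
  local wait_seconds=${3:-10}
  local attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    # curl is the condition of an `if`, so a non-zero exit status
    # does not trigger errexit; it only skips the `return 0`.
    if curl --retry 3 --silent --show-error --fail -O "$url"; then
      return 0
    fi

    echo "download of $url failed (attempt $attempt of $max_attempts)" >&2
    attempt=$((attempt + 1))

    # Wait before the next attempt, but not after the last one.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$wait_seconds"
    fi
  done

  # All attempts failed; let the caller decide whether to abort.
  return 1
}
```

Returning non-zero after the last attempt answers the "when to give up" question in the simplest way: under set -e, an unguarded call to the function then stops the script.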

        Andrei Savu added a comment - edited

        We are seeing this failure because the script runs set -e at the beginning and we don't handle the curl exit code (18). With errexit on, the failed curl call aborts the script immediately, so the loop and the retry are never executed in this failure scenario.
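The effect can be reproduced without curl. In this small illustration, `false` stands in for a curl call that exits with code 18, and each case runs in a child bash so that the abort does not take the demo down with it. The variable names `plain` and `guarded` are only for the demo.

```shell
#!/usr/bin/env bash
# `false` stands in for a curl invocation that exits non-zero.

# Plain command inside a for loop: errexit aborts the child shell on the
# first failure, so nothing after it runs and nothing is printed.
plain=$(bash -c 'set -e
for i in 1 2 3; do
  false
  echo "attempt $i survived"
done
echo done' 2>&1) || true

# Same command used as an `if` condition: the status is tested instead of
# being fatal, so the loop runs to completion.
guarded=$(bash -c 'set -e
for i in 1 2 3; do
  if false; then break; fi
  echo "attempt $i survived"
done
echo done' 2>&1)

echo "plain:   [$plain]"
echo "guarded: [$guarded]"
```

The first case produces no output at all, which matches the trace in the description: the script stops at the first failed curl and never reaches a second iteration.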

        Lars George added a comment -

        It is actually curl, sorry for the wrong title (corrected). The scripts already have a retry:

          curl="curl --retry 3 --silent --show-error --fail"
          for i in `seq 1 3`;
          do
            $curl -O $hbase_tar_url
            $curl -O $hbase_tar_url.md5
            if md5sum -c $hbase_tar_md5_file; then
              break;
            else
              rm -f $hbase_tar_file $hbase_tar_md5_file
            fi
          done
        

        Do these errors actually go through the retry loop? And if so, should there be a

        if [ $i -gt 1 ]; then
            sleep 10;
        fi
        

        or some such to wait before the retry? When to give up entirely?

        Lars George added a comment -

        +1, I suggest some bash function() that can be reused to download with retries.

        Andrei Savu added a comment -

        The curl exit codes are well documented. We should check that and retry as needed.
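One way to act on the exit codes is to classify them before deciding whether to retry. The helper below is a hypothetical sketch; the name `is_retryable_curl_exit` and the choice of which codes count as transient are assumptions, not something settled in this issue. The code meanings themselves come from the curl documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the given curl exit code looks transient.
is_retryable_curl_exit() {
  case "$1" in
    6)  return 0 ;;  # could not resolve host
    7)  return 0 ;;  # failed to connect to host
    18) return 0 ;;  # partial file: transfer closed before all data arrived
    28) return 0 ;;  # operation timed out
    52) return 0 ;;  # server returned nothing
    56) return 0 ;;  # failure receiving network data
    *)  return 1 ;;  # success (0), HTTP error with --fail (22), and the rest
  esac
}
```

To capture the code under set -e without aborting, the curl call can be written as `status=0; curl ... || status=$?`, after which `is_retryable_curl_exit "$status"` decides between retrying and failing fast.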


          People

          • Assignee:
            Andrei Savu
            Reporter:
            Lars George
          • Votes:
            2
            Watchers:
            1
