Hadoop Common
HADOOP-1198

ipc.client.timeout of 2000ms for test cases seems too small; causes too many timeouts and leads to hung test cases

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.2
    • Fix Version/s: 0.13.0
    • Component/s: test
    • Labels:
      None

      Description

      We should increase the timeout slightly... what do others think? 5000ms? 10000ms?

      1. HADOOP-1198_20070403_1.patch
        0.4 kB
        Arun C Murthy
      2. HADOOP-1198_20070407_2.patch
        0.5 kB
        Arun C Murthy

        Activity

        Arun C Murthy added a comment -

        Increasing it to 5000ms... thoughts?

        Raghu Angadi added a comment -

        Just curious, why is 5 seconds ok? Is this based on some expected server behavior?

        Doug Cutting added a comment -

        It was 1 second for many months, until I increased it to 2 seconds in HADOOP-1030, a bit over a month ago, so that things would run better on the heavily loaded Solaris box where we run nightly benchmarks.

        How often do you see timeouts with a setting of 1, 2, 5 or 10 seconds? On what kind of machine?

        A related question: how much difference does changing this make to total unit test time? It used to make a big difference, but now that many daemons are stopped with Thread.interrupt() instead of waiting for the timeout interval, total unit test time may not be so sensitive to this parameter. If it's not, then we might even just leave it at the default.

        Arun C Murthy added a comment -

        On my Gentoo box (kernel 2.6.16.11) with 1GB of RAM I see too many timeouts with 2s and none with 5s. Total test time on my box: ~22mins.

        Nigel seems to see timeouts on Solaris with 2s too...

        I'm ok ignoring this if others feel otherwise.

        Raghu Angadi added a comment -

        5 sec is obviously an improvement, I think. Doug answered my concern: a longer timeout might slow down the tests, but if it does not increase test times, the timeout could be set much higher so that we don't need to readjust it again.

        Doug Cutting added a comment -

        > Total test time on my box: ~22mins.

        That's with a 5 second timeout? How about with a 30 second timeout? The only reason to use a lower-than-default timeout for unit tests is to make them run faster. If it is not serving that purpose, then let's remove this from the test config altogether.

        Arun C Murthy added a comment -

        Yes, that was with a 5s timeout.

        I ran without ipc.client.timeout in src/test/hadoop-site.xml, i.e. with the default of 60s, and the total time was 23mins.

        I vote we remove this altogether; can someone else corroborate? Thanks!

        Raghu Angadi added a comment -

        In my tests, the total time went from 12 min with a 2 sec timeout to 13 min with a 60 sec timeout. Maybe there is just one test that this affects.

        Arun C Murthy added a comment -

        If others are comfortable with this... here is a patch which removes the ipc.client.timeout parameter from src/test/hadoop-site.xml.
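        For readers unfamiliar with the test configuration, the override under discussion is a single property entry. The patch text is not reproduced on this page, so the fragment below is only a sketch assuming the usual Hadoop configuration layout; the real src/test/hadoop-site.xml may contain other properties and an XML prolog. With this entry deleted, the tests fall back to the default ipc.client.timeout of 60s mentioned earlier in the thread.

```xml
<!-- Sketch only (assumed layout): the 2000 ms test override that this
     issue removes from src/test/hadoop-site.xml. Once it is gone, the
     default ipc.client.timeout (60 s, i.e. 60000 ms) applies to tests. -->
<configuration>
  <property>
    <name>ipc.client.timeout</name>
    <value>2000</value>
  </property>
</configuration>
```

        The alternative discussed above, raising the value to 5000 or 10000, would change only the value element; removing the property instead avoids having to retune it for slower machines.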

        Hadoop QA added a comment -

        +1 http://issues.apache.org/jira/secure/attachment/12355117/HADOOP-1198_20070407_2.patch applied and successfully tested against trunk revision r526411. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch
        Doug Cutting added a comment -

        +1 This looks like a good change to me.

        Tom White added a comment -

        I've just committed this. Thanks Arun!

        Jim Kellerman added a comment -

        Thank you for this patch!

        I have a MacBook Pro that dual boots Mac OS X and Fedora Core 6 and have been trying to figure out why this one particular test I was running was always successful under Mac OS but always had timeouts under Linux.

        Hadoop QA added a comment -

        Integrated in Hadoop-Nightly #54 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/54/ )

          People

          • Assignee: Arun C Murthy
          • Reporter: Arun C Murthy
          • Votes: 0
          • Watchers: 0
