Infrastructure / INFRA-4685

Open file handles problem on lucene.zones.apache.org

    Details

      Description

      Hi. Recently we've been running into errors on Jenkins (lucene.zones.apache.org) caused by exceeding the limit on open file handles. From what we can tell:
      {noformat}
      kern.maxfiles: 12328
      kern.maxfilesperproc: 11095
      kern.openfiles: 11601
      {noformat}

      so kern.openfiles is indeed very close to the system-wide kern.maxfiles limit. We (Robert Muir, Dawid Weiss) have tried to pinpoint the cause but failed. Our suspicion is that something else (a different jail?) is consuming the file handles, but it's hard to tell, really.
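
For reference, the headroom implied by the numbers above can be worked out with a small script. This is only a sketch: the helper names `parse_sysctl` and `file_handle_headroom` are made up for illustration, and the input is assumed to be the plain `name: value` lines quoted above.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines (as quoted above) into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value.strip())
    return values


def file_handle_headroom(values):
    """Return (handles remaining, fraction used) for the system-wide limit."""
    used = values["kern.openfiles"]
    limit = values["kern.maxfiles"]
    return limit - used, used / limit


# The values reported in this issue.
SAMPLE = """\
kern.maxfiles: 12328
kern.maxfilesperproc: 11095
kern.openfiles: 11601
"""

if __name__ == "__main__":
    remaining, used = file_handle_headroom(parse_sysctl(SAMPLE))
    # Prints: 727 handles left, 94.1% of kern.maxfiles in use
    print(f"{remaining} handles left, {used:.1%} of kern.maxfiles in use")
```

With only 727 handles left system-wide, a single heavy build could plausibly hit the limit.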

      Questions/wishes:
      - could you take a look at the machine's jails: is maxfiles shared between them, or is it a per-jail limit? If it's per-jail, then what is eating away so many file handles on lucene.*?
      - would it be possible to bump maxfiles (if the cause of the problem remains unknown) so that we can proceed with regular builds?

      Any other suggestions you can provide to resolve the problem will be very welcome. We have tried lsof, sockstat, and everything else we could think of. Solr/Lucene builds are pretty heavy, but the situation above persists even when no build is running.
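
As an illustration of the kind of check we ran, lsof output can be tallied per process to find the largest consumers. This is a rough sketch, not the exact commands we used: the function name `count_open_files` and the sample rows are invented, the usual lsof column layout (COMMAND, PID, ...) is assumed, and command names are assumed to contain no whitespace. Run from inside a jail, it would presumably only see that jail's own processes.

```python
from collections import Counter


def count_open_files(lsof_output):
    """Count lsof rows per (command, pid).

    Each row is one entry lsof reports; this includes cwd/txt entries,
    so the totals only approximate the number of open descriptors.
    """
    counts = Counter()
    for line in lsof_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # ignore malformed or wrapped lines
        counts[(fields[0], int(fields[1]))] += 1
    return counts


# Made-up sample rows, for illustration only.
SAMPLE = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 101 hudson 3r VREG 0,90 1024 11 /tmp/a
java 101 hudson 4r VREG 0,90 1024 12 /tmp/b
sshd 202 root 3u IPv4 0xabc 0t0 TCP *:22
"""

if __name__ == "__main__":
    # Print the processes with the most entries, largest first.
    for (command, pid), n in count_open_files(SAMPLE).most_common(10):
        print(f"{n:6d}  {command} (pid {pid})")
```

A process holding thousands of entries would stand out at the top of such a listing.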

        Activity

        infrabot #asfinfra Bot added a comment -
        <danielsh> A process on one of the other jails has 8k file handles open. Following up via email
        dweiss Dawid Weiss added a comment - Reporter
        Thanks guys.
        infrabot #asfinfra Bot added a comment -
        <danielsh> The offending jail's admins are looking into the problem and have taken steps to prevent its recurrence.

          People

          • Assignee:
            Unassigned
            Reporter:
            dweiss Dawid Weiss
            Request participants:
            None