Infrastructure / INFRA-4685

Open file handles problem on lucene.zones.apache.org

    Details

      Description

      Hi. We've recently been running into errors on Jenkins (lucene.zones.apache.org) caused by running out of file handles. From what we can tell:
      {noformat}
      kern.maxfiles: 12328
      kern.maxfilesperproc: 11095
      kern.openfiles: 11601
      {noformat}

      so kern.openfiles is indeed very close to the system-wide kern.maxfiles. We (Robert Muir, Dawid Weiss) have tried to pinpoint the cause but failed. Our suspicion is that something else (a different jail?) is consuming the handles, but it's hard to tell, really.
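
      To put the figures above in perspective, here is a minimal Python sketch that parses the quoted sysctl output and reports how much headroom is left under kern.maxfiles. The helper names (parse_sysctl, headroom) are made up for illustration; only the three values come from this report. With these numbers, 727 handles are free and about 94.1% of the system-wide limit is in use.

```python
# Values copied verbatim from the sysctl output quoted in this report.
SYSCTL_OUTPUT = """\
kern.maxfiles: 12328
kern.maxfilesperproc: 11095
kern.openfiles: 11601
"""


def parse_sysctl(text):
    """Parse 'name: value' lines into a dict of integers (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            values[name.strip()] = int(value)
    return values


def headroom(values):
    """Return (free handles, fraction in use) relative to kern.maxfiles."""
    free = values["kern.maxfiles"] - values["kern.openfiles"]
    used = values["kern.openfiles"] / values["kern.maxfiles"]
    return free, used


if __name__ == "__main__":
    free, used = headroom(parse_sysctl(SYSCTL_OUTPUT))
    print(f"{free} handles free, {used:.1%} of kern.maxfiles in use")
```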

      Questions/wishes:
      - Could you take a look at the machine's jails: is maxfiles shared between them, or is it a per-jail limit? If it's per-jail, then what is eating away so many file handles on lucene.*?
      - Would it be possible to bump maxfiles (if the cause of the problem remains unknown) so that we can proceed with regular builds?

      Any other suggestions for resolving the problem would be very welcome. We have tried lsof, sockstat and everything else we could think of. Solr/Lucene builds are pretty heavy, but the situation above persists even when no build is running.
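
      For reference, the kind of per-process tally one can build from lsof output looks roughly like the sketch below. It assumes lsof's default column layout (COMMAND first, PID second); the function names and the sample rows are hypothetical and not taken from this machine. A count like this only covers processes visible to whoever runs lsof, so a consumer in a different jail would presumably not show up in it.

```python
from collections import Counter

# Hypothetical sample rows in lsof's default layout, for illustration only.
SAMPLE_LSOF = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 101 hudson cwd VDIR 0,1 512 2 /
java 101 hudson 3u IPv4 0,2 0t0 TCP *:8080
sshd 55 root 4u IPv4 0,3 0t0 TCP *:22
"""


def count_open_files(lsof_text):
    """Count lsof rows per (command, pid), skipping the header and odd lines."""
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything without a numeric PID.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        counts[(fields[0], int(fields[1]))] += 1
    return counts


def top_consumers(lsof_text, n=5):
    """Return the n (command, pid) pairs with the most rows, largest first."""
    return count_open_files(lsof_text).most_common(n)


if __name__ == "__main__":
    # Real input would be the captured text output of an lsof run.
    for (command, pid), n_open in top_consumers(SAMPLE_LSOF):
        print(f"{command} (pid {pid}): {n_open} open entries")
```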

        Activity

        Dawid Weiss created issue -
        #asfinfra IRC Bot added a comment -
        <danielsh> A process on one of the other jails has 8k file handles open. Following up via email
        Dawid Weiss added a comment -
        Thanks guys.
        #asfinfra IRC Bot added a comment -
        <danielsh> The offending jail's admins are looking into the problem and have taken steps to prevent its recurrence.
        #asfinfra IRC Bot made changes -
        Field Original Value New Value
        Status Open [ 1 ] Closed [ 6 ]
        Resolution Fixed [ 1 ]
        Tony Stevenson made changes -
        Workflow jira [ 12662641 ] INFRA Workflow [ 12711502 ]
        Gavin made changes -
        Fix Version/s Initial Clearing [ 12325964 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Dawid Weiss