Infrastructure / INFRA-15373

Various Jenkins bits are in trouble

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Fix Version/s: Nov 2017
    • Component/s: Jenkins
    • Labels: None
    • Project: Hadoop

      Description


      It looks like Jenkins is no longer scheduling jobs. I attempted to restart the Jenkins agent on H4 via the UI (see below), and that's when jobs appear to have stopped being scheduled.

      Also, nodes H1, H2, H10, and H12 need to be kicked.


      Additionally, I think I've discovered that the Hadoop HDFS unit tests for branch-2 are causing havoc on the build nodes (tracking in HDFS-12711). I'm at about 50% confidence that the problem is related to the OOM killer. In the majority of cases, the nodes become entirely unavailable to Jenkins. Today, H4 reported back data from inside the container, which meant that it wasn't a kernel panic. So I restarted the Jenkins agent, but as far as I can tell it never fully came back.

      In any case, I'm still trying to reproduce the problem locally, but it's tough going. In the meantime, I'm going to hard-set which nodes certain tests run on to try to limit the damage. I've also been working on YETUS-561 in case the problem is OOM related. In my experiments, that seems to work when OOM actually is the problem: the unit tests and the like are the processes that get sacrificed.

      Anyway, sorry for the issues and thanks for the help.

    People

    • Assignee: Daniel Takamori (pono)
    • Reporter: Allen Wittenauer (aw)
    • Votes: 0
    • Watchers: 7

    Dates

    • Created:
    • Updated:
    • Resolved: