Apache YuniKorn / YUNIKORN-390

SIGSEGV if parent queue does not exist for tag rule


Details

    Description

      The scheduler crashes if the parent queue specified for the tag placement rule does not exist.
      The bug is in this line (core/pkg/scheduler/placement/tag_rule.go#93):

      if info.GetQueue(parentName).IsLeafQueue() {
        return "", fmt.Errorf("parent rule returned a leaf queue: %s", parentName)
      }
      

      info.GetQueue(parentName) returns nil when the queue does not exist, and calling IsLeafQueue() on that nil pointer causes the crash.
      Full stack trace:

      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x198b707]
      
      goroutine 116 [running]:
      github.com/apache/incubator-yunikorn-core/pkg/cache.(*QueueInfo).IsLeafQueue(...)
      	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/cache/queue_info.go:198
      github.com/apache/incubator-yunikorn-core/pkg/scheduler/placement.(*tagRule).placeApplication(0xc005d50050, 0xc000494100, 0xc0006bc210, 0xc00644a300, 0x2, 0x2, 0x10502b1)
      	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/placement/tag_rule.go:93 +0xb47
      github.com/apache/incubator-yunikorn-core/pkg/scheduler/placement.(*AppPlacementManager).PlaceApplication(0xc005d50000, 0xc000494100, 0x0, 0x0)
      	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/placement/placement.go:141 +0x485
      github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*partitionSchedulingContext).addSchedulingApplication(0xc0002e20e0, 0xc005b36120, 0x0, 0x0)
      	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduling_partition.go:108 +0x892
      github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*ClusterSchedulingContext).addSchedulingApplication(0xc000012000, 0xc005b36120, 0x0, 0x0)
      	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduling_context.go:114 +0x1d5
      github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).addNewApplication(0xc000390000, 0xc000494100, 0xc000738121, 0x9)
      	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:209 +0x277
      github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).processApplicationUpdateEvent(0xc000390000, 0xc00a7541e0)
      	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:447 +0x9ec
      github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).handleSchedulerEvent(0xc000390000)
      	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:596 +0x40a
      created by github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).StartService
      	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:67 +0x9e
      

      I am also attaching the placement rule configuration. Note that I was working on YUNIKORN-368 at the time, so the code is not 100% the same:

      partitions:
        - name: default
          placementrules:
            - name: tag
              value: namespace
              create: true
              parent:
                name: tag
                value: "namespace.parentqueue"
                create: true
          queues:
            - name: root
              submitacl: '*'
              queues:
                - name: default
                  submitacl: '*'
      

      where namespace.parentqueue is set to "root.special", a queue that is not defined in the configuration above.

      My proposal is that the scheduler should not crash even if the queue does not exist. Let's add an explicit check when getting the QueueInfo object, before calling any method on it.
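
      The proposed check could look roughly like the sketch below. It is a minimal, self-contained illustration and not the actual YuniKorn code: queueInfo, partitionInfo and validateParent are hypothetical stand-ins for the real types, and returning "does not exist" without an error is an assumption (the real rule may prefer to create the queue or reject the application, depending on the create flag).

```go
package main

import "fmt"

// queueInfo is a hypothetical stand-in for the scheduler's QueueInfo type.
type queueInfo struct {
	leaf bool
}

// IsLeafQueue reads a field of the receiver, so it panics on a nil *queueInfo.
func (q *queueInfo) IsLeafQueue() bool { return q.leaf }

// partitionInfo is a hypothetical stand-in for the object that owns the queues.
type partitionInfo struct {
	queues map[string]*queueInfo
}

// GetQueue returns nil when no queue is registered under the given name.
func (p *partitionInfo) GetQueue(name string) *queueInfo {
	return p.queues[name]
}

// validateParent checks the parent queue without dereferencing a nil pointer.
// It reports whether the queue exists and returns an error only when the
// queue exists but is a leaf. What to do with a missing queue is left to the
// caller (an assumption of this sketch).
func validateParent(info *partitionInfo, parentName string) (bool, error) {
	parent := info.GetQueue(parentName)
	if parent == nil {
		// Missing queue: report it instead of calling a method on nil.
		return false, nil
	}
	if parent.IsLeafQueue() {
		return true, fmt.Errorf("parent rule returned a leaf queue: %s", parentName)
	}
	return true, nil
}

func main() {
	info := &partitionInfo{queues: map[string]*queueInfo{
		"root":         {leaf: false},
		"root.default": {leaf: true},
	}}
	// "root.special" is not configured, matching the scenario in this issue.
	for _, name := range []string{"root", "root.default", "root.special"} {
		exists, err := validateParent(info, name)
		fmt.Printf("%s: exists=%t err=%v\n", name, exists, err)
	}
}
```

      With this shape the caller can decide what a missing parent means, for example creating it when create is true, instead of the process dying on the nil dereference.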


            People

              wilfreds Wilfred Spiegelenburg
              adam.antal Adam Antal