Uploaded image for project: 'Samza'
  1. Samza
  2. SAMZA-1346

GroupByContainerCount.balance() should guard against null LocalityManager

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.1
    • Component/s: None
    • Labels:
      None

      Description

      While it's less likely after SAMZA-1334, we have seen cases of an NPE in embedded mode.

      org.apache.samza.SamzaException: Failed to run application
      	at org.apache.samza.runtime.LocalApplicationRunner.run(LocalApplicationRunner.java:136)
      	at com.linkedin.beam.runners.samza.runtime.fluent.FluentRuntime$RunnerTask.run(FluentRuntime.java:114)
      	... 1 more
      Caused by: java.lang.NullPointerException
      	at org.apache.samza.container.grouper.task.GroupByContainerCount.balance(GroupByContainerCount.java:92)
      	at org.apache.samza.coordinator.JobModelManager$.readJobModel(JobModelManager.scala:257)
      	at org.apache.samza.coordinator.JobModelManager.readJobModel(JobModelManager.scala)
      	at org.apache.samza.standalone.StandaloneJobCoordinator.<init>(StandaloneJobCoordinator.java:108)
      	at org.apache.samza.standalone.StandaloneJobCoordinatorFactory.getJobCoordinator(StandaloneJobCoordinatorFactory.java:29)
      	at org.apache.samza.processor.StreamProcessor.<init>(StreamProcessor.java:111)
      	at org.apache.samza.processor.StreamProcessor.<init>(StreamProcessor.java:94)
      	at org.apache.samza.runtime.LocalApplicationRunner.createStreamProcessor(LocalApplicationRunner.java:231)
      	at org.apache.samza.runtime.LocalApplicationRunner.lambda$run$0(LocalApplicationRunner.java:125)
      	at org.apache.samza.runtime.LocalApplicationRunner$$Lambda$35/1940982718.accept(Unknown Source)
      	at java.util.ArrayList.forEach(ArrayList.java:1249)
      	at org.apache.samza.runtime.LocalApplicationRunner.run(LocalApplicationRunner.java:121)
      	... 2 more
      

      It should be straight forward to defend against this case and provide better feedback in the logs. E.g. if the locality manager is null, then host affinity is not enabled and we could just defer to group().

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user jmakes opened a pull request:

          https://github.com/apache/samza/pull/232

          SAMZA-1346: GroupByContainerCount.balance() should guard against null…

          … LocalityManager

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/jmakes/samza samza-1346

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/samza/pull/232.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #232


          commit 10722fc777d9c229f90534dee272237782b9f31b
          Author: Jacob Maes <jmaes@linkedin.com>
          Date: 2017-06-26T16:27:48Z

          SAMZA-1346: GroupByContainerCount.balance() should guard against null LocalityManager


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user jmakes opened a pull request: https://github.com/apache/samza/pull/232 SAMZA-1346 : GroupByContainerCount.balance() should guard against null… … LocalityManager You can merge this pull request into a Git repository by running: $ git pull https://github.com/jmakes/samza samza-1346 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/samza/pull/232.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #232 commit 10722fc777d9c229f90534dee272237782b9f31b Author: Jacob Maes <jmaes@linkedin.com> Date: 2017-06-26T16:27:48Z SAMZA-1346 : GroupByContainerCount.balance() should guard against null LocalityManager
          Hide
          navina Navina Ramesh added a comment - - edited

          > if the locality manager is null, then host affinity is not enabled and we could just defer to group().

          Would it more straightforward for the grouper to check the job.host-affinity.enabled configuration? If you assume that localityManager definition implies host-affinity is enabled, it might hide potential issues in other components, where localityManager is defined/used, however, the user didn't intend to enable host-affinity. Not that host-affinity should have any adverse side-effect on the job execution itself. Let me know what you think. Thanks!

          Show
          navina Navina Ramesh added a comment - - edited > if the locality manager is null, then host affinity is not enabled and we could just defer to group(). Would it more straightforward for the grouper to check the job.host-affinity.enabled configuration? If you assume that localityManager definition implies host-affinity is enabled, it might hide potential issues in other components, where localityManager is defined/used, however, the user didn't intend to enable host-affinity. Not that host-affinity should have any adverse side-effect on the job execution itself. Let me know what you think. Thanks!
          Hide
          jmakes Jake Maes added a comment -

          Navina Ramesh SAMZA-1346 (mentioned in the description) adds this that check in JobModelManager, which has the config readily available.

          This change is more about proper encapsulation in GroupByContainerCount. It should validate all of its inputs as best as possible and not require a new input (like config) solely for the purpose of validation, if it can be avoided.

          Show
          jmakes Jake Maes added a comment - Navina Ramesh SAMZA-1346 (mentioned in the description) adds this that check in JobModelManager, which has the config readily available. This change is more about proper encapsulation in GroupByContainerCount. It should validate all of its inputs as best as possible and not require a new input (like config) solely for the purpose of validation, if it can be avoided.
          Hide
          navina Navina Ramesh added a comment -

          > It should validate all of its inputs as best as possible and not require a new input (like config) solely for the purpose of validation, if it can be avoided.

          Yes. We should not require new input solely for the purpose of validation. However, imo, in this case, we are validating a contract between the JobModelManager and the TaskNameGrouper. If (for some reason) there was a change in JobModelManager that resulted in breaking this contract, the grouper can become more defensive by saying that host-affinity is enabled and it cannot operate without locality manager.
          I am not very picky about this, but I just want to make sure this line of approach was considered. Thanks!

          Show
          navina Navina Ramesh added a comment - > It should validate all of its inputs as best as possible and not require a new input (like config) solely for the purpose of validation, if it can be avoided. Yes. We should not require new input solely for the purpose of validation. However, imo, in this case, we are validating a contract between the JobModelManager and the TaskNameGrouper . If (for some reason) there was a change in JobModelManager that resulted in breaking this contract, the grouper can become more defensive by saying that host-affinity is enabled and it cannot operate without locality manager. I am not very picky about this, but I just want to make sure this line of approach was considered. Thanks!
          Hide
          jmakes Jake Maes added a comment -

          Appreciate the feedback.
          >If (for some reason) there was a change in JobModelManager that resulted in breaking this contract, the grouper can become more defensive by saying that host-affinity is enabled and it cannot operate without locality manager.

          That's why this check was added. In case JobModelManager changes. The only difference is that I think this check is the core check that must be performed. If there's no locality manager, the balance method cannot work, as it depends on the persistence provided by the locality manager. We could add a secondary check against the config, but I don't think that check would be self sufficient because it also depends on a contract that could change: host affinity vs null locality manager.

          Show
          jmakes Jake Maes added a comment - Appreciate the feedback. >If (for some reason) there was a change in JobModelManager that resulted in breaking this contract, the grouper can become more defensive by saying that host-affinity is enabled and it cannot operate without locality manager. That's why this check was added. In case JobModelManager changes. The only difference is that I think this check is the core check that must be performed. If there's no locality manager, the balance method cannot work, as it depends on the persistence provided by the locality manager. We could add a secondary check against the config, but I don't think that check would be self sufficient because it also depends on a contract that could change: host affinity vs null locality manager.
          Hide
          jmakes Jake Maes added a comment -

          Issue resolved by pull request 232
          https://github.com/apache/samza/pull/232

          Show
          jmakes Jake Maes added a comment - Issue resolved by pull request 232 https://github.com/apache/samza/pull/232
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/samza/pull/232

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/samza/pull/232

            People

            • Assignee:
              jmakes Jake Maes
              Reporter:
              jmakes Jake Maes
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development