  1. Sling
  2. SLING-10372

OSGi Installer: NPE during package installation during startup

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Starter 12
    • Installer
    • None
    • Environment: Sling-Starter 12-SNAPSHOT (commit 0e6a8e41) with JDK 11 on macOS

    Description

      (As requested in SLING-10362) When starting a Sling Starter 12 with a feature archive containing a couple of packages, and with a couple more packages installed through the Sling fileinstaller provider, I often get an NPE (stacktrace below). When this happens, the installation of the affected package stops. It isn't about that particular package, though: if I take other packages out of the automatic installation, or put the package into the fileinstall directory later, it installs happily.

      It's rather difficult to give detailed steps to reproduce this, but I have a guess at what's happening. I do have a particular setup where it always happens on my machine, but that might be sensitive to the speed of my machine and whatnot. Basically, I'm starting the feature launcher with a FAR containing several of our packages, and I also pass the arguments

      -Dsling.fileinstall.dir=launcher/fileinstall -Dfelix.startlevel.bundle=30

      to the launcher, having placed several packages in the fileinstall directory. I guess the NPE happens only when enough packages are placed there, and it happens only on the first startup (i.e., there was no launcher directory yet).

      I had a look around with the debugger: it seems the SlingRepository had been stopped, but not yet started again, just before the PackageTransformer tried to process the package. The restart was probably triggered by some kind of configuration change. The PackageTransformer tries to access the repository via a reference to the OakSlingRepository whose manager has already been stopped, so getRepository() returns null. Hence the NPE. Probably the org.apache.sling.installer.factory.packages.impl.PackageTransformer should somehow handle such temporary failures that don't have anything to do with the package? Another way to solve this seems to be to set the start level of the org.apache.sling.installer.factory.packages bundle to 21, probably because so much happens at once when start level 20 is reached that this transition is not a good time to install packages.
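
      The idea of treating a missing repository as a temporary condition, rather than letting it surface as an NPE with "no retry", might look roughly like the sketch below. This is only an illustration under stated assumptions: the class RetryOnMissingRepository, the Outcome enum, the Repository interface and both methods are hypothetical stand-ins, not Sling or Oak APIs, and the actual PackageTransformer code may be structured quite differently.

```java
import java.util.function.Supplier;

/**
 * Sketch of null-safe repository access with retry signalling.
 * All names in this class are hypothetical; none of them are Sling APIs.
 */
public class RetryOnMissingRepository {

    /** Result of a single installation attempt. */
    enum Outcome { INSTALLED, RETRY_LATER, FAILED }

    /** Stand-in for the repository service; the supplier yields null while it restarts. */
    interface Repository {
        void installPackage(String packageId) throws Exception;
    }

    /**
     * Attempts one installation. A missing repository is reported as
     * RETRY_LATER, because it says nothing about the package itself.
     * Only an error raised by the installation is reported as FAILED.
     */
    static Outcome tryInstall(Supplier<Repository> repositorySource, String packageId) {
        Repository repository = repositorySource.get();
        if (repository == null) {
            return Outcome.RETRY_LATER;
        }
        try {
            repository.installPackage(packageId);
            return Outcome.INSTALLED;
        } catch (Exception e) {
            return Outcome.FAILED;
        }
    }

    /** Repeats the attempt while the outcome is RETRY_LATER, up to maxAttempts times. */
    static Outcome installWithRetries(Supplier<Repository> repositorySource,
                                      String packageId, int maxAttempts) {
        Outcome outcome = Outcome.RETRY_LATER;
        for (int attempt = 0; attempt < maxAttempts && outcome == Outcome.RETRY_LATER; attempt++) {
            outcome = tryInstall(repositorySource, packageId);
        }
        return outcome;
    }

    public static void main(String[] args) {
        // Simulate a repository that is unavailable for the first two lookups.
        int[] lookups = {0};
        Repository repository = id -> System.out.println("installed " + id);
        Supplier<Repository> restarting = () -> ++lookups[0] <= 2 ? null : repository;

        System.out.println(installWithRetries(
                restarting, "tenants/ist:composum-site-app-package:1.0.0-SNAPSHOT", 5));
    }
}
```

      The loop retries immediately only to keep the sketch short. In an installer that works in cycles, the RETRY_LATER outcome would more plausibly be used to reschedule the task for a later cycle.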

      Here is the stacktrace that marks the error. I'll attach a logfile for some more context. BTW: the exceptions "Can't create child on a synthetic root" in the log file might also be interesting. I receive them regularly during startup, but they are probably not related to this problem, as they also occur when things work properly.

      11.05.2021 13:27:49.462 *ERROR* [OsgiInstallerImpl] org.apache.sling.installer.factory.packages.impl.PackageTransformer Error while processing install content package task* of TaskResource(url=fileinstallff43091e0ee8ac91416c79636bdce5f4:/Users/hps/dev/composum/composum-launch/feature/composumstarter/target/launcher/fileinstall/99/composum-site-app-package-1.0.0-SNAPSHOT.zip, entity=content-package:tenants/ist:composum-site-app-package, state=INSTALL, attributes=[org.apache.sling.installer.api.tasks.ResourceTransformer=:27:23:1243:, package-id=tenants/ist:composum-site-app-package:1.0.0-SNAPSHOT, Bundle-Version=1.0.0.SNAPSHOT], digest=1620718306467) due to null, no retry.
       java.lang.NullPointerException: null
               at org.apache.sling.jcr.oak.server.internal.OakSlingRepository$2.run(OakSlingRepository.java:99) [org.apache.sling.jcr.oak.server:1.2.10]
               at org.apache.sling.jcr.oak.server.internal.OakSlingRepository$2.run(OakSlingRepository.java:96) [org.apache.sling.jcr.oak.server:1.2.10]
               at java.base/java.security.AccessController.doPrivileged(Native Method)
               at java.base/javax.security.auth.Subject.doAsPrivileged(Subject.java:550)
               at org.apache.sling.jcr.oak.server.internal.OakSlingRepository.createServiceSession(OakSlingRepository.java:96) [org.apache.sling.jcr.oak.server:1.2.10]
               at org.apache.sling.jcr.base.AbstractSlingRepository2.createServiceSession(AbstractSlingRepository2.java:166) [org.apache.sling.jcr.base:3.1.6]
               at org.apache.sling.jcr.base.AbstractSlingRepository2.loginService(AbstractSlingRepository2.java:383) [org.apache.sling.jcr.base:3.1.6]
               at org.apache.sling.installer.factory.packages.impl.PackageTransformer$AbstractPackageInstallTask.execute(PackageTransformer.java:263) [org.apache.sling.installer.factory.packages:1.0.4]
               at org.apache.sling.installer.core.impl.OsgiInstallerImpl.doExecuteTasks(OsgiInstallerImpl.java:918) [org.apache.sling.installer.core:3.11.4]
               at org.apache.sling.installer.core.impl.OsgiInstallerImpl.executeTasks(OsgiInstallerImpl.java:755) [org.apache.sling.installer.core:3.11.4]
               at org.apache.sling.installer.core.impl.OsgiInstallerImpl.run(OsgiInstallerImpl.java:304) [org.apache.sling.installer.core:3.11.4]
               at java.base/java.lang.Thread.run(Thread.java:834)
      

      I'm not sure whether this is a Minor or a Major: it breaks things during startup, but I've found a way to modify the starter to avoid it, see above.

      Attachments

        1. error.log
          221 kB
          Hans-Peter Stoerr

          People

            Assignee: Unassigned
            Reporter: Hans-Peter Stoerr (hanspeterstoerr)