Felix
  1. Felix
  2. FELIX-3160

NPE in BundleRevisionImpl.close() when uninstalling a bundle

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: framework-4.0.0
    • Fix Version/s: framework-4.2.0
    • Component/s: Framework
    • Labels:
      None

      Description

      There seems to be a case where m_content can be null when uninstalling a bundle, causing the following code to throw an NPE:

      synchronized void close()
      {
      try

      { resolve(null); }

      catch (Exception ex)

      { ((BundleImpl) m_bundle).getFramework().getLogger().log( Logger.LOG_ERROR, "Error releasing revision: " + ex.getMessage(), ex); }

      m_content.close();

      I've added a simple check for null around the close in my version to avoid the exception, which I'll check in.

      I'm not sure of the scenarios where this can legally be null at this point, and whether there's some nastier underlying circumstance that needs investigating. We see it under the following circumstances:

      • we register a listener that looks for framework STARTED
      • when triggered, this iterates all bundles and performs an uninstall on ones which we determine are zombies leftover from a previous run

      Aside from the fact we're calling uninstall from inside an event handling method, there doesn't anything else unusual about this code.

      1. Felix3160.zip
        23 kB
        Rob Walker
      2. Activator.java
        9 kB
        Rob Walker

        Activity

        Hide
        Rob Walker added a comment -

        Fix committed to test for null pointer.
        Will close once reviewed and agreed that no further/deeper checks are needed.

        Show
        Rob Walker added a comment - Fix committed to test for null pointer. Will close once reviewed and agreed that no further/deeper checks are needed.
        Hide
        Richard S. Hall added a comment -

        This doesn't sound right. Do you have something to help me reproduce the scenario? If not, are you saying it will always happen if I create a bundle that uninstalls another bundle after receiving the STARTED event?

        Show
        Richard S. Hall added a comment - This doesn't sound right. Do you have something to help me reproduce the scenario? If not, are you saying it will always happen if I create a bundle that uninstalls another bundle after receiving the STARTED event?
        Hide
        Rob Walker added a comment -

        Certainly always happened our side.

        Here's the code in our activator:

        public void start(BundleContext context)
        throws
        Exception
        {
        this.context = context;
        ...
        // Only run sweeper once fully started
        this.context.addFrameworkListener(new FrameworkListener()
        {
        public void frameworkEvent(FrameworkEvent evt)
        {
        if (evt.getType() == FrameworkEvent.STARTED)

        { runSweeper(); }

        else if (evt.getType() == FrameworkEvent.ERROR)

        { diag.debug("Framework error, bundle: " + evt.getBundle(), evt.getThrowable()); }

        }
        });
        }

        We added the diag.debug when we started getting framework errors - that's what nailed the source of the NPE

        The runSweeper method is very simple. This is a framework restart - so we are warm starting using previously cached and started bundles. The sweeper looks for our transient bundles and cleans them out. They used START_TRANSIENT in the last run of the framework, this time around they only reach an INSTALLED state on startup.

        protected void runSweeper()
        {
        // TODO Auto-generated method stub
        for (Bundle bundle : this.context.getBundles())
        {
        try
        {
        if (bundle.getSymbolicName().startsWith(VtLibrary.TRANSIENT_PREFIX) &&
        bundle.getState() == Bundle.INSTALLED)

        { diag.debug("Removing transient bundle: " + bundle.toString()); bundle.uninstall(); }

        }
        catch (BundleException be)

        { app.error("Watchdog: sweeper error removing transient bundle: " + bundle.getSymbolicName()); }

        }
        }

        Not really sure what the trigger for m_content being null at uninstall time - whether it's to do with the uninstall() being called from inside the event handle, or whether some facet of the framework restart has left m_content null.

        Show
        Rob Walker added a comment - Certainly always happened our side. Here's the code in our activator: public void start(BundleContext context) throws Exception { this.context = context; ... // Only run sweeper once fully started this.context.addFrameworkListener(new FrameworkListener() { public void frameworkEvent(FrameworkEvent evt) { if (evt.getType() == FrameworkEvent.STARTED) { runSweeper(); } else if (evt.getType() == FrameworkEvent.ERROR) { diag.debug("Framework error, bundle: " + evt.getBundle(), evt.getThrowable()); } } }); } We added the diag.debug when we started getting framework errors - that's what nailed the source of the NPE The runSweeper method is very simple. This is a framework restart - so we are warm starting using previously cached and started bundles. The sweeper looks for our transient bundles and cleans them out. They used START_TRANSIENT in the last run of the framework, this time around they only reach an INSTALLED state on startup. protected void runSweeper() { // TODO Auto-generated method stub for (Bundle bundle : this.context.getBundles()) { try { if (bundle.getSymbolicName().startsWith(VtLibrary.TRANSIENT_PREFIX) && bundle.getState() == Bundle.INSTALLED) { diag.debug("Removing transient bundle: " + bundle.toString()); bundle.uninstall(); } } catch (BundleException be) { app.error("Watchdog: sweeper error removing transient bundle: " + bundle.getSymbolicName()); } } } Not really sure what the trigger for m_content being null at uninstall time - whether it's to do with the uninstall() being called from inside the event handle, or whether some facet of the framework restart has left m_content null.
        Hide
        Richard S. Hall added a comment -

        I created a bundle like this:

        public class Activator implements BundleActivator
        {
        public void start(final BundleContext bc)
        {
        bc.addFrameworkListener(new FrameworkListener() {
        public void frameworkEvent(FrameworkEvent event)
        {
        if (event.getType() == FrameworkEvent.STARTED)
        {
        for (Bundle b : bc.getBundles())
        {
        if ((b.getBundleId() != 0)
        && (b != bc.getBundle())
        && !b.getSymbolicName().startsWith("org.apache.felix.gogo"))
        {
        try

        { b.uninstall(); }

        catch (Exception ex)

        { System.out.println("+++ Uninstall error: " + ex); }

        }
        }
        }
        }
        });
        }

        public void stop(BundleContext bc)
        {
        }
        }

        This was working for me, it was correctly deleting any non-shell bundles on framework restart. Perhaps there is something else I need to do to recreate. Perhaps you could see if my code above gives you an error too. Thanks.

        Show
        Richard S. Hall added a comment - I created a bundle like this: public class Activator implements BundleActivator { public void start(final BundleContext bc) { bc.addFrameworkListener(new FrameworkListener() { public void frameworkEvent(FrameworkEvent event) { if (event.getType() == FrameworkEvent.STARTED) { for (Bundle b : bc.getBundles()) { if ((b.getBundleId() != 0) && (b != bc.getBundle()) && !b.getSymbolicName().startsWith("org.apache.felix.gogo")) { try { b.uninstall(); } catch (Exception ex) { System.out.println("+++ Uninstall error: " + ex); } } } } } }); } public void stop(BundleContext bc) { } } This was working for me, it was correctly deleting any non-shell bundles on framework restart. Perhaps there is something else I need to do to recreate. Perhaps you could see if my code above gives you an error too. Thanks.
        Hide
        Rob Walker added a comment -

        Sounds like there must be some other obscure factor at work here. Will try and assess what else is different in our setup and see if I can recreate with your bundle above.

        Show
        Rob Walker added a comment - Sounds like there must be some other obscure factor at work here. Will try and assess what else is different in our setup and see if I can recreate with your bundle above.
        Hide
        Karl Pauls added a comment -

        Richard, did you test with the current trunk? Rob commited a quick fix to it so if you are trying to reproduce you'd need to test against a revision before his commit ...

        Show
        Karl Pauls added a comment - Richard, did you test with the current trunk? Rob commited a quick fix to it so if you are trying to reproduce you'd need to test against a revision before his commit ...
        Hide
        Rob Walker added a comment - - edited

        Now that would be quite funny ... but I have an odd sense of humour

        There are 2 aspects I wonder if could be different in our setup.

        • Our sweeper bundle runs at a start level of 2. The bundles being uninstalled have a start level of 1, but, they used START_TRANSIENT in the previous framework start. On the restart, they never get beyond a start state of INSTALLED due do the transient start option.
        • The original JAR from which these bundles were started was a temporary file and will not be accessible on restarted (it's been deleted). The cached JAR and settings are of course still present. I don't recall any OSGi rule that says the original location/file that a bundle was installed from has to always be available, but maybe I forgot this?
        • Rob
        Show
        Rob Walker added a comment - - edited Now that would be quite funny ... but I have an odd sense of humour There are 2 aspects I wonder if could be different in our setup. Our sweeper bundle runs at a start level of 2. The bundles being uninstalled have a start level of 1, but, they used START_TRANSIENT in the previous framework start. On the restart, they never get beyond a start state of INSTALLED due do the transient start option. The original JAR from which these bundles were started was a temporary file and will not be accessible on restarted (it's been deleted). The cached JAR and settings are of course still present. I don't recall any OSGi rule that says the original location/file that a bundle was installed from has to always be available, but maybe I forgot this? Rob
        Hide
        Richard S. Hall added a comment -

        Yeah, I think Karl is correct. I'll look into it again later, thanks.

        Show
        Richard S. Hall added a comment - Yeah, I think Karl is correct. I'll look into it again later, thanks.
        Hide
        Richard S. Hall added a comment -

        I'm still not able to recreate, so I guess we'll have to try to recreate the exact scenario. If you can come up with something, let me know. Otherwise, I'll tinker around with it later and see if I can come up with something.

        Show
        Richard S. Hall added a comment - I'm still not able to recreate, so I guess we'll have to try to recreate the exact scenario. If you can come up with something, let me know. Otherwise, I'll tinker around with it later and see if I can come up with something.
        Hide
        Rob Walker added a comment -

        Might take me a couple of days to get to it but will see if I can figure what sequence of start, restart and uninstall are needed to trigger this every time.

        Show
        Rob Walker added a comment - Might take me a couple of days to get to it but will see if I can figure what sequence of start, restart and uninstall are needed to trigger this every time.
        Hide
        Richard S. Hall added a comment -

        Rob, I'm still waiting on you to help me recreate this situation...

        Show
        Richard S. Hall added a comment - Rob, I'm still waiting on you to help me recreate this situation...
        Hide
        Rob Walker added a comment -

        Yep, apols, got caught up with a ton of work on our next release so hadn't managed to get an isolated environment that recreates this yet. Still on my todo list, will update when I have.

        Show
        Rob Walker added a comment - Yep, apols, got caught up with a ton of work on our next release so hadn't managed to get an isolated environment that recreates this yet. Still on my todo list, will update when I have.
        Hide
        Rob Walker added a comment -

        Struggling to recreate in a simplified form. I don't think it's as simple as just the uninstall inside the handler for the framework STARTED event.

        I think it has something related to package admin - looking at the exception trace again, the log message I have after the uninstall has already been output. The Null pointer exception looks to happen asynchronously soon after:

        ===
        11-11-11 10:35:39 DEBUG - Removing transient bundle: com.ascert.transient.vt.lib.lib_4047819872618789.jar [36]
        11-11-11 10:35:39 DEBUG - Framework error, bundle: com.ascert.transient.vt.lib.lib_4047819872618789.jar [36]
        java.lang.NullPointerException
        at org.apache.felix.framework.BundleRevisionImpl.close(BundleRevisionImpl.java:642)
        at org.apache.felix.framework.BundleImpl.closeRevisions(BundleImpl.java:154)
        at org.apache.felix.framework.BundleImpl.closeAndDelete(BundleImpl.java:138)
        at org.apache.felix.framework.Felix$RefreshHelper.refreshOrRemove(Felix.java:4640)
        at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3952)
        at org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172)
        at java.lang.Thread.run(Thread.java:662)
        ===

        My cut down example still has the same package refresh code in, but as yet isn't tripping the issue.

        Will continue trying to recreate

        Show
        Rob Walker added a comment - Struggling to recreate in a simplified form. I don't think it's as simple as just the uninstall inside the handler for the framework STARTED event. I think it has something related to package admin - looking at the exception trace again, the log message I have after the uninstall has already been output. The Null pointer exception looks to happen asynchronously soon after: === 11-11-11 10:35:39 DEBUG - Removing transient bundle: com.ascert.transient.vt.lib.lib_4047819872618789.jar [36] 11-11-11 10:35:39 DEBUG - Framework error, bundle: com.ascert.transient.vt.lib.lib_4047819872618789.jar [36] java.lang.NullPointerException at org.apache.felix.framework.BundleRevisionImpl.close(BundleRevisionImpl.java:642) at org.apache.felix.framework.BundleImpl.closeRevisions(BundleImpl.java:154) at org.apache.felix.framework.BundleImpl.closeAndDelete(BundleImpl.java:138) at org.apache.felix.framework.Felix$RefreshHelper.refreshOrRemove(Felix.java:4640) at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3952) at org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172) at java.lang.Thread.run(Thread.java:662) === My cut down example still has the same package refresh code in, but as yet isn't tripping the issue. Will continue trying to recreate
        Hide
        Rob Walker added a comment -

        Will run out of time to look further into this one shortly. Summary at this stage:

        • the NPE happens every time in our own application setup
        • the trigger seems to be to do with the uninstall of bundles started as transient from previous runs of the framework
        • the exception itself seems to come out of a package refresh, which is triggered by the same bundle as we do the uninstall from
        • it doesn't happen on a simple setup with just a few bundles - our application takes 3 to 5s to startup, and has around 48 bundles - implying maybe timing or complexity of bundle inter wiring could be a factor
        Show
        Rob Walker added a comment - Will run out of time to look further into this one shortly. Summary at this stage: the NPE happens every time in our own application setup the trigger seems to be to do with the uninstall of bundles started as transient from previous runs of the framework the exception itself seems to come out of a package refresh, which is triggered by the same bundle as we do the uninstall from it doesn't happen on a simple setup with just a few bundles - our application takes 3 to 5s to startup, and has around 48 bundles - implying maybe timing or complexity of bundle inter wiring could be a factor
        Hide
        Rob Walker added a comment -

        Just as I was about to give up - I think I got it!

        Adding 1 more bundle to the startup set seemed to trigger it (http)

        This is the bundle set I load:

        ===
        felix.auto.install.1="$

        {bundle.root}/lib/ext/osgi.compendium.jar"

        org.osgi.framework.startlevel.beginning=3

        felix.auto.start.1="${bundle.root}

        /lib/ext/felix.log.jar"

        felix.auto.start.2="$

        {bundle.root}/lib/ext/httpjetty.jar" \
        "${bundle.root}

        /lib/vt/watchdog.jar"

        felix.auto.start.3="$

        {bundle.root}/lib/ext/shell.jar" \
        "${bundle.root}

        /lib/ext/shelltui.jar"

        1. we'll start this transient so we have something to clean up
          test.bundle=file:c:/mnt/data/vtmp/lib/shell.gui.jar
          ===

        All the above bundles are standard from felix SVN, except my watchdog bundle. Will attach source for that.

        Steps I take are:

        1 - start cold with the above bundle set

        2 - from shell TUI start the shell.gui.jar transient (my activator code looks for this symbolic name on restart) e.g.

        -> install file:c:/mnt/data/vtmp/lib/shell.gui.jar
        Bundle ID: 7
        -> start -t 7

        3 - kill the framework (I used Ctrl-C in fact, rather than clean shutdown)

        4 - start the framework warm this time. NPE occurs after the auto uninstall of the shell GUI jar:

        C:\mnt\Data\runvt>vtmp
        Removing transient bundle: org.apache.felix.shell.gui [7]
        -> Framework error, bundle: org.apache.felix.shell.gui [7]
        java.lang.NullPointerException
        at org.apache.felix.framework.BundleRevisionImpl.close(BundleRevisionImpl.java:642)
        at org.apache.felix.framework.BundleImpl.closeRevisions(BundleImpl.java:154)
        at org.apache.felix.framework.BundleImpl.closeAndDelete(BundleImpl.java:138)
        at org.apache.felix.framework.Felix$RefreshHelper.refreshOrRemove(Felix.java:4640)
        at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3952)
        at org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172)
        at java.lang.Thread.run(Thread.java:662)

        ===

        Remember to comment out the workaround checked in to SVN to ensure the NPE appears:

        synchronized void close()
        {
        try

        { resolve(null); }

        catch (Exception ex)

        { ((BundleImpl) m_bundle).getFramework().getLogger().log( Logger.LOG_ERROR, "Error releasing revision: " + ex.getMessage(), ex); }

        //if (m_content != null)
        //

        { m_content.close(); //}

        m_content = null;
        for (int i = 0; (m_contentPath != null) && (i < m_contentPath.size()); i++)

        { m_contentPath.get(i).close(); }

        m_contentPath = null;
        }

        Show
        Rob Walker added a comment - Just as I was about to give up - I think I got it! Adding 1 more bundle to the startup set seemed to trigger it (http) This is the bundle set I load: === felix.auto.install.1="$ {bundle.root}/lib/ext/osgi.compendium.jar" org.osgi.framework.startlevel.beginning=3 felix.auto.start.1="${bundle.root} /lib/ext/felix.log.jar" felix.auto.start.2="$ {bundle.root}/lib/ext/httpjetty.jar" \ "${bundle.root} /lib/vt/watchdog.jar" felix.auto.start.3="$ {bundle.root}/lib/ext/shell.jar" \ "${bundle.root} /lib/ext/shelltui.jar" we'll start this transient so we have something to clean up test.bundle= file:c:/mnt/data/vtmp/lib/shell.gui.jar === All the above bundles are standard from felix SVN, except my watchdog bundle. Will attach source for that. Steps I take are: 1 - start cold with the above bundle set 2 - from shell TUI start the shell.gui.jar transient (my activator code looks for this symbolic name on restart) e.g. -> install file:c:/mnt/data/vtmp/lib/shell.gui.jar Bundle ID: 7 -> start -t 7 3 - kill the framework (I used Ctrl-C in fact, rather than clean shutdown) 4 - start the framework warm this time. NPE occurs after the auto uninstall of the shell GUI jar: C:\mnt\Data\runvt>vtmp Removing transient bundle: org.apache.felix.shell.gui [7] -> Framework error, bundle: org.apache.felix.shell.gui [7] java.lang.NullPointerException at org.apache.felix.framework.BundleRevisionImpl.close(BundleRevisionImpl.java:642) at org.apache.felix.framework.BundleImpl.closeRevisions(BundleImpl.java:154) at org.apache.felix.framework.BundleImpl.closeAndDelete(BundleImpl.java:138) at org.apache.felix.framework.Felix$RefreshHelper.refreshOrRemove(Felix.java:4640) at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3952) at org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172) at java.lang.Thread.run(Thread.java:662) === Remember to comment out the workaround checked in to SVN to ensure the NPE appears: synchronized void close() { try { resolve(null); } catch (Exception ex) { ((BundleImpl) m_bundle).getFramework().getLogger().log( Logger.LOG_ERROR, "Error releasing revision: " + ex.getMessage(), ex); } //if (m_content != null) // { m_content.close(); //} m_content = null; for (int i = 0; (m_contentPath != null) && (i < m_contentPath.size()); i++) { m_contentPath.get(i).close(); } m_contentPath = null; }
        Hide
        Richard S. Hall added a comment -

        Well, I was hopeful. I recreated the scenario as faithfully as I could, but it still works properly for me (and I remembered to comment out the null check).

        Not sure what is going on. Maybe it is related to platform. To rule everything else out, could you create a self-contained zip of the exact scenario that can reproduce the situation for you and attach it here? Then I will try to use that exactly and see what I see.

        If that doesn't work, then we'll have to make sure we are testing on the same platform (I'm on a Mac using Java 6) and go from there.

        Show
        Richard S. Hall added a comment - Well, I was hopeful. I recreated the scenario as faithfully as I could, but it still works properly for me (and I remembered to comment out the null check). Not sure what is going on. Maybe it is related to platform. To rule everything else out, could you create a self-contained zip of the exact scenario that can reproduce the situation for you and attach it here? Then I will try to use that exactly and see what I see. If that doesn't work, then we'll have to make sure we are testing on the same platform (I'm on a Mac using Java 6) and go from there.
        Hide
        Rob Walker added a comment -

        That's a shame. I'll need to try and create a version my side that doesn't use our custom launcher, which is a bit too involved to make for a simple self contained example. Travelling with work the next week but will have a look when I get back.

        Show
        Rob Walker added a comment - That's a shame. I'll need to try and create a version my side that doesn't use our custom launcher, which is a bit too involved to make for a simple self contained example. Travelling with work the next week but will have a look when I get back.
        Hide
        Rob Walker added a comment -

        I should have added, we are Windows 7 and also Java 6, so I doubt there should be too many differences.

        Show
        Rob Walker added a comment - I should have added, we are Windows 7 and also Java 6, so I doubt there should be too many differences.
        Hide
        Richard S. Hall added a comment -

        Rob, I've reverted the patch for now, since it covers up the real issue. I'd rather have people suffer the exception so we might find the real solution quicker. You can always reapply the workaround after the 4.0.2 release or locally if you want. Otherwise, continue to help me reproduce it so I can fix it. Thanks.

        Show
        Richard S. Hall added a comment - Rob, I've reverted the patch for now, since it covers up the real issue. I'd rather have people suffer the exception so we might find the real solution quicker. You can always reapply the workaround after the 4.0.2 release or locally if you want. Otherwise, continue to help me reproduce it so I can fix it. Thanks.
        Hide
        Rob Walker added a comment -

        Battling to recreate this one in an isolated environment.

        Happens every time on a warm start of our application, but can't seem to isolate the combination of bundles & bindings that trigger it.

        Show
        Rob Walker added a comment - Battling to recreate this one in an isolated environment. Happens every time on a warm start of our application, but can't seem to isolate the combination of bundles & bindings that trigger it.
        Hide
        Rob Walker added a comment -

        Yay!

        Finally nailed the recreation part. What triggers it is the refreshPackages call in the Activator sample code above. That only happens for a bundle with no exports, which wasn't the case for shell.gui. Swapping to an alternate bundle for the start -t and i can now recreate. Will tidy up my full example and post just now

        Show
        Rob Walker added a comment - Yay! Finally nailed the recreation part. What triggers it is the refreshPackages call in the Activator sample code above. That only happens for a bundle with no exports, which wasn't the case for shell.gui. Swapping to an alternate bundle for the start -t and i can now recreate. Will tidy up my full example and post just now
        Hide
        Richard S. Hall added a comment -

        Good news, thanks for figuring it out.

        Show
        Richard S. Hall added a comment - Good news, thanks for figuring it out.
        Hide
        Rob Walker added a comment -

        Worked example for recreating

        Show
        Rob Walker added a comment - Worked example for recreating
        Hide
        Rob Walker added a comment -

        Well - I've only figured how to recreate it, not actually fix the underlying cause. Baby steps

        Have attached a ZIP with the worked example. Afraid it's a Windows BAT file, should be hard to alter for .SH though.

        I've set it up so that only the custom bundles are present (including the original example). The felix standard bundles load from the SVN tree to save upload, and also make it easier to retest when fixed. Just alter the following env var to set where felix is rooted:

        set FELIX_HOME=C:/mnt/MiscProjs/felix

        First you need a cold run to get things going:

        >go clean

        Initial bundles will load, then a few seconds later the gui plugin will get loaded (that's really the only purpose of my main.Main class now - to autoload this to make recreating less typing). My output trace looks as follows:

        ===
        C:\mnt\tmp\Felix3160>go clean

        C:\mnt\tmp\Felix3160>set EXAMPLE_HOME=C:\mnt\tmp\Felix3160\/.

        C:\mnt\tmp\Felix3160>set FELIX_HOME=C:/mnt/MiscProjs/felix

        C:\mnt\tmp\Felix3160>set CLEAN=none

        C:\mnt\tmp\Felix3160>if "clean" == "clean" set CLEAN=onFirstInit

        C:\mnt\tmp\Felix3160>"C:\Program Files (x86)\Java\jdk1.6.0_27\bin\java" -Dfelix.home=C:/mnt/MiscProjs/felix -Dexample.home="C:\mnt\t
        mp\Felix3160\/." -Dclean=onFirstInit -Dconfig="C:\mnt\tmp\Felix3160\/./conf/config.properties" -cp C:/mnt/MiscProjs/felix/main/targ
        et/org.apache.felix.main-4.1.0-SNAPSHOT.jar;C:\mnt\tmp\Felix3160\/./lib/main.jar main.Main
        initializeAdmin: org.apache.felix.framework.PackageAdminImpl@b8f82d
        bundleChanged: watchdog [4]
        -> bundleChanged: org.osgi.service.obr [5]
        bundleChanged: org.osgi.service.obr [5]
        bundleChanged: org.apache.felix.shell [6]
        bundleChanged: org.apache.felix.shell.tui [1]
        bundleChanged: org.apache.felix.shell.tui [1]
        bundleChanged: org.apache.felix.shell.gui [2]
        bundleChanged: org.apache.felix.shell.gui [2]
        bundleChanged: org.apache.felix.framework [0]
        bundleChanged: org.apache.felix.shell.gui.plugin [9]
        bundleChanged: org.apache.felix.shell.gui.plugin [9]
        bundleChanged: org.apache.felix.shell.gui.plugin [9]
        ===

        At this stage Ctrl-C exit. The restart "warm"

        >go

        It seems to be the refreshPackages call that triggers the NPE - which as above, only happens if the auto uninstalled bundle had no exports

        My output trace from the warm start:

        ===
        C:\mnt\tmp\Felix3160>go

        C:\mnt\tmp\Felix3160>set EXAMPLE_HOME=C:\mnt\tmp\Felix3160\/.

        C:\mnt\tmp\Felix3160>set FELIX_HOME=C:/mnt/MiscProjs/felix

        C:\mnt\tmp\Felix3160>set CLEAN=none

        C:\mnt\tmp\Felix3160>if "" == "clean" set CLEAN=onFirstInit

        C:\mnt\tmp\Felix3160>"C:\Program Files (x86)\Java\jdk1.6.0_27\bin\java" -Dfelix.home=C:/mnt/MiscProjs/felix -Dexample.home="C:\mnt\t
        mp\Felix3160\/." -Dclean=none -Dconfig="C:\mnt\tmp\Felix3160\/./conf/config.properties" -cp C:/mnt/MiscProjs/felix/main/target/org.
        apache.felix.main-4.1.0-SNAPSHOT.jar;C:\mnt\tmp\Felix3160\/./lib/main.jar main.Main
        initializeAdmin: org.apache.felix.framework.PackageAdminImpl@b8f82d
        bundleChanged: watchdog [4]
        bundleChanged: org.osgi.service.obr [5]
        bundleChanged: org.osgi.service.obr [5]
        bundleChanged: org.apache.felix.shell [6]
        bundleChanged: org.apache.felix.shell.tui [1]
        bundleChanged: org.apache.felix.shell.tui [1]
        bundleChanged: org.apache.felix.shell.gui [2]
        -> bundleChanged: org.apache.felix.shell.gui [2]
        bundleChanged: org.apache.felix.framework [0]
        Removing transient bundle: org.apache.felix.shell.gui.plugin [9]
        bundleChanged: org.apache.felix.shell.gui.plugin [9]
        bundleChanged: org.apache.felix.shell.gui.plugin [9]
        refresh: org.apache.felix.shell.gui.plugin [9]
        Framework error, bundle: org.apache.felix.shell.gui.plugin [9]
        java.lang.NullPointerException
        at org.apache.felix.framework.BundleRevisionImpl.close(BundleRevisionImpl.java:639)
        at org.apache.felix.framework.BundleImpl.closeRevisions(BundleImpl.java:154)
        at org.apache.felix.framework.BundleImpl.closeAndDelete(BundleImpl.java:138)
        at org.apache.felix.framework.Felix$RefreshHelper.refreshOrRemove(Felix.java:4648)
        at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3960)
        at org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172)
        at java.lang.Thread.run(Thread.java:662)
        ===

        Show
        Rob Walker added a comment - Well - I've only figured how to recreate it, not actually fix the underlying cause. Baby steps Have attached a ZIP with the worked example. Afraid it's a Windows BAT file, should be hard to alter for .SH though. I've set it up so that only the custom bundles are present (including the original example). The felix standard bundles load from the SVN tree to save upload, and also make it easier to retest when fixed. Just alter the following env var to set where felix is rooted: set FELIX_HOME=C:/mnt/MiscProjs/felix First you need a cold run to get things going: >go clean Initial bundles will load, then a few seconds later the gui plugin will get loaded (that's really the only purpose of my main.Main class now - to autoload this to make recreating less typing). My output trace looks as follows: === C:\mnt\tmp\Felix3160>go clean C:\mnt\tmp\Felix3160>set EXAMPLE_HOME=C:\mnt\tmp\Felix3160\/. C:\mnt\tmp\Felix3160>set FELIX_HOME=C:/mnt/MiscProjs/felix C:\mnt\tmp\Felix3160>set CLEAN=none C:\mnt\tmp\Felix3160>if "clean" == "clean" set CLEAN=onFirstInit C:\mnt\tmp\Felix3160>"C:\Program Files (x86)\Java\jdk1.6.0_27\bin\java" -Dfelix.home=C:/mnt/MiscProjs/felix -Dexample.home="C:\mnt\t mp\Felix3160\/." -Dclean=onFirstInit -Dconfig="C:\mnt\tmp\Felix3160\/./conf/config.properties" -cp C:/mnt/MiscProjs/felix/main/targ et/org.apache.felix.main-4.1.0-SNAPSHOT.jar;C:\mnt\tmp\Felix3160\/./lib/main.jar main.Main initializeAdmin: org.apache.felix.framework.PackageAdminImpl@b8f82d bundleChanged: watchdog [4] -> bundleChanged: org.osgi.service.obr [5] bundleChanged: org.osgi.service.obr [5] bundleChanged: org.apache.felix.shell [6] bundleChanged: org.apache.felix.shell.tui [1] bundleChanged: org.apache.felix.shell.tui [1] bundleChanged: org.apache.felix.shell.gui [2] bundleChanged: org.apache.felix.shell.gui [2] bundleChanged: org.apache.felix.framework [0] bundleChanged: org.apache.felix.shell.gui.plugin [9] bundleChanged: org.apache.felix.shell.gui.plugin [9] bundleChanged: org.apache.felix.shell.gui.plugin [9] === At this stage Ctrl-C exit. The restart "warm" >go It seems to be the refreshPackages call that triggers the NPE - which as above, only happens if the auto uninstalled bundle had no exports My output trace from the warm start: === C:\mnt\tmp\Felix3160>go C:\mnt\tmp\Felix3160>set EXAMPLE_HOME=C:\mnt\tmp\Felix3160\/. C:\mnt\tmp\Felix3160>set FELIX_HOME=C:/mnt/MiscProjs/felix C:\mnt\tmp\Felix3160>set CLEAN=none C:\mnt\tmp\Felix3160>if "" == "clean" set CLEAN=onFirstInit C:\mnt\tmp\Felix3160>"C:\Program Files (x86)\Java\jdk1.6.0_27\bin\java" -Dfelix.home=C:/mnt/MiscProjs/felix -Dexample.home="C:\mnt\t mp\Felix3160\/." -Dclean=none -Dconfig="C:\mnt\tmp\Felix3160\/./conf/config.properties" -cp C:/mnt/MiscProjs/felix/main/target/org. apache.felix.main-4.1.0-SNAPSHOT.jar;C:\mnt\tmp\Felix3160\/./lib/main.jar main.Main initializeAdmin: org.apache.felix.framework.PackageAdminImpl@b8f82d bundleChanged: watchdog [4] bundleChanged: org.osgi.service.obr [5] bundleChanged: org.osgi.service.obr [5] bundleChanged: org.apache.felix.shell [6] bundleChanged: org.apache.felix.shell.tui [1] bundleChanged: org.apache.felix.shell.tui [1] bundleChanged: org.apache.felix.shell.gui [2] -> bundleChanged: org.apache.felix.shell.gui [2] bundleChanged: org.apache.felix.framework [0] Removing transient bundle: org.apache.felix.shell.gui.plugin [9] bundleChanged: org.apache.felix.shell.gui.plugin [9] bundleChanged: org.apache.felix.shell.gui.plugin [9] refresh: org.apache.felix.shell.gui.plugin [9] Framework error, bundle: org.apache.felix.shell.gui.plugin [9] java.lang.NullPointerException at org.apache.felix.framework.BundleRevisionImpl.close(BundleRevisionImpl.java:639) at org.apache.felix.framework.BundleImpl.closeRevisions(BundleImpl.java:154) at org.apache.felix.framework.BundleImpl.closeAndDelete(BundleImpl.java:138) at org.apache.felix.framework.Felix$RefreshHelper.refreshOrRemove(Felix.java:4648) at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3960) at org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172) at java.lang.Thread.run(Thread.java:662) ===
        Hide
        Richard S. Hall added a comment -

        I was able to recreate and I've determined what is happening.

        Basically, the framework auto-refreshes the bundle when you do an uninstall (since no one depends on it), then your worker explicitly refreshes the uninstalled bundle once it receives the UNINSTALLED event, but at this time the bundle has already been refreshed and is now stale. That's why the content is already null, since we end up refreshing a stale bundle.

        During the refactoring for the the R4.3 API, we must have lost a check somewhere that avoided this situation. I believe the proper thing to do here is to just ignore stale bundles when refreshing. I'll investigate whether this is the correct behavior and, if so, will implement a fix.

        Show
        Richard S. Hall added a comment - I was able to recreate and I've determined what is happening. Basically, the framework auto-refreshes the bundle when you do an uninstall (since no one depends on it), then your worker explicitly refreshes the uninstalled bundle once it receives the UNINSTALLED event, but at this time the bundle has already been refreshed and is now stale. That's why the content is already null, since we end up refreshing a stale bundle. During the refactoring for the the R4.3 API, we must have lost a check somewhere that avoided this situation. I believe the proper thing to do here is to just ignore stale bundles when refreshing. I'll investigate whether this is the correct behavior and, if so, will implement a fix.
        Hide
        Richard S. Hall added a comment -

        I've committed a patch that turns such an occurrence into a no-op, assuming for now that this is the right thing to do. Please check that the trunk framework fixes the issue for you. Thanks for your help.

        Show
        Richard S. Hall added a comment - I've committed a patch that turns such an occurrence into a no-op, assuming for now that this is the right thing to do. Please check that the trunk framework fixes the issue for you. Thanks for your help.
        Hide
        Rob Walker added a comment -

        Thanks for the patch. Has fixed the NPE on our startup, good news as it means we can move back to trunk version.
        Let me know if I should close the issue, or if you'd rather leave open until you've had a chance to investigate correct behaviour.

        Show
        Rob Walker added a comment - Thanks for the patch. Has fixed the NPE on our startup, good news as it means we can move back to trunk version. Let me know if I should close the issue, or if you'd rather leave open until you've had a chance to investigate correct behaviour.
        Hide
        Richard S. Hall added a comment -

        Leave it open for now. I'll close it once I'm certain. Thanks for verifying the fix.

        Show
        Richard S. Hall added a comment - Leave it open for now. I'll close it once I'm certain. Thanks for verifying the fix.
        Hide
        Richard S. Hall added a comment -

        I guess this is fine. I'm closing it as fixed.

        Show
        Richard S. Hall added a comment - I guess this is fine. I'm closing it as fixed.

          People

          • Assignee:
            Richard S. Hall
            Reporter:
            Rob Walker
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development