Uploaded image for project: 'CXF Distributed OSGi (Retired)'
  1. CXF Distributed OSGi (Retired)
  2. DOSGI-161

services sometimes don't get exported

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.4.0
    • 1.5.0
    • common
    • None
    • Oracle JDK 1.7.0_17, Karaf 2.3.1, DOSGi 1.4.0 (installed from Karaf feature)

    • Unknown

    Description

      When starting up a fresh Karaf instance repeatedly, a service that should be exported sometimes does not in fact get exported. The issue stems from some bugs in the TopologyManagerExport class, the main ones being wrong assumptions about the state of the Tracker within its callback method (during the callback its internal state is not yet updated, but the code assumes it is), unconditional overwriting of data structures (which may already exist) and synchronization issues.

      During my investigations I came across various other small issues and enhancements in the class, so the provided patch includes the final result I arrived with including all changes. I've split it up into several consecutive patches in order to make the review easier. These are the changes:

      Patch #1 - cleanup (no semantic changes):

      • fixed typos in docs and logs.
      • fixed formatting to be more consistent and readable.
      • added debug logs to RemoteServiceAdminLifeCycleListener.added/removed callbacks (this proved the serious tracker bug below, but is also useful for general debugging).
      • simplified shouldExportService returned boolean expression.
      • removed removeExportRegistration, removeExportReference and remoteAdminEvent methods, all of which are both unused and have a no-op implementation.

      Patch #2 (improved iterations):

      • replaced iterations on map keys followed by a get of the corresponding value with an iteration on Map.Entry (prevents unnecessary lookups).
      • replaced map containsKey checks followed by get with a get followed by null check (prevents unnecessary lookups, and theoretically race conditions too).
      • fixed remoteServiceAdmin.exportService return value handling - its docs say the result can never be null, but the code assumes it might be. This is now replaced with an isEmpty() check instead, I hope there are no other consequences. The reason this is bundled with the iterations fixes is since it shows explicitly that map values cannot in fact be null, which proves the iteration fixes to be correct (i.e. if Map.get returns null, the key does not exist, so this is equivalent to calling Map.containsKey).

      Patch #3 (tracker and sync issues):

      • fixed tracker misuse: when tracker callback (added) is called, the tracker itself does not yet reflect the new state, so accessing services (or size) on the tracker from within the callback will fail to include the new service. Although the actual service export triggered by the callback is performed on a separate thread, this still leaves a race condition between the two which sometimes causes the export to fail (with "No RemoteServiceAdmin available!" log). To fix this, the newly added RSA is passed down the method chain to the doExportService method explicitly. doExportService then uses the reference if it is given, or if null (as when called from elsewhere), uses the tracker to get the service list as it did in the past.
      • fixed exportService always overwriting a service's entry in exportedServices with a new empty data structure - now it only does so if it does not already exist.
      • fixed exportedServices inner maps to be created as synchronized maps - previously doExportService wrapped it in a synchronized map on each invocation, which results effectively in no synchronization, since the synchronized wrapper map is local only so each thread accessing the map uses a different lock. Also fixed other accesses of the inner maps to be synchronized (Collections.synchronizedMap requires external synchronization on it when being iterated, as specified in the docs).
      • fixed exports null check in doExportService (in the old code, Collections.synchronizedMap never returns null; In the new code it makes more sense in case of concurrent removal.)
      • removed redundant remoteServiceAdminTracker null check in doExportService (it is already used in the constructor, so can't be null at this point).

      Please review the changes to make sure nothing was missed (a cluster of issues in the same code section usually implies additional lurking bugs...), and feel free to drop any non-critical changes that you don't like - I find them all beneficial, but it can be a matter of taste.

      Attachments

        1. fix_topologymanagerexport_1.diff
          8 kB
          Amichai Rothman
        2. fix_topologymanagerexport_2.diff
          4 kB
          Amichai Rothman
        3. fix_topologymanagerexport_3.diff
          7 kB
          Amichai Rothman

        Activity

          People

            amichai Amichai Rothman
            amichai Amichai Rothman
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment