Apache Gora / GORA-105

DataStoreFactory does not properly support multiple stores

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2
    • Component/s: schema, storage
    • Labels:
      None

      Description

      DataStoreFactory has a single, static properties field. This is completely unacceptable: when multiple stores are instantiated in the same JVM, the last store instance overwrites the "default.schema" property, so all previously created stores read a misconfigured default schema. Besides this, it may cause several other nasty bugs in the future. In my opinion this is a blocker, because the methods on DataStoreFactory suggest that it can handle multiple stores when, as a matter of fact, it doesn't.
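
To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is not Gora's actual code: `SharedPropertiesDemo`, `Store`, `createShared` and `createIsolated` are hypothetical names invented for illustration. Only the "default.schema" key is taken from this issue.

```java
import java.util.Properties;

/** Hypothetical stand-in for a factory with one static Properties object; not Gora's code. */
public class SharedPropertiesDemo {

    // A single Properties instance shared by every store created in this JVM.
    static final Properties SHARED = new Properties();

    /** Minimal hypothetical store that reads its schema from the properties it was given. */
    static class Store {
        private final Properties props;

        Store(Properties props) {
            this.props = props;
        }

        String defaultSchema() {
            return props.getProperty("default.schema");
        }
    }

    /** Problematic pattern: configuring a new store mutates state that older stores still read. */
    static Store createShared(String schema) {
        SHARED.setProperty("default.schema", schema);
        return new Store(SHARED);
    }

    /** Isolated pattern: every store gets its own Properties object. */
    static Store createIsolated(String schema) {
        Properties props = new Properties();
        props.setProperty("default.schema", schema);
        return new Store(props);
    }

    public static void main(String[] args) {
        Store first = createShared("webpage");
        createShared("host"); // overwrites "default.schema" for 'first' as well
        System.out.println(first.defaultSchema()); // prints "host"

        Store isolated = createIsolated("webpage");
        createIsolated("host"); // does not touch 'isolated'
        System.out.println(isolated.defaultSchema()); // prints "webpage"
    }
}
```

With the shared object, the first store silently picks up the schema of whichever store was configured last. With one Properties object per store, each store keeps the value it was created with.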

      I will attach and commit a patch that fixes this problem. It only modifies gora-core. All stores directly benefit from this bugfix because of DataStoreBase. The patch fixes the following property-related problems:

      - It introduces a static method createProps in DataStoreFactory. This is the equivalent of Configuration.create(). Anyone can create a new properties object, set whatever is needed on it, and pass it to the specific stores that should use it, instead of to ALL stores.
      - It fixes the javadoc of DataStoreBase#getSchemaName(String mappingSchemaName, Class<?> persistentClass). The previous description was simply wrong.
      - It SERIALIZES the properties field of DataStoreBase instead of grabbing the static DataStoreFactory.properties field. This has the additional benefit of making sure that the store can be used correctly with properties modified at runtime in a MapReduce context.
      - It removes the caching functionality of DataStoreFactory. Because of the dynamic configuration in the Properties and Configuration objects, it is very difficult to implement a correct key hash for the cache. At the moment it only uses the triple {datastoreClass, keyClass, valueClass} as a key hash. Multiple stores cannot be properly supported when the factory uses badly implemented hash keys. (For example, one might instantiate two SqlStores, both using the exact same {datastoreClass, keyClass, valueClass} triple but pointing to different databases. When the second datastore is about to be instantiated, the factory will wrongly return the first datastore from the cache.) We can always reintroduce caching functionality once we can implement a proper key.
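
The cache problem from the last item can be sketched as follows. This is an illustration under assumptions, not the DataStoreFactory implementation: `CacheKeyDemo`, `Store`, `getCached`, `create` and the "db.url" property are all hypothetical. The only detail taken from this issue is that the cache key consists of the {datastoreClass, keyClass, valueClass} triple and ignores the properties.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Hypothetical illustration of a store cache whose key ignores the store configuration. */
public class CacheKeyDemo {

    /** Minimal hypothetical store that remembers which database it points to. */
    static class Store {
        final String url;

        Store(String url) {
            this.url = url;
        }
    }

    // Cache keyed only by the {datastoreClass, keyClass, valueClass} triple.
    static final Map<List<Class<?>>, Store> CACHE = new HashMap<>();

    /** Cached lookup: the properties play no part in the key, so they are ignored on a hit. */
    static Store getCached(Class<?> storeClass, Class<?> keyClass,
                           Class<?> valueClass, Properties props) {
        List<Class<?>> key = Arrays.asList(storeClass, keyClass, valueClass);
        return CACHE.computeIfAbsent(key, k -> new Store(props.getProperty("db.url")));
    }

    /** Uncached creation: every call builds a store from the properties it was given. */
    static Store create(Properties props) {
        return new Store(props.getProperty("db.url"));
    }

    /** Helper that builds a Properties object holding the hypothetical "db.url" key. */
    static Properties propsFor(String url) {
        Properties props = new Properties();
        props.setProperty("db.url", url);
        return props;
    }

    public static void main(String[] args) {
        Properties dbA = propsFor("jdbc:example:a");
        Properties dbB = propsFor("jdbc:example:b");

        Store first = getCached(Store.class, String.class, Object.class, dbA);
        Store second = getCached(Store.class, String.class, Object.class, dbB);
        System.out.println(first == second); // prints "true": the store for database A is reused
        System.out.println(second.url);      // prints "jdbc:example:a"

        System.out.println(create(dbB).url); // prints "jdbc:example:b"
    }
}
```

A correct cache key would have to cover every property that influences how the store is built, which is exactly what is hard to do with dynamic Properties and Configuration objects. Creating a fresh store per request sidesteps the issue.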

      The patch passes all tests. Will commit when there are no objections.

      1. GORA-105-v2.patch
        24 kB
        Ferdy Galema
      2. GORA-105.patch
        17 kB
        Ferdy Galema

        Issue Links

          Activity

          Ferdy Galema created issue -
          Ferdy Galema made changes -
          Field Original Value New Value
          Attachment GORA-105.patch [ 12517567 ]
          Ferdy Galema made changes -
          Link This issue is depended upon by NUTCH-882 [ NUTCH-882 ]
          Ferdy Galema made changes -
          Attachment GORA-105-v2.patch [ 12517705 ]
          Ferdy Galema made changes -
          Status Open [ 1 ] Closed [ 6 ]
          Resolution Fixed [ 1 ]
          Gavin made changes -
          Workflow jira [ 12657108 ] no-reopen-closed, patch-avail [ 12735226 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              Ferdy Galema