JDO-651

Modify specification to address NoSQL datastores

    Details

      Description

      There is increasing interest in NoSQL datastores (Google BigTable, Apache HBase, VMware Redis, etc.) that not only lack SQL support but also do not necessarily provide traditional consistency or query features or guarantees, offering instead features such as eventual consistency and key-value storage mechanisms.

      This request is to modify the JDO specification (and the TCK and RI) so that it relaxes certain portions of the specification, perhaps in the form of profiles similar to the Java EE 6 profiles, to allow datastores that do not support queries in general, do not support the ACID requirements, or use key-value-based storage mechanisms to be JDO-compliant. Additions to the specification may also be needed to address NoSQL-type datastores directly, in a manner similar to its treatment of relational persistence mechanisms.

      Additionally, this request would serve to better support persistence on micro platforms where consistency, queries, etc. may not be supported.

          Activity

          Andy Jefferson added a comment -

          No clear definition of the work required, hence this cannot be considered for "next release" until there is one.

          Andy Jefferson added a comment -

          Clarification:
          When using optimistic txns, commit/flush could still throw JDOOptimisticVerificationException (since the implementation will not yet have applied the changes to the datastore, it can still check the datastore for version updates before persisting), and rollback could discard changes that have not yet been flushed (i.e. PDIRTY -> HOLLOW).
          When using datastore txns, the implementation is at liberty to push changes to the datastore at any time (just as it can for an RDBMS), so commit/flush/rollback will likely be no-ops (unless the implementation is queuing changes, of course). Once changes have been flushed, rollback will be a no-op in terms of the datastore, since there is no way of backing the changes out.
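
          The optimistic case described above can be sketched in plain Java without any JDO dependency. This is only an illustration under stated assumptions: VersionedStore, OptimisticTx and OptimisticVerificationFailure are hypothetical names invented for the sketch (the last one stands in for JDOOptimisticVerificationException), and the store is a toy in-memory map with no transactions and no way to undo a write.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for JDOOptimisticVerificationException.
class OptimisticVerificationFailure extends RuntimeException {
    OptimisticVerificationFailure(String message) { super(message); }
}

// Toy key-value store: every write bumps a per-key version and cannot be undone.
class VersionedStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    String get(String key) { return values.get(key); }
    long versionOf(String key) { return versions.getOrDefault(key, 0L); }

    void put(String key, String value) {
        values.put(key, value);
        versions.merge(key, 1L, Long::sum);
    }
}

// Optimistic "transaction": changes are queued in memory until commit.
class OptimisticTx {
    private final VersionedStore store;
    private final Map<String, String> pending = new HashMap<>();
    private final Map<String, Long> versionsSeen = new HashMap<>();

    OptimisticTx(VersionedStore store) { this.store = store; }

    void update(String key, String value) {
        // Remember the version that was current when the key was first touched.
        versionsSeen.putIfAbsent(key, store.versionOf(key));
        pending.put(key, value);
    }

    void commit() {
        // Nothing has been written yet, so versions can still be verified.
        // (The check and the writes are not atomic in this toy model.)
        for (Map.Entry<String, Long> seen : versionsSeen.entrySet()) {
            if (store.versionOf(seen.getKey()) != seen.getValue()) {
                String staleKey = seen.getKey();
                discard();
                throw new OptimisticVerificationFailure("stale version for " + staleKey);
            }
        }
        pending.forEach(store::put);
        discard();
    }

    void rollback() {
        // Unflushed changes are simply thrown away (PDIRTY -> HOLLOW).
        discard();
    }

    private void discard() {
        pending.clear();
        versionsSeen.clear();
    }
}
```

          In this model a rollback before commit costs nothing because the datastore was never touched, which is the reason optimistic txns remain meaningful on a store that cannot back changes out.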

          Michael Bouschen added a comment -

          Interesting idea!

          Questions:
          With tx.commit being a no-op, how do the changes get synchronized to the datastore? On each change of the Java instance? Does the user need to call flush? Or is flush a no-op, too?
          When supporting persistent-clean -> hollow on commit, how about doing persistent-dirty -> hollow on rollback?

          Andy Jefferson added a comment -

          A further update to the spec is required for datastores that don't support transactions (Cassandra, for example). Clearly an implementation can wire tx.begin/tx.commit/tx.rollback to be no-ops, so that a user who switches to a non-transactional datastore can keep using the same persistence code. The spec would need to consider lifecycle transitions for such a datastore: supporting PCLEAN->HOLLOW on commit makes sense, but it would also need to document that rollback behaves the same as commit.

          It would likely be good to add an optional feature to 11.6:
          javax.jdo.option.Transactions

          There may be other places that this impacts.
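
          To make the proposed lifecycle behaviour concrete, here is a minimal plain-Java model. It is a sketch, not spec text: LifecycleState and NonTxInstance are hypothetical names, only three of the JDO lifecycle states are modelled, and the option string is the one proposed in this comment rather than an option the specification is known to define.

```java
// Simplified subset of the JDO lifecycle states (hypothetical model).
enum LifecycleState { HOLLOW, PERSISTENT_CLEAN, PERSISTENT_DIRTY }

// Models one persistent instance on a datastore without transactions.
class NonTxInstance {
    // Option name proposed in the comment above; assumed, not an existing option.
    static final String TRANSACTIONS_OPTION = "javax.jdo.option.Transactions";

    private LifecycleState state = LifecycleState.HOLLOW;

    LifecycleState state() { return state; }

    // Reading a field loads the instance if it is hollow.
    void read() {
        if (state == LifecycleState.HOLLOW) {
            state = LifecycleState.PERSISTENT_CLEAN;
        }
    }

    // Writing a field marks the instance dirty; in this model the change
    // is assumed to reach the datastore immediately.
    void write() {
        state = LifecycleState.PERSISTENT_DIRTY;
    }

    // tx.commit is a no-op for the datastore but still clears the instance.
    void onCommit() {
        state = LifecycleState.HOLLOW;
    }

    // Nothing can be backed out, so rollback behaves exactly like commit.
    void onRollback() {
        onCommit();
    }
}
```

          The point of the model is that commit and rollback are indistinguishable to the datastore, so the only observable effect of either is the transition back to hollow.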

          Andy Jefferson added a comment -

          One further possible requirement. A document-based store (e.g. MongoDB) stores objects in JSON document format. With this (and with XML-based storage too, for that matter), you have two options for "embedding" a persistable field.

          class Foo
          {
              @Embedded Bar bar;
          }

          Option 1: store the related object as a document embedded in the owner's document (i.e. nested).
          Option 2: map all fields of the related object to "field" names in the owner's document (as we would do for an RDBMS).

          It would be nice to have a metadata way of distinguishing these two cases.
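
          The two layouts can be shown with plain Java maps standing in for JSON documents. This is a sketch under assumptions: the fields of Foo and Bar, the EmbeddingSketch helper, and the "bar_" prefix used for flattened field names are all invented for illustration and do not reflect any particular implementation's naming.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical embedded class with two simple fields.
class Bar {
    final String name;
    final int size;
    Bar(String name, int size) { this.name = name; this.size = size; }
}

// Hypothetical owner class holding the embedded object.
class Foo {
    final String id;
    final Bar bar;
    Foo(String id, Bar bar) { this.id = id; this.bar = bar; }
}

class EmbeddingSketch {
    // Option 1: the related object becomes a nested sub-document.
    static Map<String, Object> nested(Foo foo) {
        Map<String, Object> barDoc = new LinkedHashMap<>();
        barDoc.put("name", foo.bar.name);
        barDoc.put("size", foo.bar.size);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", foo.id);
        doc.put("bar", barDoc);
        return doc;
    }

    // Option 2: the related object's fields are flattened into the owner,
    // as columns would be for an RDBMS. The "bar_" prefix is an assumption.
    static Map<String, Object> flattened(Foo foo) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", foo.id);
        doc.put("bar_name", foo.bar.name);
        doc.put("bar_size", foo.bar.size);
        return doc;
    }
}
```

          Both methods take the same Foo, which is why the choice between them has to come from metadata rather than from the class definition.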

          Andy Jefferson added a comment -

          DataNucleus support on HBase hasn't encountered anything that cannot fit into the JDO spec.
          http://www.datanucleus.org/products/accessplatform_3_0/datastore_features.html

          Initial work supporting MongoDB likewise hasn't had such problems.

          The only thing I vaguely remember is that, when a DN user was implementing support for Cassandra, it would have been convenient to have a way of setting properties on the PM (as opposed to the PMF). This would add extra flexibility anyway, for example allowing vendor-specific behaviour to be turned on/off during a transaction.

          Andy Jefferson added a comment -

          Background reading relating to "problems" in GAE/J's handling of JDO
          http://datanucleus.blogspot.com/2010/01/gaej-and-jdojpa.html

          Craig L Russell added a comment -

          JDO was designed to support non-SQL datastores. In fact, the first RI was a distinctly non-SQL implementation, essentially a key-value store.

          What needs to be done for NoSQL in general is to go through the specification in detail and highlight those parts that cannot now be implemented reasonably by a key-value store, or a non-transactional datastore.

          NoSQL is probably too broad a description of the kinds of datastores that make sense for JDO to support. Part of the exercise should be to categorize the NoSQL datastores and focus on those categories that make sense for JDO.


            People

            • Assignee:
              Matthew T. Adams
              Reporter:
              Matthew T. Adams