Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: SVN trunk
    • Fix Version/s: None
    • Component/s: ALL APPLICATIONS
    • Labels:
      None

      Description

      My current requirements:

      • store uploaded documents (pdf and scans), mainly for legal compliance reasons
      • old document versions should be accessible
      • documents should be associated with existing entities. So far I've identified a need to associate with Product, Party, OrderHeader, ShipmentItem, probably InventoryItemDetail and maybe WorkEffort. I would not be surprised if we discover more as this project proceeds.
      • documents may have a type and a purpose, though sometimes I'm not sure of the difference. For example, type: drivers_licence might be purpose: identification, and/or purpose: permission_to_drive, while type: shipping_label would be purpose: shipping_label
      • many documents have an expiry date (e.g. drivers licence)
      • a document may become invalid before its expiry date (e.g. because the law changed)
      • a specific version of a document may need to be associated with an entity. For example, a licence agreement document accessed via a Product should always be the latest version. However the version of that document actually shipped with the product should be associated with the ShipmentItem.
      • a single document might be associated with more than one entity type: see the example in the previous point
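
      The last two requirements can be modelled as an association that optionally pins a version. A Product link follows the newest version of the licence agreement, and a ShipmentItem link stays frozen on the version that was shipped. The Java sketch below is hypothetical: `DocumentLink`, `pinnedVersion` and the version-label list are illustrative names and are not existing OFBiz or Jackrabbit APIs. In practice the version history would come from the repository.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical link between an OFBiz entity value and a versioned JCR document. */
record DocumentLink(String entityName, String entityId, String documentPath, Optional<String> pinnedVersion) {

    /** Follows the newest version, e.g. Product -> licence agreement. */
    static DocumentLink latest(String entityName, String entityId, String documentPath) {
        return new DocumentLink(entityName, entityId, documentPath, Optional.empty());
    }

    /** Frozen to the version actually used, e.g. ShipmentItem -> licence shipped with it. */
    static DocumentLink pinned(String entityName, String entityId, String documentPath, String version) {
        return new DocumentLink(entityName, entityId, documentPath, Optional.of(version));
    }

    /** Picks the version this link refers to; versionHistory is ordered oldest to newest. */
    String resolveVersion(List<String> versionHistory) {
        if (versionHistory.isEmpty()) {
            throw new IllegalArgumentException("no versions for " + documentPath);
        }
        if (pinnedVersion.isEmpty()) {
            return versionHistory.get(versionHistory.size() - 1);
        }
        String version = pinnedVersion.get();
        if (!versionHistory.contains(version)) {
            throw new IllegalArgumentException("unknown version " + version + " for " + documentPath);
        }
        return version;
    }
}
```

      Suppose a new version "1.3" is added to the history. The Product link then resolves to "1.3", while a ShipmentItem link pinned to "1.1" keeps returning "1.1".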

      Not all documents require all of the above. For example, there are some documents where we don't need to track which version was used when, and some without expiry dates.

      I'm thinking of using the from/thruDate pattern to handle expiry-related needs. I'd like to put as much information into the JCR path as possible, so that less needs to go into entities, as per Sascha's suggestion on the dev ML. However, at least the from/thruDate and the record of which version of a document was actually used where will presumably need to be stored in an entity.
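
      Here is a minimal sketch of what putting information into the JCR path might look like. The layout `/documents/{entityName}/{entityId}/{documentType}/{fileName}` and the `DocumentPaths` helper are hypothetical. The escaping step is needed because JCR names may not contain `/`, `:`, `[`, `]`, `|` or `*`. Jackrabbit's `org.apache.jackrabbit.util.Text.escapeIllegalJcrChars` covers this in practice, and the hand-rolled version here is simplified. Note that a path keyed by a single entity cannot by itself express a document shared by several entities.

```java
import java.util.StringJoiner;

/** Hypothetical helper that encodes document metadata into a JCR node path. */
final class DocumentPaths {
    private static final String ROOT = "/documents";
    // Characters JCR 2.0 does not allow in a local name, plus '%' so the escaping stays unambiguous.
    private static final String ESCAPED = "%/:[]|*";

    private DocumentPaths() {}

    /** Percent-encodes problematic characters in one path segment (simplified, not a full JCR name check). */
    static String escapeSegment(String segment) {
        if (segment == null || segment.isEmpty()) {
            throw new IllegalArgumentException("empty path segment");
        }
        StringBuilder escaped = new StringBuilder();
        for (char c : segment.toCharArray()) {
            if (ESCAPED.indexOf(c) >= 0) {
                escaped.append('%').append(String.format("%02X", (int) c));
            } else {
                escaped.append(c);
            }
        }
        return escaped.toString();
    }

    /** Builds e.g. /documents/Party/10000/drivers_licence/scan.pdf */
    static String pathFor(String entityName, String entityId, String documentType, String fileName) {
        StringJoiner path = new StringJoiner("/", ROOT + "/", "");
        for (String segment : new String[] {entityName, entityId, documentType, fileName}) {
            path.add(escapeSegment(segment));
        }
        return path.toString();
    }
}
```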

        Activity

        Sascha Rodekamp added a comment - - edited

        Hi Anne

        • store uploaded documents (pdf and scans), mainly for legal compliance reasons
          Should already work
        • old document versions should be accessible
          Should also already work
        • documents should be associated with existing entities. So far I've identified a need to associate with Product, Party, OrderHeader, ShipmentItem, probably InventoryItemDetail and maybe WorkEffort. I would not be surprised if we discover more as this project proceeds.
          There has to be some kind of connection between entities and JCR content, but I'm not sure what to put in the DB and what to put in the node.
        • documents may have a type and a purpose, though sometimes I'm not sure of the difference. For example, type: drivers_licence might be purpose: identification, and/or purpose: permission_to_drive, while type: shipping_label would be purpose: shipping_label
          Should this be stored in a JCR content node, or do you think it has to be an entity field?
        • many documents have an expiry date (e.g. drivers licence)
        • a document may become invalid before its expiry date (e.g. because the law changed)
          Same question: is it better to create a DB field, or to store the expiry dates along with the JCR nodes?
        • a specific version of a document may need to be associated with an entity. For example, a licence agreement document accessed via a Product should always be the latest version. However the version of that document actually shipped with the product should be associated with the ShipmentItem.
          Here I think we really need a DB connection, because the relation between a product and a specific version of a document cannot be encoded in the JCR node.
        • a single document might be associated with more than one entity type: see the example in the previous point

        Besides the node path, it is possible to store information directly in the JCR node as properties. Nodes with certain information can be selected using JCR-SQL2 queries, which are already implemented in Jackrabbit.
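
        To make the property-based selection concrete, here is a hypothetical Java helper that builds a JCR-SQL2 statement. The node type `nt:unstructured` and the property names `documentType` and `purpose` are assumptions for illustration. The resulting string would be passed to `session.getWorkspace().getQueryManager().createQuery(statement, Query.JCR_SQL2)`.

```java
/** Hypothetical builder for JCR-SQL2 statements over document nodes. */
final class DocumentQueries {
    private DocumentQueries() {}

    /** Quotes a JCR-SQL2 string literal; an embedded single quote is doubled. */
    static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Selects nodes under rootPath whose (assumed) documentType and purpose properties match. */
    static String byTypeAndPurpose(String rootPath, String documentType, String purpose) {
        if (rootPath.contains("]")) {
            throw new IllegalArgumentException("rootPath must not contain ']'");
        }
        return "SELECT * FROM [nt:unstructured] AS doc"
                + " WHERE ISDESCENDANTNODE(doc, [" + rootPath + "])"
                + " AND doc.[documentType] = " + literal(documentType)
                + " AND doc.[purpose] = " + literal(purpose);
    }
}
```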

        Another point that comes to mind is rights management. I'm not sure if this is an issue in your case, but we have to consider it.

        Anne Jessel added a comment -

        Hi Sascha

        I agree with you. There will need to be entities to track the connections, so the key thing is what goes in an entity and what in a node.

        I'm thinking we will need several new entities such as PartyContentJcr and ProductContentJcr (suggestions for names welcome). Should these link directly to JCR nodes, or should there be a ContentJcr entity which represents the JCR node? I think a ContentJcr, so we can store (at least) from/thruDate there. But I think that mainly because I am familiar with the OOTB support for from/thruDate queries. I do not know JCR well enough: would these queries be just as efficient with the Jackrabbit SQL2 queries?

        Rights management is not an issue for me. But I am sure it will need to be added sometime.

        Jacopo Cappellato added a comment -

        What about extending the Content entity to support JCR and then use the existing PartyContent/ProductContent etc... to specify the associations (as it is now)?

        Sascha Rodekamp added a comment -

        Yes Jacopo, we should definitely use the existing tables.

        I'm a little bit afraid that too much content information will end up in the database tables.
        But I see no way of handling content without a connection to the DB, so it's OK to use the content entities to manage it; and if we have a DB lookup anyway, we can use the from/thruDate filter.
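
        OFBiz already applies the from/thruDate pattern to dated entities through helpers such as `EntityUtil.filterByDate`. Roughly, a row counts as active at a given moment if it has started and has not yet ended. The plain-Java sketch below is not that implementation. It uses a hypothetical `ProductContentRow` record and treats fromDate as inclusive and thruDate as exclusive, with a null thruDate meaning open-ended.

```java
import java.time.Instant;
import java.util.List;

/** Hypothetical, trimmed-down stand-in for a ProductContent row. */
record ProductContentRow(String productId, String contentId, Instant fromDate, Instant thruDate) {

    /** fromDate inclusive, thruDate exclusive; a null thruDate means the link never ends. */
    boolean isActiveAt(Instant moment) {
        boolean started = fromDate == null || !fromDate.isAfter(moment);
        boolean notEnded = thruDate == null || thruDate.isAfter(moment);
        return started && notEnded;
    }

    /** Keeps only the rows that are active at the given moment. */
    static List<ProductContentRow> activeAt(List<ProductContentRow> rows, Instant moment) {
        return rows.stream().filter(row -> row.isActiveAt(moment)).toList();
    }
}
```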

        Anyway, another point comes to mind. Would you disable the current product content storage? Or should we use the current entity-based storage and the repository storage in parallel? I'm thinking of telling the *ContentWorker which storage it has to use (entity or repository).
        That means if someone wants to use the DB, they can configure a ProductEntityContentWorker; if they want to use the repository instead, they can configure the ProductRepositoryContentWorker. The content worker encapsulates all access to the content store.

        Pierre Smits added a comment -

        Perhaps it is a good idea to include end-of-life for current content storage somewhere in the roadmap, if the jcr approach proves to be more flexible and easier to use. This will help with communications and planning of phasing-out activities.

        Regards,

        Pierre Smits

        Jacopo Cappellato added a comment -

        Sascha,

        I was thinking that, if for example we add a new contentTypeId of "JCR_CONTENT" (or similar), then all Content records with that type will use the external JCR repository, while the "old" Content records will still use the OFBiz DB; in this way the two mechanisms will fit together nicely.
        This is the general idea... As regards the specific *ContentWorker implementation, I don't know... some of the classes are rather old and not perfect, but if it makes sense we could extend them to check the contentTypeId and use JCR if the type is JCR_CONTENT.

        Sascha Rodekamp added a comment - - edited

        Yep, we need a separate content type: I would suggest JCR_CONTENT_* to differentiate between images, text, HTML and so on.

        *ContentWorker ... some of the classes are rather old and not perfect

        Right, but I don't like the idea of extending the old code, because I think we can do much better.
        Let's leave the old class as it is and use a factory which decides which implementation should be used for the specific content (maybe depending on the content type). That gives us the ability to:
        1.) create new, clean code (and test-drive it)
        2.) make the DB and repository code independent (at some point in the future we can simply remove one implementation)
        3.) not worry about breaking anything in the existing code

        *ContentWorkerFactory ---> can load ---> *EntityContentWorker (implements ContentWorker)
                              ---> can load ---> *RepositoryContentWorker (implements ContentWorker)
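
        The factory diagram can be sketched in Java roughly as follows. Everything here is hypothetical. The single `describe` method stands in for whatever the real *ContentWorker API would carry. The method bodies are placeholders for the existing DB lookup and a JCR node lookup. The selection rule uses Sascha's proposed JCR_CONTENT_* prefix, and `JCR_CONTENT_IMAGE` is only an example value.

```java
/** Hypothetical common interface; a real one would carry the *ContentWorker API. */
interface ContentWorker {
    String describe(String contentId);
}

/** Existing storage: content rows and data in the OFBiz entities. */
final class EntityContentWorker implements ContentWorker {
    @Override
    public String describe(String contentId) {
        return "entity:" + contentId; // placeholder for the current DB lookup
    }
}

/** New storage: content nodes in the Jackrabbit repository. */
final class RepositoryContentWorker implements ContentWorker {
    @Override
    public String describe(String contentId) {
        return "jcr:" + contentId; // placeholder for a JCR node lookup
    }
}

/** Chooses the implementation per content record, here by contentTypeId prefix. */
final class ContentWorkerFactory {
    private static final String JCR_PREFIX = "JCR_CONTENT_";
    private final ContentWorker entityWorker = new EntityContentWorker();
    private final ContentWorker repositoryWorker = new RepositoryContentWorker();

    ContentWorker workerFor(String contentTypeId) {
        return contentTypeId != null && contentTypeId.startsWith(JCR_PREFIX) ? repositoryWorker : entityWorker;
    }
}
```

        Callers depend only on `ContentWorker`, so the selection rule can change without touching either implementation. For example, it could switch to a storage field on Content instead of the type prefix, and only `workerFor` would change.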
        Jacopo Cappellato added a comment -

        Yes, that is fine with me. It may require changing more code initially (all the code that currently uses *ContentWorker or *ContentWrappers methods), but it may be a nice refactoring.
        In the end, the important thing IMO is to continue using the Content and PartyContent/ProductContent... entities following a similar pattern: the actual implementation of util methods/wrappers can take several similar shapes.

        Anne Jessel added a comment -

        Thanks everyone for the comments. I like the direction this is heading.

        I think the treatment of contentTypeId needs more work. Currently this has values such as ANNOTATION, DECORATOR, TOPIC. Adding JCR_CONTENT_ANNOTATION and similar would make it difficult to find all the ANNOTATION content. Perhaps we should add an extra field to Content, called storageTypeId? It could have two possible values (at this stage), ENTITY or JCR, with the default being ENTITY for backwards compatibility.
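
        The storageTypeId proposal keeps contentTypeId orthogonal to storage. Below is a hypothetical sketch: `StorageType` and `ContentRow` are illustrative names. Existing rows without a storageTypeId default to ENTITY, and a filter on contentTypeId still finds all ANNOTATION content regardless of where it is stored.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical values for the proposed Content.storageTypeId field. */
enum StorageType {
    ENTITY, JCR;

    /** Existing rows have no storageTypeId, so null or blank maps to ENTITY. */
    static StorageType fromFieldValue(String storageTypeId) {
        if (storageTypeId == null || storageTypeId.isBlank()) {
            return ENTITY;
        }
        return valueOf(storageTypeId.trim().toUpperCase(Locale.ROOT));
    }
}

/** Hypothetical slice of a Content row: contentTypeId keeps its current meaning. */
record ContentRow(String contentId, String contentTypeId, String storageTypeId) {
    StorageType storage() {
        return StorageType.fromFieldValue(storageTypeId);
    }

    /** "All ANNOTATION content" stays a single filter, whatever the storage. */
    static List<ContentRow> ofType(List<ContentRow> rows, String contentTypeId) {
        return rows.stream().filter(r -> r.contentTypeId().equals(contentTypeId)).toList();
    }
}
```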

        Also, the Content entity doesn't currently have from/thruDate fields. I'll need to add those.

        Jacopo Cappellato added a comment -

        Another option (but please consider that I am not looking at the details and I am simply providing some ideas) can be that of keeping the Content entity (and its data) as it is currently and then we define a new dataResourceTypeId (for JCR):

        Content (ANNOTATION, DECORATOR, etc...) --> DataResource (OFBIZ_FILE...) --> "old" content
        Content (ANNOTATION, DECORATOR, etc...) --> DataResource (JCR) --> JCR ("new" content)
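
        In this variant, Content stays untouched and the decision moves to the DataResource type. Below is a hypothetical Java sketch of that dispatch. The "JCR" dataResourceTypeId is the proposed new value. Using the existing `objectInfo` field to hold the JCR node path is an assumption, and `DataResourceLocator` and `Location` are illustrative names.

```java
/** Hypothetical, trimmed-down DataResource row. */
record DataResourceRow(String dataResourceId, String dataResourceTypeId, String objectInfo) {}

/** Where the bytes behind a DataResource are expected to live. */
record Location(String store, String path) {}

final class DataResourceLocator {
    private DataResourceLocator() {}

    static Location locate(DataResourceRow dataResource) {
        return switch (dataResource.dataResourceTypeId()) {
            // Proposed new type: objectInfo is assumed to hold the JCR node path.
            case "JCR" -> new Location("jcr", dataResource.objectInfo());
            // Existing type: objectInfo holds a file path.
            case "OFBIZ_FILE" -> new Location("ofbiz-file", dataResource.objectInfo());
            default -> throw new UnsupportedOperationException(
                    "type not covered by this sketch: " + dataResource.dataResourceTypeId());
        };
    }
}
```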

        Sascha Rodekamp added a comment -

        Hi,
        that means we will have a structure like:

        ProductContent --> Content --> DataResource --> JCR Repository (which holds a tree of content nodes)

        That could tempt people to use ContentAssoc and all the fields in DataResource to store content information and arrange the content order. But I think that should be done in the repository, because otherwise we ignore the benefits of the Jackrabbit repository and use it as a simple datastore.

        My suggestion is to keep the DB side as simple and flat as possible and let the repository do the rest....

        BTW Anne, you can store the from/thruDate in the ProductContent entity, which should be sufficient.

        Jacques Le Roux added a comment - - edited

        I have not looked into the details at all yet (hopefully this weekend), but I tend to agree with Sascha. A good reminder is Adam's advocacy for a file system.

        It's a nightmare to update content (from staging/QA to production, or any other type of update) when it is in a DB... By content, think of templates and such...

        EDIT: === I originally used the word "repo" instead of "production"; it was a mistake, but it shows my concern well ===

        Jacques Le Roux added a comment -

        OK, I read the conversation, and I'm still in favor of Sascha's global view.

        My suggestion is to keep the DB side as simple and flat as possible and let the repository do the rest....

        So Anne's suggestion of adding a storageTypeId field to the Content entity seems the best solution to me so far.

        Anne, for the point:

        • a document may become invalid before its expiry date (e.g. because the law changed)
          You could use the Content.statusId field
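
        Combining an early-invalidation status with the expiry date gives a single validity check per document. Below is a hypothetical Java sketch. The status value `DOC_REVOKED` is invented for illustration and is not an existing OFBiz status. The sketch also assumes the document is still valid on its expiry day.

```java
import java.time.LocalDate;

/** Hypothetical validity data for one document (Content.statusId plus an expiry date). */
record DocumentValidity(String statusId, LocalDate expiryDate) {
    /** Hypothetical status used when a document is invalidated early (e.g. the law changed). */
    static final String REVOKED = "DOC_REVOKED";

    /** Not revoked and, if it has an expiry date, not past it (the expiry day itself still counts). */
    boolean isValidOn(LocalDate day) {
        if (REVOKED.equals(statusId)) {
            return false;
        }
        return expiryDate == null || !day.isAfter(expiryDate);
    }
}
```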

        Unrelated, but I wanted to say that from/thruDate fields are cool, but nowhere near what a versioning system can offer...


        Mostly notes for myself:
        By and large (not only for this issue), I'd like to think more about it, i.e. envision more scenarios, notably for handling content moves and updates. And also activation (which version of a Content to use at a given moment); I'm not sure yet whether it relates to the activity concept in Jackrabbit (in http://www.day.com/maven/jsr170/javadocs/jcr-2.0/javax/jcr/version/VersionManager.html; oops, for a second I forgot that Adobe had bought Day Software, but that's not a problem, it's still ASL2 anyway).

        OK, also regarding Adam's comment I mentioned: I read http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ and it's more evolved than I thought, with a lot of Persistence Manager types and described strategies.

        Jacopo Cappellato added a comment -

        Well, I am not sure if Anne's proposal is in the direction of a flatter DB structure; she mentions the need to have storageTypeId=JCR and several contentTypeId values (ANNOTATION, DECORATOR etc...) all maintained in the Content data model.
        If we instead use a flat contentTypeId=JCR and then delegate to JCR to manage the type of content etc., we will have mostly everything in JCR (if contentTypeId is JCR) and still have the ability to use traditional content... but of course this is too generic, because I haven't actually looked at the details and I am not even sure I know all the requirements and goals of this effort.

        Adrian Crum added a comment -

        Is there a design document somewhere? I mean for this integration. That document should list the design criteria, objectives, deliverables, etc.

        Jacques Le Roux added a comment -

        To be fair, I was not referring to Anne's specific problem, but to her generic solution (Content.storageTypeId) combined with Sascha's initial proposal of using a factory (he spoke about a *ContentWorkerFactory) to instantiate the respective "DataSource types".

        In other words, we would use a ContentFactory interface (or abstract class) to delegate the work to concrete classes such as DbContentFactory or JcrContentFactory, ...

        But indeed it may be simpler to take the shortcut of a contentTypeId="JCR". I just wonder whether that would be too confusing, because it's not really a content type but a family of content types. Hence the Content.storageTypeId, where we could also choose the Jackrabbit Persistence Manager:
        storageTypeId = OFBizDB (standard)
        storageTypeId = jackrabbitDerbyPersistenceManager
        storageTypeId = jackrabbitPostgreSQLPersistenceManager
        storageTypeId = jackrabbitMySqlPersistenceManager
        storageTypeId = jackrabbitBundleFsPersistenceManager
        storageTypeId = jackrabbitinMemPersistenceManager
        storageTypeId = jackrabbitSimpleDbPersistenceManager
        etc.

        These are just ideas thrown out at this stage; I feel it needs more thought, and maybe it mixes two aspects (type and persistence). Also, I'm far from understanding all the JCR aspects. I will begin by reviewing ExampleJackrabbitShowContentData today...

        BTW, maybe we/I should stop polluting Anne's topic and create another Jira issue, or rather continue to discuss this on the dev ML?

        My 2cts

        Jacques Le Roux added a comment - - edited

        Mmm, it seems I'm putting the cart before the horse. The PersistenceManager is internal to Jackrabbit and part of the Workspace; much more to learn...

        Anne Jessel added a comment -

        Thanks all for the excellent feedback.

        Like many of you, I also like to have little data in the entities, and most in Jackrabbit. I would prefer to ignore the existing DataResource for this.

        I don't like storing a document's expiry date as from/thruDate in ProductContent, because one document could be associated with multiple Product, Party, ShipmentItem etc entity values. The same from/thruDate would have to be copied to all of these. To me, the from/thru in something like a ProductContent entity states when the association between the content and the product is valid, not when the content itself is valid. The difference doesn't really matter if a specific content is always related to only one product.
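
        The distinction can be made explicit by keeping two separate validity checks. The document's own expiry is stored once on the document, and the link's from/thruDate only describes the association. Below is a hypothetical Java sketch, in which `Doc` and `Link` are illustrative names. A document is usable for an entity only if both checks pass, so expiry never has to be copied onto every Product, Party or ShipmentItem link.

```java
import java.time.LocalDate;

/** Hypothetical document: its expiry belongs to the document, wherever it is linked. */
record Doc(String contentId, LocalDate expiryDate) {
    boolean isValidOn(LocalDate day) {
        return expiryDate == null || !day.isAfter(expiryDate);
    }
}

/** Hypothetical *Content-style association: from/thruDate describe the link only. */
record Link(String entityName, String entityId, String contentId, LocalDate fromDate, LocalDate thruDate) {
    boolean isActiveOn(LocalDate day) {
        return (fromDate == null || !fromDate.isAfter(day))
                && (thruDate == null || thruDate.isAfter(day));
    }

    /** Usable only if the link is active and the linked document itself is still valid. */
    boolean isUsableOn(Doc doc, LocalDate day) {
        if (!contentId.equals(doc.contentId())) {
            throw new IllegalArgumentException("link does not point to " + doc.contentId());
        }
        return isActiveOn(day) && doc.isValidOn(day);
    }
}
```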

        If Jackrabbit can efficiently and easily support searches such as "all documents of a certain type that have not expired" then I'd prefer to put the expiry date in Jackrabbit. But if the OOTB entity system does a better job (as I suspect), then I'd rather expiry be in an entity. Anyone know which is more efficient?

        I think Jacopo misunderstood what I said about contentTypeId having values such as ANNOTATION. I don't wish to add those. That is what is there now. I was trying to say that I think the existing use of contentTypeId is not compatible with indicating whether content is stored in JCR or Entity, therefore I suggest we add a new field for that purpose, namely storageTypeId.

        Adrian asked whether there is a design document somewhere. No there isn't (except my scratches on paper). Do you think I should add a page on the wiki or something? I don't mind where we discuss this: I started on the ML, and was advised to move it to Jira, but maybe it has evolved such that Jira is no longer appropriate. It is time I did a summary of my understanding of the current consensus anyway, so let me know where you all would prefer me to put it.

        Sascha Rodekamp added a comment -

        Good morning, everybody

        First, sorry for my late response; believe it or not, I was offline the whole weekend.
        There is already a wiki page where I have tried to document my current development state. I will reorganize it so that we can store additional conceptual documents there. https://cwiki.apache.org/OFBIZ/jackrabbit-branch-development.html

        support searches such as "all documents of a certain type that have not expired"

        Yes, we can do this. An example from the JCR spec:

        A query can specify a constraint to filter the set of node-tuples by any
        combination of:

        • Value of a property, for example:
          • Nodes whose jcr:created property is after 2007-03-14T00:00:00.000Z
        • Existence of a property, for example:
          • Nodes with a jcr:language property

        What I don't know yet is whether DB queries or JCR queries perform better.
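        To make "all documents of a certain type that have not expired" concrete on the JCR side, here is a minimal sketch that only builds the JCR-SQL2 query string. It assumes documents are `nt:unstructured` nodes with hypothetical `documentType` and `thruDate` (DATE) properties; the branch defines neither. The string would be handed to `QueryManager.createQuery(query, Query.JCR_SQL2)` from the JCR 2.0 API. A missing thruDate is written as `NOT ... IS NOT NULL` because the JCR 2.0 grammar defines property existence only in the `IS NOT NULL` form. This says nothing about whether the JCR query beats the equivalent entity query.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/**
 * Sketch: builds a JCR-SQL2 query for documents of one type that have not expired.
 * The node type and the property names "documentType" and "thruDate" are assumptions;
 * nothing in the Jackrabbit branch defines them yet.
 */
public final class ExpiryQueryBuilder {

    // JCR DATE literals are ISO 8601 with milliseconds, e.g. 2007-03-14T00:00:00.000Z
    private static final DateTimeFormatter JCR_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").withZone(ZoneOffset.UTC);

    private ExpiryQueryBuilder() {
    }

    /** Quotes a JCR-SQL2 string literal; an embedded single quote is written twice. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /**
     * Matches nodes of the given documentType that either have no thruDate property
     * (they never expire) or whose thruDate lies after asOf.
     */
    public static String notExpiredOfType(String documentType, Instant asOf) {
        String asOfLiteral = "CAST(" + quote(JCR_DATE.format(asOf)) + " AS DATE)";
        return "SELECT * FROM [nt:unstructured] AS doc"
                + " WHERE doc.[documentType] = " + quote(documentType)
                + " AND ((NOT doc.[thruDate] IS NOT NULL) OR doc.[thruDate] > " + asOfLiteral + ")";
    }
}
```

        For example, `notExpiredOfType("DRIVERS_LICENCE", Instant.parse("2010-06-01T00:00:00Z"))` compares thruDate against `CAST('2010-06-01T00:00:00.000Z' AS DATE)`. Timing this against an entity query over the same data would answer the performance question.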

        I agree with Jacques and Anne: if we extend the contentTypeId, it is not obvious why we store the JCR indicator in this field. Alternatively, I would let the repository handle the different content types (we could use node properties or a dedicated content-object mapping class).
        Imagine you have a third-party CMS that should connect to the repository to manage your content: implementing a connector is harder if you have to combine the OFBiz DB and the repository, but straightforward if you only have the repository (assuming the CMS uses JCR internally anyway).

        If you would like to switch to the ML, feel free; initially I didn't expect a long discussion on this issue.

        Jacopo Cappellato added a comment -

        Thank you, Sascha.
        The architecture document is good, but what I would like to see is a gap analysis between the features of the current Content framework in OFBiz that are actually in use and the new JCR mechanism.

        I am not asking you to document all of this, but I think it is important that we all discuss and understand all pros and cons.

        Main questions to answer are:

        • what are the mandatory features of the current Content framework, and how will they be implemented/migrated in the new implementation? A POC showing how different OFBiz content-related setups would look in the new system would really help: for example, how to associate localized content with a Product by purpose

        As soon as we will have a clear understanding of how, in the new framework, we will implement the features that we consider mandatory, it will be much easier to focus the effort in completing this work.
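        As a starting point for such a POC, here is one hypothetical way to keep product, purpose and locale in the JCR path, following the earlier idea of putting as much information into the path as possible. The layout `/product/<productId>/<purpose>/<locale>` and the `default` fallback node are assumptions, not anything the branch implements; a `Set` of paths stands in for node-existence checks against the repository. Lookup falls back from `en_US` to `en` to `default`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/**
 * Sketch: resolves localized product content by purpose from a hypothetical path layout
 * /product/<productId>/<purpose>/<locale>. Not part of the Jackrabbit branch.
 */
public final class LocalizedContentResolver {

    // Stand-in for "does this node exist in the repository?"
    private final Set<String> existingPaths;

    public LocalizedContentResolver(Set<String> existingPaths) {
        this.existingPaths = existingPaths;
    }

    /** Candidate paths from most to least specific, e.g. .../en_US, .../en, .../default. */
    static List<String> candidates(String productId, String purpose, Locale locale) {
        String base = "/product/" + productId + "/" + purpose + "/";
        List<String> result = new ArrayList<>();
        if (!locale.getCountry().isEmpty()) {
            result.add(base + locale.getLanguage() + "_" + locale.getCountry());
        }
        result.add(base + locale.getLanguage());
        result.add(base + "default");
        return result;
    }

    /** Returns the first existing candidate path, or empty if the product has no such content. */
    public Optional<String> resolve(String productId, String purpose, Locale locale) {
        for (String path : candidates(productId, purpose, locale)) {
            if (existingPaths.contains(path)) {
                return Optional.of(path);
            }
        }
        return Optional.empty();
    }
}
```

        With nodes for `en`, `de_DE` and `default` under `/product/WG-1111/LONG_DESCRIPTION`, a US-English request resolves to the `en` node and a French request falls back to `default`. The gap analysis would compare this with how the current framework expresses the same thing through ProductContent records and alternate-locale Content.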

        Sascha Rodekamp added a comment -

        Hi, yes, that seems to be a good starting point. I will come up with a proposal in the next few days. If anyone has time to publish a document earlier, feel free.

        BTW, I restructured the wiki page, so we should place all documents and design decisions there.

        Anne Jessel added a comment -

        Thanks for restructuring the wiki page, Sascha. It will be easier to expand on now.

        I planned to add some draft design ideas today, but instead I have come up with lots of questions. I realised I don't understand well enough how you intended the existing JCR classes to be used. I am adding here some of my thoughts: hopefully that will help make it all clearer to me and maybe others.

        I can think of two general use cases (ignoring standard CRUD-type operations):

        • I already have information that specifies which content I want. For example, I have a PartyContent or ProductContent entity, and want the associated content.
        • I need to choose specific content based on certain criteria. For example, I want to display a list of available and current Data Sheets to the user. When the user chooses one, I will link it to a Product by creating a ProductContent.

        Initially I had in mind a general workflow as follows:

        1. use current entity support to find desired Content entity (or maybe just contentId)
        2. pass chosen Content entity (or contentId) to a ContentFactory class method
        3. ContentFactory returns an object of an Interface type, with specific implementation determined by (at least) storageTypeId.
        4. code that invoked ContentFactory uses methods of the Interface to access the actual content and its metadata, and does not need to know whether the content and metadata come from other Entities or from JCR

        If we do this, the design of the Interface returned by the ContentFactory will be very important.
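        The workflow above might be sketched as follows. This is a hypothetical illustration, not existing OFBiz code: a `Map` stands in for the Content GenericValue, `ContentAccess` is an invented name for the Interface, and the storageTypeId values `ENTITY` and `JCR` are assumptions. Records without a storageTypeId are treated as entity-stored so that existing Content data keeps working.

```java
import java.util.Map;

/**
 * Sketch of the proposed ContentFactory: callers pass a Content record (step 2) and get back
 * an implementation chosen by storageTypeId (step 3). All names and values are hypothetical.
 */
public final class ContentFactory {

    /** What callers program against (step 4); they never see where the data lives. */
    public interface ContentAccess {
        String getContentId();
        String getContentTypeId();
        /** Raw document bytes. */
        byte[] getData();
    }

    /** Metadata common to both storage types comes from the Content record itself. */
    abstract static class AbstractContentAccess implements ContentAccess {
        protected final Map<String, ?> content;
        AbstractContentAccess(Map<String, ?> content) { this.content = content; }
        @Override public String getContentId() { return (String) content.get("contentId"); }
        @Override public String getContentTypeId() { return (String) content.get("contentTypeId"); }
    }

    /** Data held in entities (DataResource and friends). */
    static final class EntityContentAccess extends AbstractContentAccess {
        EntityContentAccess(Map<String, ?> content) { super(content); }
        @Override public byte[] getData() {
            // Stub: a real implementation would follow dataResourceId to the stored data.
            return new byte[0];
        }
    }

    /** Data held in the JCR repository. */
    static final class JcrContentAccess extends AbstractContentAccess {
        JcrContentAccess(Map<String, ?> content) { super(content); }
        @Override public byte[] getData() {
            // Stub: a real implementation would read the binary property of the repository node.
            return new byte[0];
        }
    }

    private ContentFactory() {
    }

    /** Chooses the implementation from storageTypeId; a missing value means entity storage. */
    public static ContentAccess getContent(Map<String, ?> content) {
        Object storageTypeId = content.get("storageTypeId");
        if (storageTypeId == null || "ENTITY".equals(storageTypeId)) {
            return new EntityContentAccess(content);
        }
        if ("JCR".equals(storageTypeId)) {
            return new JcrContentAccess(content);
        }
        throw new IllegalArgumentException("Unsupported storageTypeId: " + storageTypeId);
    }
}
```

        Everything callers can do is whatever `ContentAccess` exposes, which is why the Interface design matters so much: anything missing from it forces callers back to checking storageTypeId themselves.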

        If the orm classes and Jackrabbit annotations are used, I'm not sure how best to make use of Content entity in a generic way. Maybe there needs to be a different orm.jackrabbit class, and corresponding api.*Helper, for each ContentType? And the ContentFactory uses the contentTypeId field to work out what class to instantiate (but only when storageTypeId is JCR).
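        If each ContentType gets its own mapping class, one way for the factory to pick the class from contentTypeId (only when storageTypeId is JCR) is a registry of constructors with a generic fallback. In this sketch `JcrDocument` and `JcrDataSheet` are hypothetical stand-ins for per-ContentType orm.jackrabbit classes, and `DATA_SHEET` is an assumed contentTypeId.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Sketch: maps contentTypeId to a JCR mapping class. Used only for content whose
 * storageTypeId is JCR. All class names and type ids are hypothetical.
 */
public final class JcrContentTypeRegistry {

    /** Generic mapping used when no type-specific class is registered. */
    public static class JcrDocument {
        private final String path;
        public JcrDocument(String path) { this.path = path; }
        public String getPath() { return path; }
    }

    /** Example of a type-specific mapping class with room for extra properties. */
    public static class JcrDataSheet extends JcrDocument {
        public JcrDataSheet(String path) { super(path); }
    }

    private final Map<String, Function<String, JcrDocument>> byContentType = new HashMap<>();

    /** Registers the constructor to use for a contentTypeId. */
    public void register(String contentTypeId, Function<String, JcrDocument> constructor) {
        byContentType.put(contentTypeId, constructor);
    }

    /** Builds the mapping object for a node path, falling back to the generic JcrDocument. */
    public JcrDocument create(String contentTypeId, String path) {
        Function<String, JcrDocument> constructor = byContentType.get(contentTypeId);
        return constructor != null ? constructor.apply(path) : new JcrDocument(path);
    }
}
```

        Usage would be `registry.register("DATA_SHEET", JcrContentTypeRegistry.JcrDataSheet::new)` at startup. Registering constructors rather than hard-coding a switch means new content types, which the original requirements expect, need no change to the factory.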

        If we store searchable metadata such as from/thruDate and contentType in JCR, then maybe we can't always do workflow step 1 the way I was thinking. Maybe we need also a ContentWorker that will do searches for us, and automatically knows how to search both Entities and JCR repository?
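        A search helper that covers both Entities and the JCR repository could ask each backend and merge the contentIds, so that workflow step 1 stays a single call. The sketch below uses the hypothetical name `HybridContentSearch` to avoid confusion with OFBiz's existing ContentWorker; the backends are plain interfaces, and the from/thruDate convention (fromDate inclusive, thruDate exclusive, null meaning open-ended) is an assumption that both backends would need to share.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch: merges search results from several content backends. All names are hypothetical. */
public final class HybridContentSearch {

    /** One backend, e.g. an entity-engine search or a JCR query. */
    public interface ContentSearcher {
        /** contentIds of the given type that are valid at asOf. */
        List<String> findValid(String contentTypeId, Instant asOf);
    }

    private final List<ContentSearcher> searchers;

    public HybridContentSearch(List<ContentSearcher> searchers) {
        this.searchers = new ArrayList<>(searchers);
    }

    /** Queries every backend; duplicates are dropped and first-seen order is kept. */
    public List<String> findValid(String contentTypeId, Instant asOf) {
        Set<String> merged = new LinkedHashSet<>();
        for (ContentSearcher searcher : searchers) {
            merged.addAll(searcher.findValid(contentTypeId, asOf));
        }
        return new ArrayList<>(merged);
    }

    /** Assumed shared validity rule: fromDate inclusive, thruDate exclusive, null = open-ended. */
    public static boolean isValidAt(Instant fromDate, Instant thruDate, Instant asOf) {
        boolean started = fromDate == null || !fromDate.isAfter(asOf);
        boolean notExpired = thruDate == null || thruDate.isAfter(asOf);
        return started && notExpired;
    }
}
```

        Deduplication matters if the same Content can match in both places, for instance when its entity record and its JCR node both carry the searched contentTypeId.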


          People

          • Assignee:
            Sascha Rodekamp
            Reporter:
            Anne Jessel
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
