Details

      Description

      This is an umbrella task to group all issues related to the synchronization process.
      A subtask will be created for each identified issue.

      I selected framework and POS as affected components because, for the moment, the synchronization process is mostly used to sync POS terminals. But it could be used for other needs.

      1. TRANSLOG.jpg
        322 kB
        Jacques Le Roux
      2. TRANSLOG_EXEC_LOG.jpg
        61 kB
        Jacques Le Roux

        Activity

        Jacques Le Roux added a comment - A recent related thread on OFBiz user ML
        Jacques Le Roux added a comment - Another interesting recent thread, looks like I'm not alone
        Scott Gray added a comment -

        Downgrading priority; I see no reason why this should be flagged as Critical.

        Jacques Le Roux added a comment -

        Yes you are right, Scott. I hope to find some time to work seriously on this in... 2010...

        Hans Bakker added a comment -

        We have a customer using entity sync and experiencing the problems in this issue.

        We are interested in working with other parties on this subject. If substantial work has already been done, we would be willing to compensate financially for it.

        Please let me know if there is any progress here.

        Regards,
        Hans

        Jacques Le Roux added a comment -

        Hi Hans,

        What kind of problems is your client experiencing?

        Jacques Le Roux added a comment -

        Hans,

        First, a few things worth stating up front:

        • it assumes a simple POS/MCS configuration (or client/server, which is effectively the same)
        • the jobs (push and pull) are scheduled and run on the client only

        There are actually several known issues:

        1. Syncing stays in "Running" status even though the related job crashed, for instance when an issue occurs (e.g., in order of likelihood: network failure, power outage on the POS, a server crash for any reason, etc.). This is a small issue, and I'm not even sure it's a design flaw; it may well have been designed this way on purpose (to keep things under control). Note that I neither designed nor wrote the EntitySync code, so this is only a hypothesis.
          • I still wonder whether we should really automate clearing of the running state (setting the status to "Not Started") when an issue occurs, as Pankaj Jain suggested.
            • There is already "Reset run status" in WebTools -> Entity Sync Status for doing it manually, so I don't see the point of rebooting the client for that, and I prefer to keep a hand on it.
            • Otherwise, if really needed, we could use the underlying resetEntitySyncStatusToNotStarted service. But we should take care not to fire several push or pull jobs at once, because by default the jobs are scheduled to run every 5 minutes and every hour respectively. I suspect the other issues Pankaj reported ("An instance is already running", "Connection refused") could be related to this... I'm also not sure, but it seems to be more of a problem on the client side than on the server, i.e., more a problem for the push job.
        2. Some entities are missed during pull syncing. According to Deyan Tsvetanov, this can also happen during an interruption. This would be much more annoying, and Deyan makes good propositions. But it seems Pankaj never ran into this issue, and neither have I so far: to be clarified.
        3. RMI is slow. I think everybody agrees on that, but it clearly needs more work.
        4. Pankaj ran into an issue with EECAs. IMO EECAs are delicate beasts and should be used carefully, but it might indeed be an issue. The idea is to avoid a duplicated creation on the other end when an EECA already creates a new entity on one end; so the EECA should exist on only one end.
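        Known issue 1 can be sketched in a few lines. This is a minimal illustration in plain Python (not OFBiz code, and the names are invented): the status is set to "Running" before the work starts, and only a clean finish resets it, so a crash leaves it stuck until someone resets it manually, in the spirit of "Reset run status" / resetEntitySyncStatusToNotStarted.

        ```python
        # Toy model of a sync job guarded by a run-status flag. A crash between
        # setting RUNNING and resetting it leaves the flag stuck, and the next
        # scheduled job then refuses to start ("An instance is already running").

        status = {"entitySync1": "NOT_STARTED"}

        def run_sync(name, work):
            if status[name] == "RUNNING":
                raise RuntimeError("An instance is already running")
            status[name] = "RUNNING"
            work()                        # a crash here never reaches the next line
            status[name] = "NOT_STARTED"  # only reached on success

        def reset_run_status(name):
            # manual recovery, analogous to resetEntitySyncStatusToNotStarted
            status[name] = "NOT_STARTED"

        def crashing_work():
            raise IOError("net crash (simulated)")

        try:
            run_sync("entitySync1", crashing_work)
        except IOError:
            pass
        print(status["entitySync1"])  # RUNNING: stuck, the job never finished
        reset_run_status("entitySync1")
        print(status["entitySync1"])  # NOT_STARTED
        ```

        Whether the reset should be automated or stay manual is exactly the open question above; the sketch only shows why a crash leaves the flag behind.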

        This is a first effort at clarifying the situation. I hope we can all cooperate on this and end up with a common, shared solution.

        Jacques Le Roux added a comment - - edited

        Here is a quote from a message Deyan sent us personally:

        There are some major issues that CAN be fixed in the current implementation: error recovery (interrupted connections between the two servers), sync speed (RMI can be removed), etc.

        There is however a major issue that CAN NOT be fixed in the current implementation: the list and sequence of entities to be synchronized is built from the entities' timestamps - date_created_tx and last_update_tx. It works as long as the clocks of all the syncing parties are in sync. You can easily achieve this by using NTP, for example - reliable enough. But if the clock of one of the parties gets un-synced for just a few minutes, and during those few minutes records get inserted or updated, then you are in trouble. Syncing the clock back won't help you, because you won't be able to sync the broken records due to foreign key constraint issues. I could give examples, but I guess you can think of some yourselves.
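        The clock-skew failure Deyan describes can be reproduced with a toy model (all names here are invented, not OFBiz code): rows carry a last_update_tx-style timestamp, and sync pulls everything newer than a cursor, so rows stamped while the clock was behind fall before the cursor and are never picked up, even after the clock is fixed.

        ```python
        # Timestamp-driven sync: pull every row newer than the last sync cursor.
        # A row written while the clock was slipped back is silently skipped forever.

        rows = []  # (last_update_tx, payload)

        def write(clock_now, payload):
            rows.append((clock_now, payload))

        def pull_since(cursor):
            picked = [p for (ts, p) in rows if ts > cursor]
            newest = max([ts for (ts, _) in rows], default=cursor)
            return picked, max(cursor, newest)

        write(100, "order A")
        synced, cursor = pull_since(0)   # picks up "order A"; cursor is now 100

        write(95, "order B")             # clock slipped back 5 "minutes"
        write(101, "order C")            # clock fixed again
        synced2, cursor = pull_since(cursor)

        print(synced2)  # ['order C'] -- "order B" falls before the cursor and is lost
        ```

        A transaction log avoids this entirely because replay order comes from a monotonic sequence ID, not from wall-clock timestamps.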

        So IMHO the best approach for synchronization is not the timestamp but the TRANSACTION LOG. This approach is used in all major databases - MS SQL, Oracle.

        For a customer I've implemented a transaction log based on triggers and stored procedures. I implemented the transaction log, triggers, and stored procedures only for PostgreSQL, as that was the customer's database; it's easy to implement an MS SQL or Oracle version though. It works perfectly, much, much faster than RMI, recovers if the sync process is interrupted, etc. My goal was to implement this mechanism using entity-engine-level triggers and eventually commit it, but unfortunately we're still pretty busy with other things, so we don't have resources that can be dedicated to that task at the moment; we're trying to weather the world financial crisis.

        So if you find what I say reasonable, you could go ahead with the triggers and SPs. For that you need a database that supports triggers - so MySQL won't work.

        That was just the first part.

        The second part is to identify all the tables that you need to synchronize. Some of them will only be pulled, some only pushed, and some synced in both directions. Next you need to test, reset the database, and test again and again until you identify the correct list of tables so that your sync process doesn't end up with FK insert/update errors. That is a pretty easy but time-consuming task - it takes a few days to complete.
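        The trial-and-error table-ordering step above can be shortened when the FK dependencies between the tables are known: a topological sort yields a replay order in which every referenced row exists before the row that points at it. A hedged sketch using Python's standard graphlib (the table names and FK graph here are hypothetical):

        ```python
        # Derive an FK-safe insert order: parents before children.
        from graphlib import TopologicalSorter  # Python 3.9+

        # child table -> set of tables it references via foreign keys
        fks = {
            "ORDER_ITEM": {"ORDER_HEADER", "PRODUCT"},
            "ORDER_HEADER": {"PARTY"},
            "PRODUCT": set(),
            "PARTY": set(),
        }

        order = list(TopologicalSorter(fks).static_order())
        print(order)  # referenced tables always precede the tables that reference them
        ```

        Bidirectional tables would simply appear in both the push and the pull list; the ordering constraint is the same in each direction.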

        So that's all I can say for now, without boring you with details.

        Jacques Le Roux added a comment -

        Here is a quote from a message Deyan sent us personally:

        So about the transaction log:

        I have a table called TRANSLOG with the following structure:

        CREATE TABLE translog
        (
          id bigserial NOT NULL,
          "table" character varying(30) NOT NULL,
          operation character varying(6) NOT NULL,
          "statement" character varying NOT NULL,
          created_on timestamp with time zone NOT NULL DEFAULT now(),
          CONSTRAINT id_fk PRIMARY KEY (id)
        )

        Records in the TRANSLOG table are inserted by the triggers on the tables we are "watching" for changes, on INSERT, UPDATE, and DELETE.
        The STATEMENT column contains an executable SQL statement generated by the trigger function. The following screenshot shows some data in the TRANSLOG table (see attachments).

        So every time some database action is performed on the tables we are "watching" we get a TRANSLOG entry.

        Each translog entry has a sequence ID. So when we want to PUSH the transaction log to the main server, we actually execute the statements ON the main server. We execute the statements wrapped in a transaction which gets rolled back if an error occurs.

        The translog push is invoked by a stored procedure which is called by a scheduled recurring job in OFBiz.
        For each sync SP execution we store a record in a table called TRANSLOG_EXEC_LOG. There we keep the last translog_id reached, so we know where to start the next time the sync SP is invoked.

        So in general that's the mechanism. Pretty simple to implement, pretty simple to troubleshoot, and pretty error-resistant. Here's a screenshot of TRANSLOG_EXEC_LOG (see attachments).

        The result message is either OK or the DB error message - either a constraint error or some other error. We always know which was the last successfully executed statement (last_command_id). If there is an error, then we know at which statement it happened: the one after the last successful statement.

        That's all in general.
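        To make the flow concrete, here is a condensed re-creation in Python with SQLite standing in for the two databases. Deyan's real implementation uses PostgreSQL triggers and stored procedures; only the table names below mirror his, everything else is illustrative (the reserved-word columns "table" and "statement" are renamed to avoid quoting).

        ```python
        # Sketch of the TRANSLOG push: replay logged statements on the main
        # server inside one transaction, checkpoint progress in TRANSLOG_EXEC_LOG.
        import sqlite3

        src = sqlite3.connect(":memory:")   # stands in for the POS database
        dst = sqlite3.connect(":memory:")   # stands in for the main server

        src.executescript("""
        CREATE TABLE translog (
          id INTEGER PRIMARY KEY AUTOINCREMENT,
          tbl TEXT NOT NULL, operation TEXT NOT NULL, stmt TEXT NOT NULL);
        CREATE TABLE translog_exec_log (
          id INTEGER PRIMARY KEY AUTOINCREMENT,
          last_command_id INTEGER, result TEXT);
        """)
        dst.execute("CREATE TABLE party (party_id TEXT PRIMARY KEY, name TEXT)")

        def log(tbl, op, stmt):
            # what the per-table triggers would do on INSERT/UPDATE/DELETE
            src.execute("INSERT INTO translog (tbl, operation, stmt) VALUES (?,?,?)",
                        (tbl, op, stmt))

        log("party", "INSERT", "INSERT INTO party VALUES ('P1', 'Ann')")
        log("party", "UPDATE", "UPDATE party SET name = 'Anna' WHERE party_id = 'P1'")

        def push():
            # resume after the last successfully replayed statement
            row = src.execute("SELECT MAX(last_command_id) FROM translog_exec_log").fetchone()
            start = row[0] or 0
            last, result = start, "OK"
            try:
                with dst:  # one transaction: rolled back wholesale on error
                    for cid, stmt in src.execute(
                            "SELECT id, stmt FROM translog WHERE id > ? ORDER BY id",
                            (start,)):
                        dst.execute(stmt)
                        last = cid
            except sqlite3.Error as e:
                last, result = start, str(e)  # batch rolled back: nothing advanced
            src.execute(
                "INSERT INTO translog_exec_log (last_command_id, result) VALUES (?,?)",
                (last, result))
            return last, result

        print(push())                                          # (2, 'OK')
        print(dst.execute("SELECT name FROM party").fetchone())  # ('Anna',)
        ```

        The key properties Deyan mentions fall out directly: replay order comes from the sequence ID rather than clocks, the rollback keeps the target consistent on failure, and last_command_id tells you exactly where to resume or where the failing statement is.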

        Jacques Le Roux added a comment -

        I now think that having a link to the thread is maybe easier (at least for me, though not a solution for the screenshots); here it is. There have been some interesting answers from David recently...

        Jacques Le Roux added a comment -

        It seems nobody wants to go further on this, so I am closing this issue after having updated the related wiki page from our discussions.

        Thanks


          People

          • Assignee: Jacques Le Roux
          • Reporter: Jacques Le Roux
          • Votes: 0
          • Watchers: 2