ManifoldCF / CONNECTORS-313

An example multi-process properties.xml delivered to the "dist" folder would be very helpful

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: ManifoldCF 0.4
    • Fix Version/s: ManifoldCF 0.4
    • Component/s: Build
    • Labels: None

      Description

      The multiprocess setup does not have an example properties.xml file. We should deliver one, in the right place so that all the scripts find it (the "dist" directory). It would also be helpful to deliver into this directory scripts for:

      • Registering all the connectors that were built
      • Starting the agents process

      Attachments

      1. CONNECTOR-313.patch (2 kB, Shinichiro Abe)
      2. CONNECTOR-313.patch (1 kB, Shinichiro Abe)
      3. register-draft.sh (2 kB, Shinichiro Abe)

        Activity

        Shinichiro Abe added a comment -

        This is a draft patch.

        Karl Wright added a comment -

        Could you clarify why you want to do this? I don't see how this adds anything to the project other than confusion.

        In general, the "example" area is currently limited to single-process deployment and exactly what you need for that. The multiprocess deployment is extensively documented and basically involves all the other directories under "dist". So, to me, this conflates the two models completely, and also requires extensive rework of the documentation.

        Shinichiro Abe added a comment -

        Because a user always needs to move those directories when using multi-process. Right now we have to read executecommand.sh to work out where the processes/ directory must be placed for multi-process to work. I want that to happen automatically.

        Karl Wright added a comment -

        The multiprocess directions in how-to-build-and-deploy do not involve moving any directories around, and I have not heard that those directions are incorrect. Can you give a detailed example of why you think the directions are wrong? Sorry, I still don't understand the reason.

        Karl Wright added a comment -

        In any case, this is actually a major change, and in my opinion even if we decide to do it we definitely do not want to do it for ManifoldCF 0.4-incubating. As I pointed out lots of documentation would need changing, and given we are trying to close down the release within the next week this seems completely unjustified.

        Karl Wright added a comment -

        So, responding more technically: MCF_HOME should point to the dist folder. This is where the properties.xml file must be for multiprocess execution. The properties.xml file under dist/example is not appropriate for multiprocess execution; the two differ in several ways. First, the synchdir does not need to be set in dist/example. Second, the Quick Start has additional properties that are not used in the multiprocess setup.

        All of the scripts for multiprocess execution should accept the same value for MCF_HOME; I worked hard to make that consistent everywhere. The FileNet and Documentum sidecar processes use the same MCF_HOME convention, as do the "script-engine" and main "processes" folders. Moving all of these under "example" seems to serve no purpose other than sharing the properties.xml file, and as I've already explained, that file cannot be shared anyway.

        Perhaps what you are really looking for is an example multiprocess properties.xml?

        Hope this clarifies a little.

        Shinichiro Abe added a comment -

        OK, I see. I understand that MCF_HOME has to point to the dist/ folder for multiprocess use. Do we need properties.xml to be placed in the dist/ folder for multiprocess to work out of the box? If not, I don't mind if this issue is closed as Won't Fix.

        Karl Wright added a comment -

        Do we need properties.xml to be placed in the dist/ folder for multiprocess to work out of the box?

        Yes. But it must be a multi-process properties.xml example. The one in dist/example is a single-process example.

        It sounds like we are in agreement. I can change this ticket to request delivery of a multi-process properties.xml in the dist/ folder. It may also be useful to deliver a connector registration script in the same place.

        Shinichiro Abe added a comment -

        In this patch, properties.xml is placed in the dist folder.

        Shinichiro Abe added a comment -

        About the register script: sorry, I don't know how to read connectors.xml from it. How can this script read XML?

        Karl Wright added a comment (edited) -

        Hi Abe-san,

        connectors.xml is read by the QuickStart java startup class only at this time. You have two choices:

        (1) Modify the main build.xml to dynamically build your register.sh script, in a manner similar to the way it dynamically builds connectors.xml;
        (2) Create a new command class in framework/pull-agent/src/main/java/org/apache/manifoldcf/crawler, which parses a connectors.xml and performs the corresponding modification (and nothing else), and then call that command class from your script. You may be able to move the code for this from framework/jettyrunner, and change the jettyrunner code so it is not duplicated.

        I'm not sure which is better. It depends on how people typically deploy the multiprocess version.
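Option (2) might look roughly like the following minimal Java sketch. It is not the eventual ManifoldCF command class. It assumes connectors.xml has a connectors root with outputconnector, authorityconnector and repositoryconnector elements, each carrying name and class attributes. The names ConnectorDecl and ConnectorsXmlReader are hypothetical, as are the example class names in main. The sketch only parses the file; registering each entry with the real connector managers is left as a comment.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical value type: one connector declaration read from connectors.xml.
record ConnectorDecl(String kind, String name, String className) {}

class ConnectorsXmlReader {
  // ASSUMED element names for connectors.xml; check them against the file the build generates.
  static final List<String> KINDS =
      List.of("outputconnector", "authorityconnector", "repositoryconnector");

  // Returns declarations in document order, skipping text, comments and unknown elements.
  static List<ConnectorDecl> read(InputStream in) {
    try {
      Element root = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().parse(in).getDocumentElement();
      List<ConnectorDecl> result = new ArrayList<>();
      NodeList children = root.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        Node node = children.item(i);
        if (node.getNodeType() != Node.ELEMENT_NODE) {
          continue;
        }
        Element e = (Element) node;
        if (KINDS.contains(e.getTagName())) {
          result.add(new ConnectorDecl(e.getTagName(), e.getAttribute("name"), e.getAttribute("class")));
        }
      }
      return result;
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("could not parse connectors.xml", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<connectors>"
        + "<outputconnector name=\"Null\" class=\"example.NullOutputConnector\"/>"
        + "<repositoryconnector name=\"File\" class=\"example.FileConnector\"/>"
        + "</connectors>";
    InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    for (ConnectorDecl d : read(in)) {
      // A real command would hand each entry to the matching connector manager here.
      System.out.println(d.kind() + " " + d.name() + " -> " + d.className());
    }
  }
}
```

Option (1) would produce the same list at build time instead, by having build.xml write it straight into register.sh.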

        Karl Wright added a comment -

        Since this work is likely to need some refinement over time, I've created a branch to work on it in. Svn url:

        https://svn.apache.org/repos/asf/incubator/lcf/branches/CONNECTORS-313

        Please go ahead and commit all of your changes to it, and we will get it right there before we finally commit to trunk.

        Thanks!

        Shinichiro Abe added a comment -

        branches/CONNECTORS-313: Committed revision 1214014.

        Shinichiro Abe added a comment -

        I created a new command class, run as ./executecommand.sh org.apache.manifoldcf.crawler.RegisterAll.
        I tested it with both the single-process and multiprocess setups. Please review it. Thank you.

        Karl Wright added a comment -

        Hi Abe-san,

        This looks good. With your permission, I'd like to explore further rearrangement of the dist tree to more clearly separate single process from multi process. Is this OK with you?

        Shinichiro Abe added a comment -

        Hi Mr. Karl, I'm OK with your further changes. I will test the rearrangement.

        Karl Wright added a comment -

        r1214041. The rearrangement I just did works as follows:

        (1) There's a separately checked-in properties.xml for each of the models
        (2) The organization of the "dist" area has changed. "dist/example" is the single-process area, "dist/multi" is the multiprocess area, and in "dist" itself there is a common connectors.xml and connector-lib area.

        What we're still missing is the necessary documentation changes (in how-to-build-and-deploy.xml), and documentation for the new command classes.

        Shinichiro Abe added a comment -

        r1214051: there was an incorrect mkdir (dist/example/connector-lib in framework/build.xml), so I removed it.

        Karl Wright added a comment (edited) -

        Do you think we should also have a script delivered in dist/multi that does all of the setup, such as setting MCF_HOME and calling all the right commands to initialize the database? For MCF_HOME, it could check whether './properties.xml' existed, and if it did set MCF_HOME to the value of '.', for instance. It should also optionally accept the database superuser name and superuser password (just like the DBCreate command), and perform the following command steps:

        • DBCreate
        • Initialize
        • RegisterAgent
        • RegisterAll

        The script can be located in: framework/example-multiprocess. What do you think?

        By the way, I also noticed that one of the Register commands you checked in is misspelled: Regsiter
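The setup script proposed in this comment could follow the sequence sketched below. It is written in Java to stay in one language here, and it is a hypothetical outline, not a delivered script. MultiprocessSetupPlan is an invented name. Falling back to an MCF_HOME that is already set is an assumption beyond the proposal. The step names are the short command names used in the comment, not the fully qualified classes that executecommand.sh would actually be given.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class MultiprocessSetupPlan {
  // MCF_HOME rule from the proposal: if ./properties.xml exists, MCF_HOME is ".".
  // ASSUMPTION: otherwise fall back to an MCF_HOME value already set in the environment.
  static Optional<Path> resolveMcfHome(Path workingDir, String envMcfHome) {
    if (Files.isRegularFile(workingDir.resolve("properties.xml"))) {
      return Optional.of(workingDir);
    }
    if (envMcfHome != null && !envMcfHome.isEmpty()) {
      return Optional.of(Path.of(envMcfHome));
    }
    return Optional.empty();
  }

  // The four steps in order; superuser name and password are optional, as for DBCreate.
  static List<List<String>> steps(String superuserName, String superuserPassword) {
    List<String> dbCreate = new ArrayList<>(List.of("DBCreate"));
    if (superuserName != null) {
      dbCreate.add(superuserName);
      if (superuserPassword != null) {
        dbCreate.add(superuserPassword);
      }
    }
    return List.of(List.copyOf(dbCreate), List.of("Initialize"),
        List.of("RegisterAgent"), List.of("RegisterAll"));
  }

  public static void main(String[] args) {
    Path home = resolveMcfHome(Path.of("."), System.getenv("MCF_HOME"))
        .orElseThrow(() -> new IllegalStateException("no ./properties.xml and MCF_HOME unset"));
    String user = args.length > 0 ? args[0] : null;
    String password = args.length > 1 ? args[1] : null;
    System.out.println("MCF_HOME=" + home.toAbsolutePath());
    for (List<String> step : steps(user, password)) {
      // A real script would call executecommand.sh with the step's fully qualified class and
      // arguments; only the argument count is printed so the password is never echoed.
      System.out.println("step: " + step.get(0) + " (" + (step.size() - 1) + " argument(s))");
    }
  }
}
```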

        Karl Wright added a comment -

        I changed "multi" to "multiprocess-example" for better clarity. Makes the doc easier. r1214116.

        Shinichiro Abe added a comment -

        I agree with placing a script in dist/multi. But do we need four commands? I think only one command, RegisterAll, is needed: RegisterAll creates the database and registers the output, authority, and repository connectors. Also, a question: what does the Initialize command do? Sorry I misspelled Regsiter.java; I'll fix it later.

        Karl Wright added a comment -

        OK, I did not realize that. Maybe we should change the name of the command to InitializeAndRegister, then?

        Karl Wright added a comment -

        r1214242 adds support scripts for starting and stopping the agents process.

        Karl Wright added a comment -

        r1214244 adds example lock-clean scripts too.

        Karl Wright added a comment -

        The changes included in this ticket reduce the overall size of our binary distribution significantly. This is a good reason to consider including it in the 0.4-incubating release, since it will offset a lot of the growth that has taken place as a result of the alfresco connector being added.

        Karl Wright added a comment -

        I also have noticed that RegisterAll reads database superuser credentials from the properties file. This is OK for the single-process example because the database is usually embedded anyhow, but for the multiprocess version embedding the credentials seems more risky, so we will need to address this, I think.

        There are two possible solutions I can think of here. The first possibility is to store the password in an obfuscated form. We already have some obfuscation/deobfuscation code that can take care of this. The second possibility is to accept the superuser name and password as command-line arguments to the RegisterAll command (or InitializeAndRegister, if we change it).

        If we adopt the first approach, then I think we would definitely want to share the code with the single-process example. In order to maintain backwards compatibility, I'd introduce a new parameter: org.apache.manifoldcf.dbsuperuserpasswordobfuscated. Use the unobfuscated password only if the obfuscated one is not found. We'll also need a script to do the obfuscation, but I'm happy to write that.

        The second approach puts the onus on the user for maintaining the security of their passwords, but it only applies to the multiprocess example at the moment. But we could modify the single-process example to optionally take these credentials from the command line also.

        Thoughts?
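The first approach and its backwards-compatibility rule can be made concrete with a minimal Java sketch of the lookup order. The sketch combines both proposals: a command-line value wins, then the new obfuscated parameter, then the plain one. That combination is a design choice made for illustration, not what was decided. The plain-text parameter name org.apache.manifoldcf.dbsuperuserpassword is assumed. The string-reversal "deobfuscator" in main is a placeholder, not ManifoldCF's obfuscation code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.UnaryOperator;

class SuperuserPasswordLookup {
  // New parameter proposed in the discussion.
  static final String OBFUSCATED_KEY = "org.apache.manifoldcf.dbsuperuserpasswordobfuscated";
  // ASSUMED name of the existing plain-text parameter.
  static final String PLAIN_KEY = "org.apache.manifoldcf.dbsuperuserpassword";

  // Precedence: command-line value, then obfuscated property, then plain property.
  // The plain value is used only when the obfuscated one is absent (backwards compatibility).
  static Optional<String> resolve(String commandLineValue, Map<String, String> properties,
                                  UnaryOperator<String> deobfuscate) {
    if (commandLineValue != null) {
      return Optional.of(commandLineValue);
    }
    String obfuscated = properties.get(OBFUSCATED_KEY);
    if (obfuscated != null) {
      return Optional.of(deobfuscate.apply(obfuscated));
    }
    return Optional.ofNullable(properties.get(PLAIN_KEY));
  }

  public static void main(String[] args) {
    // Placeholder "deobfuscator" (string reversal) for illustration only; NOT ManifoldCF's scheme.
    UnaryOperator<String> reverse = s -> new StringBuilder(s).reverse().toString();
    Map<String, String> props = Map.of(OBFUSCATED_KEY, "terces", PLAIN_KEY, "ignored");
    System.out.println(resolve(null, props, reverse).orElse("<none>"));
  }
}
```

Passing the deobfuscation step in as a function keeps the lookup independent of whichever obfuscation code ends up being shared with the single-process example.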

        Karl Wright added a comment -

        I think we are almost done with this, so let's try to pull it into 0.4-incubating.

        Shinichiro Abe added a comment -

        Please wait. The RegisterAll command has not worked since r1214041: it creates the database but doesn't insert records into the connectors table. I don't know the cause yet. (I will rename it to InitializeAndRegister.)

        Karl Wright added a comment -

        I will wait until we have a working InitializeAndRegister command, and also a corresponding script, before I pull into trunk and the release branch.

        Karl Wright added a comment (edited) -

        I think I see the problem. The class extends TransactionalCrawlerInitializationCommand, which presumes that the database instance already exists and puts everything in a single transaction. I don't think you'll be able to extend that class if you are attempting database creation.

        I don't think it is necessary to do everything in one class anyway. The only new command functionality is the part that reads connectors.xml and registers the connectors. If you create a command that does just that, and a script that calls DBCreate, Initialize, RegisterAgent, and that, I think that would be fine.

        Shinichiro Abe added a comment -

        I found why the RegisterAll command doesn't work:
        the example-multiprocess properties.xml is missing the following:

         
          <!-- Specify the connectors to be loaded -->
          <property name="org.apache.manifoldcf.connectorsconfigurationfile" value="../connectors.xml"/>
        

        I'll commit soon.
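A missing property like this one can be caught before any command runs. The Java sketch below uses a hypothetical class, PropertiesXmlCheck. It reads every property element from a properties.xml and reports which required names are absent, assuming the name/value attribute layout shown in the snippet above. The synch directory property name, org.apache.manifoldcf.synchdirectory, is an assumption; it stands in for the "synchdir" setting that multiprocess execution needs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class PropertiesXmlCheck {
  static final String CONNECTORS_FILE = "org.apache.manifoldcf.connectorsconfigurationfile";
  // ASSUMED property name for the synch directory used by multiprocess setups.
  static final String SYNCH_DIR = "org.apache.manifoldcf.synchdirectory";

  // Collects every <property name="..." value="..."/> element, wherever it appears.
  static Map<String, String> read(InputStream in) {
    try {
      NodeList nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(in).getElementsByTagName("property");
      Map<String, String> props = new LinkedHashMap<>();
      for (int i = 0; i < nodes.getLength(); i++) {
        Element e = (Element) nodes.item(i);
        props.put(e.getAttribute("name"), e.getAttribute("value"));
      }
      return props;
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("could not parse properties.xml", e);
    }
  }

  // Returns the required names that are absent, in the order given.
  static List<String> missing(Map<String, String> props, List<String> required) {
    return required.stream().filter(name -> !props.containsKey(name)).toList();
  }

  public static void main(String[] args) {
    String xml = "<configuration>"
        + "<property name=\"" + SYNCH_DIR + "\" value=\"./syncharea\"/>"
        + "</configuration>";
    Map<String, String> props = read(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    for (String name : missing(props, List.of(CONNECTORS_FILE, SYNCH_DIR))) {
      System.out.println("missing: " + name);
    }
  }
}
```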

        Shinichiro Abe added a comment -

        r1214606 for example-multiprocess properties.xml

        Karl Wright added a comment -

        Next problem: embedded Derby does not work in a multiprocess environment, so I changed the multiprocess properties.xml to use PostgreSQL by default. We could instead use HSQLDB and supply a script to start the server, which would make this all work out of the box; I'll look into that soon.

        Karl Wright added a comment (edited) -

        Oh, noticed one other thing. There's no un-registration of connectors. This is necessary because otherwise it is impossible to remove a connector using just RegisterAll. If you look at what the single process example does, it first deregisters all connectors before registering the new ones, for this reason.

        You may think that this is never going to happen with RegisterAll, but actually RegisterAll will be called not just for fresh installs but also for upgrades.
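The deregister-then-register behavior described here can be sketched in Java. ConnectorRegistry and InMemoryConnectorRegistry are hypothetical stand-ins for ManifoldCF's connector bookkeeping, not its API. The point is only the ordering: it guarantees that a connector dropped from connectors.xml during an upgrade ends up unregistered.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for whatever tracks registered connectors (not a ManifoldCF API).
interface ConnectorRegistry {
  Set<String> registeredClasses();
  void register(String className);
  void deregister(String className);
}

class InMemoryConnectorRegistry implements ConnectorRegistry {
  private final Set<String> classes = new TreeSet<>();

  public Set<String> registeredClasses() { return Collections.unmodifiableSet(classes); }
  public void register(String className) { classes.add(className); }
  public void deregister(String className) { classes.remove(className); }
}

class RegisterAllSync {
  // Deregister everything, then register exactly what connectors.xml lists, so a
  // connector removed from connectors.xml between installs does not linger.
  static void apply(ConnectorRegistry registry, Collection<String> wanted) {
    // Copy first: the live view shrinks as we deregister, which would break iteration.
    for (String className : new ArrayList<>(registry.registeredClasses())) {
      registry.deregister(className);
    }
    for (String className : wanted) {
      registry.register(className);
    }
  }

  public static void main(String[] args) {
    ConnectorRegistry registry = new InMemoryConnectorRegistry();
    apply(registry, List.of("example.OldConnector", "example.FileConnector")); // fresh install
    apply(registry, List.of("example.FileConnector"));                         // upgrade drops one
    System.out.println(registry.registeredClasses());
  }
}
```

A diff-based variant, which deregisters only the connectors that disappeared, would avoid touching unchanged ones at the cost of more bookkeeping. The sketch follows the simpler approach the single-process example is described as using.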

        Shinichiro Abe added a comment -

        r1214610 replaces the misspelled RegsiterConnectors.java with RegisterConnectors.java.

        Karl Wright added a comment -

        I checked in a bunch of new scripts. I've also switched over to using HSQLDB in its multiprocess mode. In doing so I found another bug - logged as CONNECTORS-320.

        Karl Wright added a comment -

        r1214850 (trunk)
        r1214851 (release branch)


          People

          • Assignee: Shinichiro Abe
          • Reporter: Shinichiro Abe
          • Votes: 0
          • Watchers: 0
