Apache S4 / S4-59

Resource loading from the S4 node classpath

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6
    • Labels:
      None

      Description

      There should be a way to add custom files to the application's classpath. This is useful, for example, to configure the logging backend.

        Activity

        Matthieu Morel added a comment -

        Merged in dev branch, commit [051082e]

        I still need to update the docs, but the main differences from a user's point of view are:

        • ZooKeeper test instance data is no longer cleaned by default; the -clean option must be added explicitly:
          ./s4 zkServer -clean
        • we start S4 nodes with a minimal module, used for bootstrapping (see BaseModule)
          e.g.
          ./s4 node -c=clusterA -zk=host:port
        • platform and app config are specified through the deploy command:
          • e.g.
            ./s4 deploy -s4r=file://`pwd`/build/libs/toto.s4r -c=cluster1 -appName=myApp
          • with custom modules and parameters:
            ./s4 deploy -s4r=`pwd`/test-apps/twitter-counter/build/libs/twitter-counter.s4r -c=cluster1 -appName=twitter-counter -p=s4.checkpointing.filesystem.storageRootPath=/tmp/toto -emc=org.apache.s4.core.ft.FileSystemBackendCheckpointingModule
          • custom modules may also be fetched through a given URI, see options "-modulesURIs", "-mu"
        • for starting an adapter:
          • way 1: as an app with custom output stream:
            • ./s4 deploy -appName=twitter-adapter -c=cluster2 -b=`pwd`/test-apps/twitter-adapter/build.gradle -p=s4.adapter.output.stream=RawStatus
            • then simply starting a node in that cluster
              ./s4 node -c=cluster2 
          • way 2: for testing, without deployment, by specifying the appclass and automatically resolving the classpath of the project:
            • specify the appClass:
              ./s4 deploy -appClass=hello.HelloInputAdapter -p=s4.adapter.output.stream=names -c=cluster2 -appName=adapter
            • start as an adapter from the project root path
              ./s4 adapter -c=cluster2
        Daniel Gómez Ferro added a comment -

        Wonderful work Matthieu, thank you! Defining custom modules is now much simpler.

        +1

        Matthieu Morel added a comment - - edited

        Thanks for the new comments, Daniel. I took them into account and updated the tools to use the new configuration model. I will document this on the wiki, but the general idea is that, even for the adapter tool, we first upload the app and platform config to ZooKeeper, then we simply start nodes.

        Updated patch available in branch S4-59 commit [7548ba5] [1aa1e77] (with a fix for default -clean arg value for zkServer command)

        Daniel Gómez Ferro added a comment -

        I have some more comments:

        1. The ZkServer.clean flag has been changed to false by default, which is less convenient for testing purposes. We could change the flag to -dontClean or similar, but I don't have a strong preference.
        2. The field that stores the app name in ZK used to be called just "name". AppConfig.APP_NAME is "appName" instead. Several places still use the old field name directly and should be changed to use AppConfig.APP_NAME:
          • subprojects/s4-core/src/main/java/org/apache/s4/core/S4Bootstrap.java: String appName = appData.getSimpleField("name");
          • subprojects/s4-tools/src/main/java/org/apache/s4/tools/Status.java: return appRecord.getSimpleField("name");
          • subprojects/s4-tools/src/main/java/org/apache/s4/tools/Status.java: app.name = appRecord.getSimpleField("name");
        3. org.apache.s4.core.Main has been removed, but it is still referenced in a couple of places:
          • subprojects/s4-tools/src/main/resources/templates/s4: java -cp `cat classpath.txt` org.apache.s4.core.Main $@
          • test-apps/twitter-adapter/build.gradle:mainClassName = "org.apache.s4.core.Main"
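
To illustrate point 2, here is a minimal sketch of why reading the record with the old literal key fails once the writer uses the new key. It is not the S4 code: the class `AppRecordDemo` is hypothetical, a plain `Map` stands in for the ZooKeeper record's simple fields, and the local `APP_NAME` constant stands in for `AppConfig.APP_NAME` (its value "appName" is taken from the comment above).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; a Map stands in for the ZK record's simple fields.
class AppRecordDemo {
    // Stand-in for AppConfig.APP_NAME (value as quoted in the comment above).
    static final String APP_NAME = "appName";

    /** Writer side: stores the application name under the shared constant. */
    static Map<String, String> writeRecord(String appName) {
        Map<String, String> record = new HashMap<>();
        record.put(APP_NAME, appName);
        return record;
    }

    /** Reader using the old hard-coded key: finds nothing after the rename. */
    static String readWithLiteral(Map<String, String> record) {
        return record.get("name");
    }

    /** Reader using the shared constant: stays in sync with the writer. */
    static String readWithConstant(Map<String, String> record) {
        return record.get(APP_NAME);
    }

    public static void main(String[] args) {
        Map<String, String> record = writeRecord("myApp");
        System.out.println("literal key:  " + readWithLiteral(record));
        System.out.println("constant key: " + readWithConstant(record));
    }
}
```

Referencing one constant on both the writing and reading side means a later rename only has to happen in one place.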
        Matthieu Morel added a comment -

        Uploaded a revised patch in branch S4-59 commit [a941dfd]

        Matthieu Morel added a comment - - edited

        Thanks Daniel for the feedback.

        The first point is obviously something that should not be commented out, and I'll fix that.

        Interestingly, I can't reproduce the failure for the remote loader (note that you need to compile and install the S4 artifacts locally first). Nevertheless, I agree that it can be better designed and that the two tests should be properly isolated, so I'll fix that as well.

        Daniel Gómez Ferro added a comment - - edited
        
        @@ -69,7 +69,7 @@ public class S4RLoaderFactory {
                 File s4rDir = null;
                 if (tmpDir == null) {
                     s4rDir = Files.createTempDir();
        -            s4rDir.deleteOnExit();
        +            // s4rDir.deleteOnExit();
        

        Did you forget to uncomment this?

        TestModuleLoaderRemote is failing here. It tries to run testLocal() which is inherited from TestModuleLoader, and this test hangs. As it is, testLocal() runs twice.
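
The inheritance effect described above can be sketched without a test framework. The classes below are hypothetical stand-ins for TestModuleLoader and TestModuleLoaderRemote, and `discover` only mimics, through reflection, how a runner that collects public test methods would also see the inherited ones; it is not how the S4 tests are actually wired.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for TestModuleLoader.
class BaseLoaderTest {
    public void testLocal() {}
}

// Hypothetical stand-in for TestModuleLoaderRemote.
class RemoteLoaderTest extends BaseLoaderTest {
    public void testRemote() {}
}

class InheritedTests {
    /** Sorted names of the public "test*" methods visible on a class, inherited ones included. */
    static List<String> discover(Class<?> testClass) {
        List<String> names = new ArrayList<>();
        // getMethods() returns public methods declared on the class and on its superclasses.
        for (Method m : testClass.getMethods()) {
            if (m.getName().startsWith("test")) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(discover(BaseLoaderTest.class));   // [testLocal]
        System.out.println(discover(RemoteLoaderTest.class)); // [testLocal, testRemote]
    }
}
```

Running both classes therefore executes testLocal() twice. One possible way to isolate the tests, offered only as a suggestion, is to keep shared setup in a base class that declares no test methods.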

        Matthieu Morel added a comment - - edited

        Submitted a patch in branch S4-59, commit [d3b7c30] [2ecb531] (that commit includes missing files)

        The idea is to rethink the way we configure nodes and apps.
        Instead of independently configuring S4 nodes, and having to modify their classpath - not in a nice way - in order to add custom modules, I propose to:

        • start S4 nodes with only the bare minimum, i.e. the mechanism to pick partitions and register on the cluster
        • specify everything along with the application:
          • custom modules, and a way to fetch relevant code if necessary
          • application class, and a way to fetch relevant code and dependencies if necessary

        From a user point of view, starting an app on a cluster involves:

        • configuring the cluster in ZooKeeper (number of partitions)
        • starting S4 nodes
        • deploying the application, including custom platform config (modules).
          The nodes will automatically fetch everything necessary for running the app with the expected configuration.
        Matthieu Morel added a comment -

        Rescheduling to 0.6.

        Current workaround:

        • for the source package: place resource files in the subprojects/s4-tools/build/install/s4-tools/lib directory before starting "s4 node"
        Matthieu Morel added a comment -

        Thanks for the suggestion!
        I think you are referring to the node's classpath. And indeed we need such a facility, especially for logging. We should provide an overridable directory location.

        That will imply modifying the classpath specified in the generated s4-tools script.
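
As a rough illustration of what an extra directory on the classpath provides, the sketch below exposes a directory through a `URLClassLoader` and reads a file from it as a classpath resource. It only uses standard JDK calls (Java 11 or later); the class name `ExtraClasspathDir`, the temporary directory, and the file name `s4-demo-logging.properties` are made up for the example, and this is not how the S4 scripts assemble the node classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of loading a resource from a directory added to the classpath.
class ExtraClasspathDir {
    /** Reads a named resource from the given directory via a class loader; returns null if absent. */
    static String readResource(Path dir, String name) throws IOException {
        // For an existing directory, Path.toUri() yields a URL ending in '/',
        // which URLClassLoader treats as a directory of classes and resources.
        URL[] urls = { dir.toUri().toURL() };
        try (URLClassLoader loader = new URLClassLoader(urls, ExtraClasspathDir.class.getClassLoader());
             InputStream in = loader.getResourceAsStream(name)) {
            return in == null ? null : new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up directory and file, standing in for a user-supplied logging configuration.
        Path dir = Files.createTempDirectory("s4-extra-classpath");
        Files.writeString(dir.resolve("s4-demo-logging.properties"), "level=DEBUG");
        System.out.println(readResource(dir, "s4-demo-logging.properties"));
    }
}
```

A logging backend that looks up its configuration file as a classpath resource would find it the same way once the directory is part of the node's classpath.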


          People

          • Assignee: Matthieu Morel
          • Reporter: Daniel Gómez Ferro
          • Votes: 0
          • Watchers: 2
