REEF-751

Add a standard way to provide cluster-local configuration

    Details

      Description

      When running a REEF application on a specific cluster, we sometimes have to make cluster-specific configuration choices beyond what's available via, e.g., the YARN Configuration. One example is clusters that only allow us to open TCP ports in a specific range.

      Right now, we require the application to add such configuration to the Driver, e.g. via the DriverConfigurationProviders mechanism. This is awkward because it limits the portability of applications: in order for an app to run on such a cluster, one would have to change the app's code. This is undesirable.

      Instead, we should have a standard mechanism by which one can provide additional runtime-level configuration to be picked up by the REEF client. Let's use this JIRA to discuss potential solutions to this issue.

        Activity

        Markus Weimer added a comment:

        Thanks for your interest, Madhawa Vidanapathirana. There hasn't been progress on this issue beyond the discussion above.

        Madhawa Vidanapathirana added a comment:

        Hi,
        I am a fourth-year student in the Department of Computer Science and Engineering, University of Moratuwa, and I am familiar with both Java and C#.
        I would like to contribute to this project as part of GSoC 2017.

        Could you share more details on the progress made so far on the ideas discussed above?
        How can I get started?

        Markus Weimer added a comment:

        Here are some ideas on how to support this:

        We will need two environment variables, REEF_RUNTIME_CONFIGURATION_JAVA and REEF_RUNTIME_CONFIGURATION_NET. Each of those points to a list of configuration files we will merge into the runtime configuration in the client. I'll use the C# class names below, but the same applies to the Java side.
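The comment leaves open how a "list of configuration files" is encoded in a single environment variable. A minimal Java-side sketch, assuming the entries are separated by the platform path separator (`:` on Unix, `;` on Windows) the way `PATH` is, could resolve `REEF_RUNTIME_CONFIGURATION_JAVA` as follows. `RuntimeConfigurationFiles` is a hypothetical name, and the environment is passed in as a map so the logic can be tried without changing the real process environment.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper; the class name and the separator convention are assumptions. */
final class RuntimeConfigurationFiles {
    static final String JAVA_VARIABLE = "REEF_RUNTIME_CONFIGURATION_JAVA";

    private RuntimeConfigurationFiles() {}

    /**
     * Returns the configuration files listed in REEF_RUNTIME_CONFIGURATION_JAVA, in order.
     * An unset or blank variable yields an empty list; blank entries are skipped.
     * ASSUMPTION: entries are separated by File.pathSeparator, as in PATH.
     */
    static List<File> fromEnvironment(Map<String, String> env) {
        List<File> files = new ArrayList<>();
        String value = env.get(JAVA_VARIABLE);
        if (value == null) {
            return files;
        }
        // Pattern.quote so that the separator is matched literally, not as a regex.
        for (String entry : value.split(Pattern.quote(File.pathSeparator))) {
            String path = entry.trim();
            if (!path.isEmpty()) {
                files.add(new File(path));
            }
        }
        return files;
    }
}
```

In a real client this would be called as `RuntimeConfigurationFiles.fromEnvironment(System.getenv())`; the .NET side would do the same with `REEF_RUNTIME_CONFIGURATION_NET`.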

        We need to get those configurations merged into the configuration used to instantiate IREEFClient. This is tricky, as all our canonical examples right now have the application code instantiate the configuration and then use an Injector to instantiate the application's client code, which in turn depends on an instance of IREEFClient. This pattern uses Tang all the way up to the Main function, which is nice and should remain supported, but is not always desirable.

        Now, to get these new Configurations merged in, we need to intercept the creation of the IREEFClient. We could have a class with static methods for that purpose:

        // Signatures only; method bodies are omitted in this proposal.
        public class REEF
        {
          // Merges the given configuration with the ones the environment variables point to.
          // If runtimeConfiguration is null, we assume the environment variables point to the complete configuration for this cluster.
          public static IConfiguration GetRuntimeConfiguration(IConfiguration runtimeConfiguration = null);

          // Creates the injector using GetRuntimeConfiguration(runtimeConfiguration)
          public static IInjector NewRuntimeInjector(IConfiguration runtimeConfiguration = null);

          // Creates an IREEFClient instance using the injector created with NewRuntimeInjector
          public static IREEFClient NewREEFClient(IConfiguration runtimeConfiguration = null);
        }
        

        This class can be used as a drop-in replacement for the call to NewInjector in the current HelloREEF. It also allows clients that don't want to use Tang all the way up to simply call NewREEFClient and be done with it.
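The semantics in the C# sketch above (merge a given configuration with the cluster-local files; treat a null configuration as "the files are the complete configuration for this cluster") can be illustrated on the Java side without Tang. In this minimal sketch a configuration is simplified to a flat map from parameter name to value, the list of files (for instance, the entries of `REEF_RUNTIME_CONFIGURATION_JAVA`) is passed in, and parsing a file is delegated to a caller-supplied function, so no file format is assumed. `RuntimeConfigurations` and `getRuntimeConfiguration` are hypothetical names, and rejecting a parameter bound to two different values is an assumed policy, not something the proposal specifies.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical, Tang-free model of the proposed GetRuntimeConfiguration on the Java side. */
final class RuntimeConfigurations {
    private RuntimeConfigurations() {}

    /**
     * Merges runtimeConfiguration with the cluster-local files, in order.
     * If runtimeConfiguration is null, the files alone are taken to be the complete
     * runtime configuration for this cluster, so at least one file is required.
     * ASSUMPTION: a parameter bound to two different values is an error.
     */
    static Map<String, String> getRuntimeConfiguration(
            Map<String, String> runtimeConfiguration,
            List<String> clusterFiles,
            Function<String, Map<String, String>> loadFile) {
        if (runtimeConfiguration == null && clusterFiles.isEmpty()) {
            throw new IllegalStateException("No runtime configuration given and no cluster files listed");
        }
        Map<String, String> merged = new LinkedHashMap<>();
        if (runtimeConfiguration != null) {
            merged.putAll(runtimeConfiguration);
        }
        for (String file : clusterFiles) {
            for (Map.Entry<String, String> binding : loadFile.apply(file).entrySet()) {
                // Keep the first value seen; identical re-bindings are harmless.
                String previous = merged.putIfAbsent(binding.getKey(), binding.getValue());
                if (previous != null && !previous.equals(binding.getValue())) {
                    throw new IllegalStateException("Conflicting values for '" + binding.getKey()
                        + "' in " + file + ": '" + previous + "' vs '" + binding.getValue() + "'");
                }
            }
        }
        return merged;
    }
}
```

Failing on conflicts is the conservative choice: silently letting a cluster file override an application setting (or the reverse) would make the effective configuration depend on file order. Either policy could be swapped in at the `putIfAbsent` check.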

        Markus Weimer added a comment:

        Yes, exactly. For now, it would be a Configuration in JSON format. Hadoop uses XML. We could improve upon this by using a more humane format like YAML.

        Dhruv Mahajan added a comment:

        I see, OK. But as a user-facing front end, we should still not expect users to write a Tang-style configuration module, but rather some sort of key-value file, just like in Hadoop. What do you think?
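A key-value front end along those lines needs very little code on the Java side. This minimal sketch uses the standard `java.util.Properties` syntax (`key=value` lines, `#` comments) as a stand-in for "some sort of key-value file"; the class name `KeyValueConfiguration` and the key names mentioned below are hypothetical, and translating such keys into Tang bindings is left open, as it is in the discussion.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical loader for a Hadoop-style key-value file of cluster-local settings. */
final class KeyValueConfiguration {
    private KeyValueConfiguration() {}

    /** Parses java.util.Properties syntax into a map sorted by key. */
    static Map<String, String> load(Reader reader) throws IOException {
        Properties properties = new Properties();
        properties.load(reader);
        Map<String, String> result = new TreeMap<>();
        for (String key : properties.stringPropertyNames()) {
            result.put(key, properties.getProperty(key));
        }
        return result;
    }
}
```

A cluster operator could then ship a file containing lines such as `reef.tcp.port.range.begin=30000` (hypothetical key) and the client would read it with `KeyValueConfiguration.load(new java.io.FileReader(path))`.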

        Markus Weimer added a comment:

        "So the assumption is that all these configs would be available automatically at all evaluators including driver, right?"

        I don't think we can make that happen, as it would require the config to be on all the nodes of the cluster. I was thinking of only using this to point to configurations to be merged into the runtime configuration used by the client.

        Dhruv Mahajan added a comment:

        So the assumption is that all these configs would be available automatically at all evaluators, including the Driver, right? And by environment variable, you mean it points to a file where the user can specify fields like port offset, port range, etc., just like in Hadoop MapReduce jobs, where we can specify things like speculative execution or the input format (N-line, etc.). To me the Hadoop way is good, and if that is what you mean, I think it's a good solution.

        Markus Weimer added a comment:

        We could have an environment variable REEF_RUNTIME_CONFIGURATION which points to such configurations.


          People

          • Assignee: Unassigned
          • Reporter: Markus Weimer
          • Votes: 1
          • Watchers: 2
