Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: HADOOP-10388
    • Component/s: None
    • Labels:
      None

      Description

      We need a way to read Hadoop configuration XML files in the native HDFS and YARN clients. This will allow those clients to discover the locations of NameNodes and YARN daemons, along with other configuration settings.

          Activity

          Colin Patrick McCabe added a comment -

          libexpat might be a reasonable choice for a C XML parsing library to use. It's widely known and has an MIT-style license. Looks like libxml2 also has an MIT license. I'm not sure which choice is better.

          Steve Loughran added a comment -

          Parsing and handling all the various transitive resolution operations is probably a trouble spot. A good start would actually be a set of XML files & lists of things to resolve -> outcomes; something declarative that could be used to regression test the Java code as well as native, .net, .py, etc.

          And before that, someone should define the format and resolution behaviour more strictly. I am not currently volunteering, sorry.

          Colin Patrick McCabe added a comment -

          Yes, it would be nice to have a few sample XML files and a unit test that verified that we did the overrides in the same way as Hadoop. Then later maybe we could start using those same sample XML files to unit test the Java code, etc.

          The overrides are fussy, but not conceptually difficult. You just have to read the files in the correct order, I think. You also have to search CLASSPATH for the XML files.
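
The "read the files in the correct order" rule can be sketched in C as below. This is only an illustration under stated assumptions, not the code from the native client: `struct prop`, `struct merged_conf`, `apply_file`, `merged_get` and the fixed `MAX_PROPS` table are hypothetical names, and XML parsing and the CLASSPATH search are left out. It assumes that a later file overrides an earlier one unless the earlier value was marked final.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROPS 64   /* arbitrary table size for this sketch */

/* One <property> element as already parsed from a file (hypothetical layout). */
struct prop {
    const char *name;
    const char *value;
    bool is_final;     /* true if the file marked the property as final */
};

/* The merged view that is built up as files are applied in order. */
struct merged_conf {
    struct prop props[MAX_PROPS];
    size_t count;
};

struct prop *find_prop(struct merged_conf *conf, const char *name)
{
    for (size_t i = 0; i < conf->count; i++) {
        if (strcmp(conf->props[i].name, name) == 0)
            return &conf->props[i];
    }
    return NULL;
}

/* Apply one file's properties on top of everything loaded so far.
 * Returns 0 on success, -1 if the table is full. */
int apply_file(struct merged_conf *conf, const struct prop *file_props, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct prop *existing = find_prop(conf, file_props[i].name);
        if (existing != NULL) {
            if (existing->is_final)
                continue;                 /* an earlier final value wins */
            *existing = file_props[i];    /* otherwise the later file overrides */
        } else {
            if (conf->count == MAX_PROPS)
                return -1;
            conf->props[conf->count++] = file_props[i];
        }
    }
    return 0;
}

/* Look up the merged value, or NULL if the name was never set. */
const char *merged_get(struct merged_conf *conf, const char *name)
{
    struct prop *p = find_prop(conf, name);
    return p != NULL ? p->value : NULL;
}
```

A caller would invoke `apply_file` once per file, defaults first and site files afterwards, and then read values with `merged_get`.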

          Steve Loughran added a comment -

          There's the special case that a new config instance can register a new central resource, which triggers a reload of all existing configurations.

          Colin Patrick McCabe added a comment -

          "there's the special case that a new config instance can register a new central resource which triggers a reload of all existing configurations."

          I don't think we need to support that. The configuration stuff is just going to be used internally by the HDFS and YARN native clients, so if we can just get the behavior of those clients we'll be ok.

          Haohui Mai added a comment -

          Is it possible to generate some sort of intermediate file from the configuration using a Java utility program? The utility program can reuse the current code to handle things like classpath, final declarations, etc.

          The intermediate file would contain all the final configuration values. That way this JIRA becomes easy to solve.

          Colin Patrick McCabe added a comment -

          While that's clever, I don't think that would really be an option for most people. Most management software just creates XML files for Hadoop. It would be a lot of work to introduce an extra step where configurations needed to be compiled to something else.

          Reading the configuration looks daunting at first, but you have to remember that we really only need to support the features the client needs. In fact, I think things would be simpler if we just had a ConfigurationBuilder type thing that created immutable Configuration objects. Then a lot of the really complicated concurrency issues with the Configuration class in Java just go away in the C version.
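
The builder idea could look roughly like the following in C. This is a sketch under assumptions, not the implementation that was committed: `conf_builder`, `conf_builder_set`, `conf_builder_build`, `conf_get` and `conf_free` are hypothetical names. The point it illustrates is that all mutation happens on the builder, and the finished object is only ever handed out through a `const` pointer, so readers need no locking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct conf_entry {
    char *key;
    char *value;
};

/* Mutable while files are being loaded (hypothetical type). */
struct conf_builder {
    struct conf_entry *entries;
    size_t count;
    size_t capacity;
};

/* Read-only after construction; callers only ever see a const pointer. */
struct conf {
    struct conf_entry *entries;
    size_t count;
};

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Set key to value, replacing any earlier value. Returns 0, or -1 on OOM. */
int conf_builder_set(struct conf_builder *b, const char *key, const char *value)
{
    char *v = dup_str(value);
    if (v == NULL)
        return -1;
    for (size_t i = 0; i < b->count; i++) {
        if (strcmp(b->entries[i].key, key) == 0) {
            free(b->entries[i].value);
            b->entries[i].value = v;
            return 0;
        }
    }
    if (b->count == b->capacity) {
        size_t cap = b->capacity ? b->capacity * 2 : 8;
        struct conf_entry *grown = realloc(b->entries, cap * sizeof(*grown));
        if (grown == NULL) {
            free(v);
            return -1;
        }
        b->entries = grown;
        b->capacity = cap;
    }
    char *k = dup_str(key);
    if (k == NULL) {
        free(v);
        return -1;
    }
    b->entries[b->count].key = k;
    b->entries[b->count].value = v;
    b->count++;
    return 0;
}

/* Move the entries into a new conf and reset the builder to empty. */
const struct conf *conf_builder_build(struct conf_builder *b)
{
    struct conf *c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    c->entries = b->entries;
    c->count = b->count;
    b->entries = NULL;
    b->count = 0;
    b->capacity = 0;
    return c;
}

/* Look up a value, or NULL if the key is absent. */
const char *conf_get(const struct conf *c, const char *key)
{
    for (size_t i = 0; i < c->count; i++) {
        if (strcmp(c->entries[i].key, key) == 0)
            return c->entries[i].value;
    }
    return NULL;
}

void conf_free(const struct conf *c)
{
    if (c == NULL)
        return;
    for (size_t i = 0; i < c->count; i++) {
        free(c->entries[i].key);
        free(c->entries[i].value);
    }
    free(c->entries);
    free((void *)c);   /* the conf was allocated non-const in build */
}
```

Because `conf_builder_build` transfers ownership of the entries, nothing can change the values once the `struct conf` exists, which is what removes the concurrency concerns mentioned above.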

          Steve Loughran added a comment -

          If there's something you MUST address, it is references in the objects, written ${something}. They are used a lot and not something you can ignore.

          1. I don't think it's strictly defined yet
          2. from the code it is: attempt local, then the base resources
          3. and finally java system properties
          4. I don't know about transitive reference and if, when and how recursion & loops are detected.

          Equally interesting is "when does this take place?", the answer being "when someone calls getXXX()". You have to leave the values unresolved when you serialize a config & send it over the wire to the far end; only your bit of the config is forwarded, the far end resolves values using its base resources and system properties.
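
To make the "resolve on getXXX()" behaviour concrete, here is a small C sketch of lazy `${something}` expansion. It is an assumption-laden illustration rather than a statement of Hadoop's exact rules: `struct kv`, `raw_lookup`, `get_expanded`, the `MAX_SUBST` cap of 20 substitutions and the `VALUE_MAX` buffer size are all choices made for this example. Values are stored unexpanded, references are resolved only against the same table (no Java system properties), unknown references are left in place, and loops are caught by the substitution cap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_SUBST 20    /* assumed cap on substitutions, used to catch loops */
#define VALUE_MAX 512   /* assumed maximum length of an expanded value */

struct kv {
    const char *key;
    const char *raw;    /* stored unexpanded, e.g. "${base.dir}/data" */
};

/* Find the raw value for a key given as a pointer plus length. */
const char *raw_lookup(const struct kv *table, size_t n,
                       const char *key, size_t key_len)
{
    for (size_t i = 0; i < n; i++) {
        if (strlen(table[i].key) == key_len &&
            strncmp(table[i].key, key, key_len) == 0)
            return table[i].raw;
    }
    return NULL;
}

/* Expand ${name} references in the value of `key` into `out`.
 * Returns 0 on success. Returns -1 if the key is missing, the result does
 * not fit, or more than MAX_SUBST substitutions are needed (likely a loop). */
int get_expanded(const struct kv *table, size_t n, const char *key,
                 char *out, size_t out_len)
{
    const char *raw = raw_lookup(table, n, key, strlen(key));
    if (raw == NULL || strlen(raw) >= out_len)
        return -1;
    strcpy(out, raw);

    for (int pass = 0; ; pass++) {
        char *start = strstr(out, "${");
        if (start == NULL)
            return 0;                 /* nothing left to expand */
        char *end = strchr(start + 2, '}');
        if (end == NULL)
            return 0;                 /* unterminated: keep as literal text */
        const char *val = raw_lookup(table, n, start + 2,
                                     (size_t)(end - (start + 2)));
        if (val == NULL)
            return 0;                 /* unknown name: stop, leave it as is */
        if (pass == MAX_SUBST)
            return -1;                /* too many substitutions */

        /* Rebuild the string as: prefix + replacement + suffix. */
        char tmp[VALUE_MAX];
        int written = snprintf(tmp, sizeof(tmp), "%.*s%s%s",
                               (int)(start - out), out, val, end + 1);
        if (written < 0 || (size_t)written >= sizeof(tmp) ||
            (size_t)written >= out_len)
            return -1;
        memcpy(out, tmp, (size_t)written + 1);
    }
}
```

Since the table keeps the raw strings, a config written back out or forwarded elsewhere would still carry the unresolved `${...}` text, which matches the requirement described above.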

          Colin Patrick McCabe added a comment -

          Object references are a good consideration. I remember using that a lot in HA namenode configurations for HDFS.

          I don't think serializing a config and sending it over the wire is something we really need to support in the native client. The HDFS client doesn't do that, so we can implement that feature if and when we need it. Similarly, Java system properties are something I know we don't have to support in C.

          Steve Loughran added a comment -

          It gets used in job submissions, or at least did in MRv1.

          Colin Patrick McCabe added a comment -

          That's a good point. The YARN native client may need the ability to write out the config at some point. It's still not clear to me whether we need that for YARN (as opposed to MR1)-- perhaps a YARN expert could comment. In either case, I don't think we need to do it right away in this JIRA. We can have another JIRA for writing the config back out.

          Colin Patrick McCabe added a comment -

          Closing this for now, as we have config XML reading implemented for the HDFS native client. If we need more functionality (like serializing a config and sending it over the wire), we can do it in a separate JIRA. Thanks, all.


            People

            • Assignee:
              Colin Patrick McCabe
              Reporter:
              Colin Patrick McCabe
            • Votes:
              0
              Watchers:
              6
