Hadoop Common
  1. Hadoop Common
  2. HADOOP-4416

class names in Configuration are not resolved if the configuration value has a white space

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Not a Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      all

      Description

      If an entry in the configuration used for a class contains spaces or enters before or after the getClass and getClasses methods fail to resolve the class.

      For example:

          <property>
              <name>mapred.mapper.class</name>
              <value>
                com.foo.MyMapper
              </value>
          </property>
      
      1. HADOOP-4416.patch
        2 kB
        Abhijit Bagri
      2. HADOOP-4416-v2.patch
        5 kB
        Abhijit Bagri
      3. HADOOP-4416-v3.patch
        6 kB
        Abhijit Bagri
      4. HADOOP-4416-v4.patch
        6 kB
        Abhijit Bagri

        Issue Links

          Activity

          Hide
          Alejandro Abdelnur added a comment -

          The Configuration.getClassByName() method should do a name.trim()

          Show
          Alejandro Abdelnur added a comment - The Configuration.getClassByName() method should do a name.trim()
          Hide
          steve_l added a comment -

          This is an instance of HADOOP-4212; a .trim() does seem appropriate here, assuming that HADOOP-4212 is going to remain open or wontfix.

          Show
          steve_l added a comment - This is an instance of HADOOP-4212 ; a .trim() does seem appropriate here, assuming that HADOOP-4212 is going to remain open or wontfix.
          Hide
          Alejandro Abdelnur added a comment -

          This is not really an instance of HADOOP-4212. Personally I would agree that for configuration values you can safely trim them, but I can see people opposing to it.

          For class names it is different, you don't want the spaces.

          And it should be done in the getClassByName() method to properly handle the getClasses() method, this is important for values like:

          <property>
            <name>io.compression.codecs</name>
            <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
          </property>
          

          Where if you have several classes you may want to have an enter between commas for readability.

          Show
          Alejandro Abdelnur added a comment - This is not really an instance of HADOOP-4212 . Personally I would agree that for configuration values you can safely trim them, but I can see people opposing to it. For class names it is different, you don't want the spaces. And it should be done in the getClassByName() method to properly handle the getClasses() method, this is important for values like: <property> <name>io.compression.codecs</name> <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value> </property> Where if you have several classes you may want to have an enter between commas for readability.
          Hide
          Nigel Daley added a comment -

          If this isn't a regression, then this isn't for 0.19.0.

          Show
          Nigel Daley added a comment - If this isn't a regression, then this isn't for 0.19.0.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          +1 for trimming in getClassByName() but this is not for 0.19.

          Show
          Tsz Wo Nicholas Sze added a comment - +1 for trimming in getClassByName() but this is not for 0.19.
          Hide
          Enis Soztutar added a comment -

          Moving from 0.19 due to feature freeze.

          Show
          Enis Soztutar added a comment - Moving from 0.19 due to feature freeze.
          Hide
          Abhijit Bagri added a comment -

          Patch for this. Also added a test case for getClass() and getClasses()

          Show
          Abhijit Bagri added a comment - Patch for this. Also added a test case for getClass() and getClasses()
          Hide
          Sreekanth Ramakrishnan added a comment -

          Sorry, for commenting this late. I think this issue is not just related to class names. It is present with all values which are being read. I think we should deal with this generically instead of fixing part by part. I had filed a JIRA related to this issue, you can also take a look at the discussion which has happened at that place. HADOOP-4212

          Show
          Sreekanth Ramakrishnan added a comment - Sorry, for commenting this late. I think this issue is not just related to class names. It is present with all values which are being read. I think we should deal with this generically instead of fixing part by part. I had filed a JIRA related to this issue, you can also take a look at the discussion which has happened at that place. HADOOP-4212
          Hide
          Abhijit Bagri added a comment -

          If there is a consensus on HADOOP-4212, IMO that should be the way to do it. I am not sure the discussion there are going in that direction though.

          Show
          Abhijit Bagri added a comment - If there is a consensus on HADOOP-4212 , IMO that should be the way to do it. I am not sure the discussion there are going in that direction though.
          Hide
          Abhijit Bagri added a comment - - edited

          As per discussion on HADOOP-4212, I think a compatible changes list should apply for the following functions. Individual function trims will be needed

          getInt()
          getLong()
          getFloat()
          getBoolean()
          getRange()
          getClasses()
          getClass()

          I am not sure about getString() and getStringCollection(), Comments?

          Show
          Abhijit Bagri added a comment - - edited As per discussion on HADOOP-4212 , I think a compatible changes list should apply for the following functions. Individual function trims will be needed getInt() getLong() getFloat() getBoolean() getRange() getClasses() getClass() I am not sure about getString() and getStringCollection(), Comments?
          Hide
          Abhijit Bagri added a comment -

          Here is patch which takes care of whitespace for following functions:

          getInt
          getLong
          getFloat
          getBoolean - also made the comparison case sensitive here.
          getClass
          getClasses

          I realized that IntegerRange already does a trim for each value in the comma separated list. Modified a test case to account for that. Added test cases for others too.

          Comments?

          Show
          Abhijit Bagri added a comment - Here is patch which takes care of whitespace for following functions: getInt getLong getFloat getBoolean - also made the comparison case sensitive here. getClass getClasses I realized that IntegerRange already does a trim for each value in the comma separated list. Modified a test case to account for that. Added test cases for others too. Comments?
          Hide
          steve_l added a comment -

          Looking at the patch,

          1. You need tests of the default handling. The get operations are all calling .trim() before checking for the result being null; as a result they will probably NPE rather than return defaults

          e.g, what happens on conf.getLong("undefined-key",23);

          public long getLong(String name, long defaultValue) {

          • String valueString = get(name);
            + String valueString = get(name).trim();
            if (valueString == null)
            return defaultValue;

          2. I'd recommend the trim operation is made a (protected?) method in Configuration

          protected String trim(String source)

          { return source==null?source:source.trim(); }

          One place to keep the logic, one test for null handling might suffice

          3. the .toLowerCase() operation for getBoolean should be made locale neutral, by adding a locale parameter:
          toLowerCase(Locale.EN_US)

          4. Also, getBoolean shouldn't do any toLowerCase() operations on null values.

          5. The changes to getBoolean() change the semantics of the operation. Currently

          bool b=conf.getBoolean("something",false"); returns false if "something" is set to "TRUE" in the file. With the case change, it will evaluate to true. I don't know if this matters, but it isn't completely backwards compatible.

          Show
          steve_l added a comment - Looking at the patch, 1. You need tests of the default handling. The get operations are all calling .trim() before checking for the result being null; as a result they will probably NPE rather than return defaults e.g, what happens on conf.getLong("undefined-key",23); public long getLong(String name, long defaultValue) { String valueString = get(name); + String valueString = get(name).trim(); if (valueString == null) return defaultValue; 2. I'd recommend the trim operation is made a (protected?) method in Configuration protected String trim(String source) { return source==null?source:source.trim(); } One place to keep the logic, one test for null handling might suffice 3. the .toLowerCase() operation for getBoolean should be made locale neutral, by adding a locale parameter: toLowerCase(Locale.EN_US) 4. Also, getBoolean shouldn't do any toLowerCase() operations on null values. 5. The changes to getBoolean() change the semantics of the operation. Currently bool b=conf.getBoolean("something",false"); returns false if "something" is set to "TRUE" in the file. With the case change, it will evaluate to true. I don't know if this matters, but it isn't completely backwards compatible.
          Hide
          Abhijit Bagri added a comment -

          Thanks, Steve for reviewing the patch. I updated the patch with your comments.

          Making trim() protected makes sense.

          Strictly speaking, doing toLowerCase does break semantics. I would say that if you have a conf.getBoolean("something-TRUE", false), it is more likely a bug. I am leaving this out of the scope of this jira. I will file a separate issue on that.

          Added test cases for default handling.

          Show
          Abhijit Bagri added a comment - Thanks, Steve for reviewing the patch. I updated the patch with your comments. Making trim() protected makes sense. Strictly speaking, doing toLowerCase does break semantics. I would say that if you have a conf.getBoolean("something-TRUE", false), it is more likely a bug. I am leaving this out of the scope of this jira. I will file a separate issue on that. Added test cases for default handling.
          Hide
          steve_l added a comment -

          Patch looks good. Reviewing the trim() method, I think it could be made simpler by explicitly returning null for a null argument

          protected String trim(String source) { 
           return source==null ? null : source.trim(); 
          }
          
          Show
          steve_l added a comment - Patch looks good. Reviewing the trim() method, I think it could be made simpler by explicitly returning null for a null argument protected String trim( String source) { return source== null ? null : source.trim(); }
          Hide
          Alejandro Abdelnur added a comment -

          Looks good, adding to Steve's changes:

          private String trim(String source) { 
           return (source==null) ? null : source.trim(); 
          }
          

          Also, remove the

          {import java.util.Locale}

          for

          {Configuration}

          .

          Show
          Alejandro Abdelnur added a comment - Looks good, adding to Steve's changes: private String trim( String source) { return (source== null ) ? null : source.trim(); } Also, remove the {import java.util.Locale} for {Configuration} .
          Hide
          Abhijit Bagri added a comment -

          Thanks for the reviews. Incorporated all the suggestions.

          Show
          Abhijit Bagri added a comment - Thanks for the reviews. Incorporated all the suggestions.
          Hide
          Harsh J added a comment -

          We trim in these functions today. This was fixed in more recent jiras. This one is now not needed.

          Show
          Harsh J added a comment - We trim in these functions today. This was fixed in more recent jiras. This one is now not needed.

            People

            • Assignee:
              Abhijit Bagri
              Reporter:
              Alejandro Abdelnur
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development