Hadoop Common
  1. Hadoop Common
  2. HADOOP-8549

Allow other implementation's of java.util.Map in MapWritable

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: io
    • Labels:
      None

      Description

      Current implementation of MapWritable uses HashMap as Map implementation. But in some of the use cases we need other implementations of Map like LinkedHashMap,SortedMap.This jira changes visibility of 'instance' in MapWritable from private to protected which allows us to inject custom Map implementation through sub classing MapWritable .

      1. HADOOP-8549.patch
        2 kB
        madhukara phatak

        Activity

        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12534668/HADOOP-8549.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

        org.apache.hadoop.ha.TestZKFailoverController
        org.apache.hadoop.io.file.tfile.TestTFileByteArrays
        org.apache.hadoop.io.file.tfile.TestTFileJClassComparatorByteArrays

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1164//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1164//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12534668/HADOOP-8549.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverController org.apache.hadoop.io.file.tfile.TestTFileByteArrays org.apache.hadoop.io.file.tfile.TestTFileJClassComparatorByteArrays +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1164//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1164//console This message is automatically generated.
        Hide
        Harsh J added a comment -

        Patch lacks a few items:

        1. No javadocs or any other form of dev-docs seem to be present. How would others know how to use this feature? Lets document it.
        2. I'm -1 on relying on protected member variables directly, as thats ugly to do. Do something similar to ArrayWritable instead, where an inheritor passes the class via a constructor. Or at least provide getter/setter of some form.

        The idea is good, but how does it work in terms of portability? If it is written by client who used X impl. of map, and then read by another client who used Y or the default impl., is it guaranteed to work? If yes, lets document that guarantee. And if not, does it throw a proper error message?

        Thanks!

        Show
        Harsh J added a comment - Patch lacks a few items: No javadocs or any other form of dev-docs seem to be present. How would others know how to use this feature? Lets document it. I'm -1 on relying on protected member variables directly, as thats ugly to do. Do something similar to ArrayWritable instead, where an inheritor passes the class via a constructor. Or at least provide getter/setter of some form. The idea is good, but how does it work in terms of portability? If it is written by client who used X impl. of map, and then read by another client who used Y or the default impl., is it guaranteed to work? If yes, lets document that guarantee. And if not, does it throw a proper error message? Thanks!
        Hide
        madhukara phatak added a comment -

        Hi Harsha,
        Thanks for looking at patch. Please find my reply for the concerns raised by you.

        1. No javadocs or any other form of dev-docs seem to be present. How would others know how to use this feature? Lets document it.

        Will add the appropriate javadoc for it once we finalize on implementation

        2. I'm -1 on relying on protected member variables directly, as thats ugly to do. Do something similar to ArrayWritable instead, where an inheritor passes the class via a constructor. Or at least provide getter/setter of some form.

        I tried it to make as ArrayWritable but it breaks the existing MapWritable code since implementation class information has to be written into stream. Even i tried setters/getters but it will not work since in M/R code Writables are created through default constructor. So protected member was only way i can think of solving the problem.

        The idea is good, but how does it work in terms of portability? If it is written by client who used X impl. of map, and then read by another client who used Y or the default impl., is it guaranteed to work? If yes, lets document that guarantee. And if not, does it throw a proper error message?

        Since client has to use different class to get the new behavior (ex LinkedHashMapWritable) and Writables are not polymorphic, portability should not be an issue.

        Regards,
        Madhukara Phatak

        Show
        madhukara phatak added a comment - Hi Harsha, Thanks for looking at patch. Please find my reply for the concerns raised by you. 1. No javadocs or any other form of dev-docs seem to be present. How would others know how to use this feature? Lets document it. Will add the appropriate javadoc for it once we finalize on implementation 2. I'm -1 on relying on protected member variables directly, as thats ugly to do. Do something similar to ArrayWritable instead, where an inheritor passes the class via a constructor. Or at least provide getter/setter of some form. I tried it to make as ArrayWritable but it breaks the existing MapWritable code since implementation class information has to be written into stream. Even i tried setters/getters but it will not work since in M/R code Writables are created through default constructor. So protected member was only way i can think of solving the problem. The idea is good, but how does it work in terms of portability? If it is written by client who used X impl. of map, and then read by another client who used Y or the default impl., is it guaranteed to work? If yes, lets document that guarantee. And if not, does it throw a proper error message? Since client has to use different class to get the new behavior (ex LinkedHashMapWritable) and Writables are not polymorphic, portability should not be an issue. Regards, Madhukara Phatak
        Hide
        Harsh J added a comment -

        (Canceling until discussion's complete on this)

        Show
        Harsh J added a comment - (Canceling until discussion's complete on this)
        Hide
        Harsh J added a comment -

        Hi and thanks for the reply!

        I tried it to make as ArrayWritable but it breaks the existing MapWritable code since implementation class information has to be written into stream. Even i tried setters/getters but it will not work since in M/R code Writables are created through default constructor. So protected member was only way i can think of solving the problem.

        /** {@inheritDoc} */
          @Override
          public void write(DataOutput out) throws IOException {
            super.write(out);
            
            // Write out the number of entries in the map
            
            out.writeInt(instance.size());
        
            // Then write out each key/value pair
            
            for (Map.Entry<Writable, Writable> e: instance.entrySet()) {
              out.writeByte(getId(e.getKey().getClass()));
              e.getKey().write(out);
              out.writeByte(getId(e.getValue().getClass()));
              e.getValue().write(out);
            }
          }
        

        In the above writer method, I do not see the instance's type being serialized. The key and value types are indeed stored, but not of the instance's.

        Why can we hence not have a MapWritable(Class <? extends Map<Writable, Writable>> mapType) that then instantiates the class for the field member "instance", rather than a dev hacking like that?

        Since client has to use different class to get the new behavior (ex LinkedHashMapWritable) and Writables are not polymorphic, portability should not be an issue.

        Lets still have a test case. Your current test case merely checks the ordering of a sequential implementation without serializing it. Lets instead serialize it, deserialize it again with your custom writable, check the ordering. Also deserialize it with a regular instance, and check the content. Let me know if this makes sense. This will help catch regressions in future and good tests are always good to have.

        Thanks and let me know if am wrong somewhere or if you have any other questions!

        Show
        Harsh J added a comment - Hi and thanks for the reply! I tried it to make as ArrayWritable but it breaks the existing MapWritable code since implementation class information has to be written into stream. Even i tried setters/getters but it will not work since in M/R code Writables are created through default constructor. So protected member was only way i can think of solving the problem. /** {@inheritDoc} */ @Override public void write(DataOutput out) throws IOException { super .write(out); // Write out the number of entries in the map out.writeInt(instance.size()); // Then write out each key/value pair for (Map.Entry<Writable, Writable> e: instance.entrySet()) { out.writeByte(getId(e.getKey().getClass())); e.getKey().write(out); out.writeByte(getId(e.getValue().getClass())); e.getValue().write(out); } } In the above writer method, I do not see the instance's type being serialized. The key and value types are indeed stored, but not of the instance's. Why can we hence not have a MapWritable(Class <? extends Map<Writable, Writable>> mapType) that then instantiates the class for the field member "instance", rather than a dev hacking like that? Since client has to use different class to get the new behavior (ex LinkedHashMapWritable) and Writables are not polymorphic, portability should not be an issue. Lets still have a test case. Your current test case merely checks the ordering of a sequential implementation without serializing it. Lets instead serialize it, deserialize it again with your custom writable, check the ordering. Also deserialize it with a regular instance, and check the content. Let me know if this makes sense. This will help catch regressions in future and good tests are always good to have. Thanks and let me know if am wrong somewhere or if you have any other questions!
        Hide
        madhukara phatak added a comment -

        Why can we hence not have a MapWritable(Class <? extends Map<Writable, Writable>> mapType) that then instantiates the class for the field member "instance", rather than a dev hacking like that?

        AFAIK in M/R writable's are instantiated through default constructor, which means the above constructor will not be called. So if I want to have a custom Map instance implementation it has to be in read/write methods. Please correct me if I am wrong

        Show
        madhukara phatak added a comment - Why can we hence not have a MapWritable(Class <? extends Map<Writable, Writable>> mapType) that then instantiates the class for the field member "instance", rather than a dev hacking like that? AFAIK in M/R writable's are instantiated through default constructor, which means the above constructor will not be called. So if I want to have a custom Map instance implementation it has to be in read/write methods. Please correct me if I am wrong

          People

          • Assignee:
            madhukara phatak
            Reporter:
            madhukara phatak
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:

              Development