Hadoop Common
HADOOP-6438

Add configuration getters/setters to serialization classes

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Needed for MAPREDUCE-1126: getter and setter methods that inject serialization-specific metadata into configurations, so that various data types can be (de)serialized.

      1. HADOOP-6438.patch
        11 kB
        Aaron Kimball

          Activity

          Aaron Kimball added a comment -

          Patch that provides static getter/setter methods a la FileInputFormat.addInputPath() and friends.

          Will mark as patch-available after HADOOP-6420 is resolved.
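The static getter/setter pattern mentioned in this comment might look roughly like the sketch below. It is not the content of the attached patch. A plain `Map<String, String>` stands in for Hadoop's `Configuration` so that the example has no dependencies, and the class name `WritableSerializationConfig` and both property names are hypothetical.

```java
import java.util.Map;

/**
 * Hypothetical sketch of static configuration helpers in the style of
 * FileInputFormat.addInputPath(). The Map parameter is a stand-in for
 * org.apache.hadoop.conf.Configuration.
 */
final class WritableSerializationConfig {
    // Hypothetical property names, chosen for illustration only.
    static final String MAP_OUTPUT_KEY_CLASS = "example.map.output.key.class";
    static final String MAP_OUTPUT_VALUE_CLASS = "example.map.output.value.class";

    private WritableSerializationConfig() {}

    /** Records the intermediate (map output) key class by name. */
    static void setMapOutputKeyClass(Map<String, String> conf, Class<?> keyClass) {
        conf.put(MAP_OUTPUT_KEY_CLASS, keyClass.getName());
    }

    /** Records the intermediate (map output) value class by name. */
    static void setMapOutputValueClass(Map<String, String> conf, Class<?> valueClass) {
        conf.put(MAP_OUTPUT_VALUE_CLASS, valueClass.getName());
    }

    /** Returns the configured key class, or null if none was set. */
    static Class<?> getMapOutputKeyClass(Map<String, String> conf) {
        return load(conf.get(MAP_OUTPUT_KEY_CLASS));
    }

    /** Returns the configured value class, or null if none was set. */
    static Class<?> getMapOutputValueClass(Map<String, String> conf) {
        return load(conf.get(MAP_OUTPUT_VALUE_CLASS));
    }

    // Resolves a stored class name; a missing entry yields null.
    private static Class<?> load(String className) {
        if (className == null) {
            return null;
        }
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Configured class not found: " + className, e);
        }
    }
}
```

Callers never touch the property names directly; they call `WritableSerializationConfig.setMapOutputKeyClass(conf, SomeKey.class)` and read the value back with the matching getter.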

          Doug Cutting added a comment -

          Shouldn't this code be in the mapreduce project? Perhaps we can add a new class in lib/input for these methods? This might then be included in MAPREDUCE-1126.

          Aaron Kimball added a comment -

          In MAPREDUCE-1126 you suggested that we add these methods directly to WritableSerialization and AvroSerialization. We could move them into the MR project proper, though I don't know that o.a.h.m.lib.input makes much sense as a destination – these methods are configuring the intermediate data types, not the input types.

          For future-proofing purposes: I anticipate that for MAPREDUCE-815, we'll also need to add related methods to set the serialization metadata maps for the final output key and value types (a la JobConf.setOutputKeyClass()). It would be nice if the output type configuration and the intermediate type configuration remained "close together" (e.g., in the same package).

          Should we add a new package that contains a configuration API around serializers? e.g. o.a.h.m.lib.serialization or o.a.h.m.lib.types

          Doug Cutting added a comment -

          > In MAPREDUCE-1126 you suggested that we add these methods directly to WritableSerialization and AvroSerialization.

          Sorry, I must have forgotten to put on my project-split goggles! If they're mapreduce-specific, then they really belong in that project, I think.

          > Should we add a new package that contains a configuration API around serializers?

          A package feels like overkill to me. Do we expect more than a single class?

          Aaron Kimball added a comment -

          > A package feels like overkill to me. Do we expect more than a single class?

          Depends? There are currently two ways to set the metadata. In most cases, setting the serialized class name is sufficient (e.g., JavaSerialization, WritableSerialization, AvroSpecificSerialization). In the AvroGenericSerialization case, though, we set the schema to serialize, using an analogous but distinct API. We could maybe add a single class that contains methods like setMapOutputKeyClass() and setMapOutputKeySchema(); the current implementation breaks these apart, so that the class to which each configuration method belongs makes the serialization system in use explicit. I could go either way on this issue.
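To make the single-class option concrete, here is a minimal sketch under stated assumptions, not the design that was adopted. A `Map<String, String>` again stands in for Hadoop's `Configuration`, the Avro schema is passed as its JSON text rather than as a schema object, and the class name `MapOutputTypes`, the property names, and the rule that each setter clears the other kind of metadata are all hypothetical.

```java
import java.util.Map;

/**
 * Hypothetical single class offering both ways of describing the map output
 * key: by class name or by schema. The Map parameter is a stand-in for
 * org.apache.hadoop.conf.Configuration.
 */
final class MapOutputTypes {
    // Hypothetical property names, chosen for illustration only.
    static final String KEY_CLASS = "example.map.output.key.class";
    static final String KEY_SCHEMA = "example.map.output.key.schema";

    private MapOutputTypes() {}

    /** Class-based metadata, as used by the class-name serializations. */
    static void setMapOutputKeyClass(Map<String, String> conf, Class<?> keyClass) {
        conf.remove(KEY_SCHEMA); // assumption: only one kind of metadata is active
        conf.put(KEY_CLASS, keyClass.getName());
    }

    /** Schema-based metadata, as used in the generic Avro case. */
    static void setMapOutputKeySchema(Map<String, String> conf, String schemaJson) {
        conf.remove(KEY_CLASS); // assumption: only one kind of metadata is active
        conf.put(KEY_SCHEMA, schemaJson);
    }

    /** Returns the configured key class name, or null if a schema (or nothing) is set. */
    static String getMapOutputKeyClassName(Map<String, String> conf) {
        return conf.get(KEY_CLASS);
    }

    /** Returns the configured key schema as JSON text, or null if a class (or nothing) is set. */
    static String getMapOutputKeySchema(Map<String, String> conf) {
        return conf.get(KEY_SCHEMA);
    }
}
```

The trade-off described in the comment shows up here: one class keeps the API in a single place, but the method name alone (`...Class` versus `...Schema`) is what tells the reader which serialization system is in play.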

          Aaron Kimball added a comment -

          After discussion here and on MAPREDUCE-1126, the conclusion is that these getters/setters belong in the MAPREDUCE project; code is folded into MAPREDUCE-1126.


            People

            • Assignee:
              Aaron Kimball
              Reporter:
              Aaron Kimball
            • Votes:
              0
              Watchers:
              3
