Hadoop Common / HADOOP-4192

Add a Class<? extends T> Deserializer.getRealClass() method that returns the actual class of the objects produced by a deserializer

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Note: this use case applies only to non-self-describing files that contain Serialization framework records, where the Serialization class and the actual type of the records to be deserialized are configured higher up, through the JobConf.

      It is motivated by the need to create a generic FlatFileDeserializerRecordReader that can be configured to use any Serialization implementation through the JobConf.

      Since a Deserializer can return a subtype of the type it is instantiated to return, we can create generic Deserializers for a base type (e.g., Writable, Record, Thrift TBase), so that the RecordReader need not be specific to any of them.

      In that case, to implement RecordReader.getValueClass(), the generic RecordReader needs to query the class from the Deserializer.

      And since this RecordReader is generic, even the Serialization implementation it uses should come from the JobConf, as should the specific class being deserialized, e.g., Record/MyUserIDRecord or Writable/LongWritable.

      Without such a method, the RecordReader would need to know how the Serialization and the Deserializer obtain their configuration in order to implement getValueClass().

      I think a much cleaner way is to implement getRealClass() on the Deserializer.
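
The proposal can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Hadoop code: the Deserializer interface below is a simplified stand-in for Hadoop's, getRealClass() is the method proposed in this issue and does not exist in the real interface, and BaseRecord, UserIdRecord, BaseRecordDeserializer, GenericRecordReader and GetRealClassSketch are hypothetical names. The concrete class is passed to a constructor here; in a real job it would presumably be read from the JobConf.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Simplified stand-in for Hadoop's Deserializer<T>, plus the proposed method. */
interface Deserializer<T> {
    void open(InputStream in) throws IOException;
    T deserialize(T reuse) throws IOException;
    void close() throws IOException;
    /** Proposed in this issue; NOT part of the real Hadoop interface. */
    Class<? extends T> getRealClass();
}

/** Hypothetical base record type, playing the role of Writable or Record. */
interface BaseRecord {
    void readFields(DataInputStream in) throws IOException;
}

/** Hypothetical concrete record, analogous to MyUserIDRecord. */
class UserIdRecord implements BaseRecord {
    long userId;
    public void readFields(DataInputStream in) throws IOException {
        userId = in.readLong();
    }
}

/** Generic deserializer for the base type; the concrete class comes from configuration. */
class BaseRecordDeserializer implements Deserializer<BaseRecord> {
    private final Class<? extends BaseRecord> realClass;
    private DataInputStream in;

    BaseRecordDeserializer(Class<? extends BaseRecord> realClass) {
        this.realClass = realClass; // assumption: would be looked up in the JobConf
    }
    public void open(InputStream stream) { in = new DataInputStream(stream); }
    public BaseRecord deserialize(BaseRecord reuse) throws IOException {
        try {
            // Reuse the caller's object if given, otherwise create the configured subtype.
            BaseRecord record = (reuse != null)
                ? reuse : realClass.getDeclaredConstructor().newInstance();
            record.readFields(in);
            return record;
        } catch (ReflectiveOperationException e) {
            throw new IOException("Cannot instantiate " + realClass.getName(), e);
        }
    }
    public void close() throws IOException { in.close(); }
    public Class<? extends BaseRecord> getRealClass() { return realClass; }
}

/** Generic reader: it never learns how the deserializer was configured. */
class GenericRecordReader<T> {
    private final Deserializer<T> deserializer;

    GenericRecordReader(Deserializer<T> deserializer, InputStream in) throws IOException {
        this.deserializer = deserializer;
        deserializer.open(in);
    }
    /** Delegates to the deserializer instead of re-reading configuration. */
    Class<? extends T> getValueClass() { return deserializer.getRealClass(); }
    T next() throws IOException { return deserializer.deserialize(null); }
}

class GetRealClassSketch {
    /** Encodes one record the way UserIdRecord.readFields expects to read it. */
    static byte[] encode(long userId) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeLong(userId);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        GenericRecordReader<BaseRecord> reader = new GenericRecordReader<>(
            new BaseRecordDeserializer(UserIdRecord.class),
            new ByteArrayInputStream(encode(42L)));
        System.out.println(reader.getValueClass().getSimpleName());
        BaseRecord value = reader.next();
        System.out.println(((UserIdRecord) value).userId);
    }
}
```

In this sketch the reader is typed only against the base type, yet getValueClass() reports the concrete subtype, which is the decoupling the description asks for.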

            People

            • Assignee:
              Unassigned
            • Reporter:
              Pete Wyckoff
            • Votes:
              0
            • Watchers:
              1
