Apache Avro / AVRO-1357

Allow forcing generic record deserialization for input data and map output data

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.7.4
    • Fix Version/s: None
    • Component/s: java
    • Labels: None
    • Allow users to choose the deserialization type for input data and map output data in mapred jobs: generic (always generating generic records), specific (generating specific records if the input data matches an Avro-generated class), or reflect (generating instances if the input data matches an existing class).

    Description

      In AvroJob/AvroInputFormat/AvroRecordReader, input data and map output data can be read with either SpecificDatumReader or ReflectDatumReader, but not with GenericDatumReader. Some jobs need to force reading generic records.

      For example, assume that the input records contain a field called "category" and we want to count the records in each category. If we can force reading generic records, we can always get the category string by calling get("category"). Otherwise, an input record might be loaded as either a GenericRecord instance or a SpecificRecord instance. The latter does not implement GenericRecord, so code written against get("category") cannot handle it.
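
      The category example can be sketched roughly as follows. To stay self-contained, the sketch does not use the Avro library: the nested IndexedRecord, GenericRecord, and SpecificRecord interfaces are simplified stand-ins that declare only the methods needed here, and SimpleGenericRecord and CategoryCount are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CategoryCount {
    // Simplified stand-ins for the Avro interfaces (assumption: only the
    // accessor methods relevant to this example are declared).
    interface IndexedRecord { Object get(int i); }
    interface GenericRecord extends IndexedRecord { Object get(String key); }
    interface SpecificRecord extends IndexedRecord { }

    // Hypothetical generic record backed by parallel name/value arrays.
    static final class SimpleGenericRecord implements GenericRecord {
        private final String[] names;
        private final Object[] values;

        SimpleGenericRecord(String[] names, Object[] values) {
            this.names = names;
            this.values = values;
        }

        @Override
        public Object get(int i) { return values[i]; }

        @Override
        public Object get(String key) {
            for (int i = 0; i < names.length; i++) {
                if (names[i].equals(key)) return values[i];
            }
            return null;
        }
    }

    // Counts records per category. This only works when every record is a
    // GenericRecord, which is what forcing generic deserialization guarantees.
    static Map<String, Long> countByCategory(List<? extends IndexedRecord> records) {
        Map<String, Long> counts = new TreeMap<>();
        for (IndexedRecord r : records) {
            if (!(r instanceof GenericRecord)) {
                // A specific record has no by-name accessor in this model.
                throw new IllegalArgumentException(
                    "not a generic record: " + r.getClass().getName());
            }
            // String.valueOf, since the field value may be any CharSequence.
            String category = String.valueOf(((GenericRecord) r).get("category"));
            counts.merge(category, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] names = {"category"};
        List<IndexedRecord> records = new ArrayList<>();
        records.add(new SimpleGenericRecord(names, new Object[] {"books"}));
        records.add(new SimpleGenericRecord(names, new Object[] {"music"}));
        records.add(new SimpleGenericRecord(names, new Object[] {"books"}));
        System.out.println(countByCategory(records)); // prints {books=2, music=1}
    }
}
```

      In this model, a record that only implements SpecificRecord is rejected by countByCategory, which is the situation the requested option is meant to avoid.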

      To add this feature, we can change the booleans IS_REFLECT/MAP_OUTPUT_IS_REFLECT into enums called INPUT_AVRO_DESERIALIZATION_TYPE/MAP_OUTPUT_AVRO_DESERIALIZATION_TYPE, and return the corresponding DatumReader based on the type.

      We can add setDeserializationType/setInputDeserializationType/setMapOutputDeserializationType to AvroJob while deprecating setReflect/setInputReflect/setMapOutputReflect.
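
      One possible shape for this change is sketched below. It is not taken from the attached patches. A plain Map stands in for the job configuration, datumReaderFor returns a class name instead of constructing a real DatumReader, both configuration key strings are assumptions, and the fallback to SPECIFIC when nothing is set is assumed from the current behavior described above.

```java
import java.util.HashMap;
import java.util.Map;

final class DeserializationConfig {
    enum DeserializationType { GENERIC, SPECIFIC, REFLECT }

    // Assumed key names, for illustration only.
    static final String INPUT_TYPE_KEY = "avro.input.deserialization.type";
    static final String LEGACY_INPUT_IS_REFLECT_KEY = "avro.input.is.reflect";

    // New-style setter: stores the enum name in the configuration.
    static void setInputDeserializationType(Map<String, String> conf,
                                            DeserializationType type) {
        conf.put(INPUT_TYPE_KEY, type.name());
    }

    // Old-style setter kept for compatibility; it delegates to the new one.
    @Deprecated
    static void setInputReflect(Map<String, String> conf) {
        setInputDeserializationType(conf, DeserializationType.REFLECT);
    }

    // Reads the type, honoring the legacy boolean when the new key is absent.
    static DeserializationType getInputDeserializationType(Map<String, String> conf) {
        String value = conf.get(INPUT_TYPE_KEY);
        if (value != null) {
            return DeserializationType.valueOf(value);
        }
        // Boolean.parseBoolean(null) is false, so an empty conf yields SPECIFIC.
        return Boolean.parseBoolean(conf.get(LEGACY_INPUT_IS_REFLECT_KEY))
            ? DeserializationType.REFLECT
            : DeserializationType.SPECIFIC;
    }

    // Maps each type to the DatumReader class that would be instantiated.
    static String datumReaderFor(DeserializationType type) {
        switch (type) {
            case GENERIC:  return "GenericDatumReader";
            case SPECIFIC: return "SpecificDatumReader";
            case REFLECT:  return "ReflectDatumReader";
            default: throw new IllegalArgumentException("unknown type: " + type);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setInputDeserializationType(conf, DeserializationType.GENERIC);
        System.out.println(datumReaderFor(getInputDeserializationType(conf)));
    }
}
```

      The map output side would follow the same pattern with its own pair of keys. Reading the legacy boolean as a fallback lets jobs that set it directly keep working while the old setters are deprecated.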

      Attachments

        1. AVRO-1357.patch.1
          9 kB
          Xiangrui Meng
        2. AVRO-1357.patch
          9 kB
          Xiangrui Meng

            People

              Assignee: Unassigned
              Reporter: Xiangrui Meng (mengxr)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 48h
                  Remaining: 48h
                  Logged: Not Specified