  1. Apache NiFi
  2. NIFI-5735

Record-oriented processors/services do not properly support Avro Unions

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.1
    • Fix Version/s: None
    • Component/s: Core Framework, Extensions
    • Important

    Description

      The Avro spec states:

      Unions may not contain more than one schema with the same type, except for the named types record, fixed and enum. For example, unions containing two array types or two map types are not permitted, but two types with different names are permitted. (Names permit efficient resolution when reading and writing unions.)
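
As a minimal sketch of the rule quoted above, the check below treats each union branch as a type plus an optional name. The `UnionRule` and `Branch` classes are hypothetical illustrations, not part of the Avro or NiFi APIs, and the sketch assumes names are already fully qualified.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UnionRule {
    // Hypothetical, simplified model of one union branch: its Avro type and,
    // for named types only, its (full) name. Not an Avro or NiFi class.
    static final class Branch {
        final String type;
        final String name; // null for unnamed types such as "array" or "int"

        Branch(String type, String name) {
            this.type = type;
            this.name = name;
        }
    }

    // The three named types the spec exempts from the one-per-type rule.
    private static final Set<String> NAMED_TYPES = Set.of("record", "fixed", "enum");

    // Returns true if no two branches collide under the uniqueness rule.
    static boolean isValidUnion(List<Branch> branches) {
        Set<String> seen = new HashSet<>();
        for (Branch b : branches) {
            // Named types are distinguished by name; all others by type alone.
            String key = NAMED_TYPES.contains(b.type) ? "named:" + b.name : b.type;
            if (!seen.add(key)) {
                return false; // a second branch with the same key
            }
        }
        return true;
    }
}
```

Under this model, a union of two records named "left" and "right" passes the check, while a union of two array branches does not.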

      However, record-oriented processors/services in NiFi do not support multiple named types per union. This is a problem, for example, with the following schema:

      {
          "type": "record",
          "name": "root",
          "fields": [
              {
                  "name": "children",
                  "type": {
                      "type": "array",
                      "items": [
                          {
                              "type": "record",
                              "name": "left",
                              "fields": [
                                  {
                                      "name": "f1",
                                      "type": "string"
                                  }
                              ]
                          },
                          {
                              "type": "record",
                              "name": "right",
                              "fields": [
                                  {
                                      "name": "f2",
                                      "type": "int"
                                  }
                              ]
                          }
                      ]
                  }
              }
          ]
      }
      

      This schema contains a field named "children", which is an array whose items are a union. The union contains two possible record types. Currently the NiFi Avro utilities fail to process records of this schema whose "children" arrays contain both "left" and "right" records.

      I've traced this bug to the AvroTypeUtil class.

      Specifically, there are bugs in the convertUnionFieldValue method and in the buildAvroSchema method. Both methods assume that an Avro union can contain at most one member of each type. As stated in the spec, this is true for primitive types and non-named complex types, but not for named types (record, fixed and enum).
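
To illustrate the kind of resolution that seems to be needed, here is a rough sketch that tries every record branch of a union against a value instead of assuming there is only one. It is not the NiFi code and not either attached patch: `UnionResolutionSketch`, `RecordBranch` and `resolve` are hypothetical names, records are modelled as plain maps, and matching on field names plus Java value classes is just one possible strategy.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class UnionResolutionSketch {
    // Hypothetical stand-in for a record schema inside a union: the record
    // name plus the expected Java class of each field.
    static final class RecordBranch {
        final String name;
        final Map<String, Class<?>> fields;

        RecordBranch(String name, Map<String, Class<?>> fields) {
            this.name = name;
            this.fields = fields;
        }
    }

    // Checks each record branch in turn and returns the name of the first
    // one the value conforms to, or empty if none matches.
    static Optional<String> resolve(Map<String, ?> value, List<RecordBranch> union) {
        for (RecordBranch branch : union) {
            if (matches(value, branch)) {
                return Optional.of(branch.name);
            }
        }
        return Optional.empty();
    }

    // A value matches when it has exactly the branch's fields and every
    // field value is an instance of the expected class.
    private static boolean matches(Map<String, ?> value, RecordBranch branch) {
        if (!value.keySet().equals(branch.fields.keySet())) {
            return false;
        }
        for (Map.Entry<String, Class<?>> field : branch.fields.entrySet()) {
            if (!field.getValue().isInstance(value.get(field.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

With branches for "left" (f1 as String) and "right" (f2 as Integer), a map holding only f1 resolves to "left" and a map holding only f2 resolves to "right", so a "children" array mixing both kinds of element can be resolved element by element.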

      There may be related bugs elsewhere, but I haven't been able to locate them yet.

      Attachments

        1. 0001-NIFI-5735-added-preliminary-support-for-union-resolu.patch
          19 kB
          Daniel Solow
        2. NIFI-5735.patch
          12 kB
          Alex Savitsky

          People

            Assignee: Unassigned
            Reporter: Daniel Solow (dsolow)
            Votes: 2
            Watchers: 5