Avro
  1. Avro
  2. AVRO-941

Avro should support the Apache Maven Shade plugin class relocation feature

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.4
    • Fix Version/s: 1.6.0
    • Component/s: java
    • Labels:
      None

      Description

      The Apache shade plugin allows maven builds to create an uber jar that contains dependencies in the project. In addition, the shade plugin allows you to relocate dependencies into a private namespace to prevent class conflicts on shared class paths. Avro does not support relocation.

      All generated Avro objects contain a string field named SCHEMA$ which serves as the authority for the class namespace. When the shade plugin updates the byte code to relocate the class, it doesn't alter the SCHEMA$ string. This break Avro use of reflection since the namespace in SCHEMA$ points to an incorrect location.

      I spoke with Doug about the issue and he was kind enough to provide a quick hack in order to fix this issue. The hack is to check for mismatches between the byte code and the SCHEMA$ and, when they don't match, to defer to the byte code. I'll attach Doug's patch to this Jira.

      1. shade.patch
        2 kB
        Matt Massie

        Activity

        Hide
        Matt Massie added a comment -

        Doug's patch against 1.5.4.

        Show
        Matt Massie added a comment - Doug's patch against 1.5.4.
        Hide
        Philip Zeyliger added a comment -

        Instead of doing string substitution, the patch should probably only substitute the namespace field specifically. If someone had the classname in a comment, it shouldn't be replaced, for example.

        Otherwise, I'm a big fan of being able to shade avro when necessary.

        Show
        Philip Zeyliger added a comment - Instead of doing string substitution, the patch should probably only substitute the namespace field specifically. If someone had the classname in a comment, it shouldn't be replaced, for example. Otherwise, I'm a big fan of being able to shade avro when necessary.
        Hide
        Doug Cutting added a comment -

        > the patch should probably only substitute the namespace field specifically

        That's a flaw, but probably not the most critical one. Avro identifiers cannot contain dots except in namespaces. So as long as the package name contains a dot (which most do) then the global replace should not harm the schema. Fixing this requires a fair amount of code, walking the schema and creating a copy with things renamed. (We already do this in a few places, so probably we should create a SchemaVisitor API to simplify this, but that's a separate issue.)

        Note that this approach will always be flawed, since it won't always be able to perfectly reconstruct the relocations used when shading. However replacement is only attempted when things are already broken, so it does no harm and imperfections are thus tolerable.

        Probably the biggest flaw of the current patch is that it will fails if nested schemas are not all in the same namespace. To address this we might look for a common suffix or prefix in the new and old package and then replace the differing text. For example, if the current class is com.baz.hidden.org.foo.Bar and the schema is org.foo.Bar then the replacement should be to prefix all namespaces with com.baz.hidden.

        I'd be happy to see such improvements to this patch, but I'd not object to it being committed more-or-less as-is since it does no harm.

        Show
        Doug Cutting added a comment - > the patch should probably only substitute the namespace field specifically That's a flaw, but probably not the most critical one. Avro identifiers cannot contain dots except in namespaces. So as long as the package name contains a dot (which most do) then the global replace should not harm the schema. Fixing this requires a fair amount of code, walking the schema and creating a copy with things renamed. (We already do this in a few places, so probably we should create a SchemaVisitor API to simplify this, but that's a separate issue.) Note that this approach will always be flawed, since it won't always be able to perfectly reconstruct the relocations used when shading. However replacement is only attempted when things are already broken, so it does no harm and imperfections are thus tolerable. Probably the biggest flaw of the current patch is that it will fails if nested schemas are not all in the same namespace. To address this we might look for a common suffix or prefix in the new and old package and then replace the differing text. For example, if the current class is com.baz.hidden.org.foo.Bar and the schema is org.foo.Bar then the replacement should be to prefix all namespaces with com.baz.hidden. I'd be happy to see such improvements to this patch, but I'd not object to it being committed more-or-less as-is since it does no harm.
        Hide
        Doug Cutting added a comment -

        I'll commit this soon unless someone objects. It's not perfect but it's better than nothing.

        Show
        Doug Cutting added a comment - I'll commit this soon unless someone objects. It's not perfect but it's better than nothing.
        Hide
        Doug Cutting added a comment -

        I committed this.

        Show
        Doug Cutting added a comment - I committed this.

          People

          • Assignee:
            Doug Cutting
            Reporter:
            Matt Massie
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development