Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: java
    • Labels:
      None
    • Release Note:
      Implemented Schema#isSuperset, Schema#unify, and a Tool for generating unified Schemas from avro data files.

      Description

      From my post on the mailing list, and Doug's response:

      On 05/05/2011 10:29 AM, Joe Crobak wrote:
      > We've recently come across a situation where we have two data files with
      > different schemas that we'd like to process together using
      > GenericDatumReader. One schema is promotable to the other, but not vice
      > versa. We'd like to programmatically determine which of the schemas to
      > use. I did a brief look through javadoc and tests, and I couldn't find
      > any examples of checking if one schema is promotable to the other. Has
      > anyone else come across this?
      >
      > For some context, we're considering patching AvroStorage [1] to remove
      > the assumption that all files have the same schema. In our case, our
      > schema has evolved in that a field that was an int was promoted to a long.

      A boolean method that tells you if one schema is promotable to another
      would work in this case, but would not help in cases where, e.g.,
      different fields had changed in different versions. For example, in
      branched development, two branches might each add a distinct symbol to
      an enum. So I think you might be better off with a method that, given
      two schemas, returns their superset, a schema that can read data written
      by either.
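Doug's branched-enum example can be sketched concretely. This is a toy model, not the Avro API: symbols are plain strings, and the merge simply unions the two symbol lists while preserving order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Toy sketch (not the Avro API): merge the symbol lists of two branched
// enum versions so the result can read data written with either branch.
// Symbols from the first schema keep their order; symbols unique to the
// second are appended.
public class EnumMerge {
    public static List<String> mergeSymbols(List<String> a, List<String> b) {
        LinkedHashSet<String> merged = new LinkedHashSet<>(a);
        merged.addAll(b);
        return new ArrayList<>(merged);
    }
}
```

Merging ["A", "B", "C"] with ["A", "B", "D"] this way yields ["A", "B", "C", "D"], an enum that covers both branches.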

      Such a method does not yet exist in Avro, but should not be difficult to
      add. Please file an issue in Jira if this sounds of interest.

      Doug

      I think it would be useful to have both of the methods that Doug mentioned in some sort of schema utils class.
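To make the proposal concrete, here is a hypothetical sketch of the two methods restricted to numeric primitives, using the spec's promotion chain (int to long to float to double). The names NumericSchemas, isSuperset, and unify are illustrative only; a real implementation must recurse over records, unions, arrays, and maps.

```java
import java.util.List;

// Hypothetical sketch of the proposed operations, restricted to the
// numeric primitives and their spec-defined promotion chain.
public class NumericSchemas {
    // promotion order per the Avro spec: int -> long -> float -> double
    private static final List<String> ORDER = List.of("int", "long", "float", "double");

    // Can a reader with schema a read data written with schema b?
    public static boolean isSuperset(String a, String b) {
        return ORDER.contains(a) && ORDER.contains(b)
            && ORDER.indexOf(a) >= ORDER.indexOf(b);
    }

    // The narrowest schema that can read data written with either.
    public static String unify(String a, String b) {
        return isSuperset(a, b) ? a : b;
    }
}
```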

      1. AVRO-816.patch
        10 kB
        Joe Crobak
      2. AVRO-816.patch
        28 kB
        Joe Crobak
      3. AVRO-816.patch
        30 kB
        Joe Crobak
      4. AVRO-816.patch
        33 kB
        Joe Crobak

        Activity

        Doug Cutting added a comment -

        The patch still applies but tests no longer pass.

        I think both parts are useful and would be willing to commit this. Re-reading the comments, I think Scott's concern is that we (someday) also provide something like "intersection" as a counterpart to "union".

        I'd perhaps like to see the documentation improved. "subsume" and "unify" should be more clearly defined.

        Bob Cotton added a comment -

        There are two issues covered by this patch, subsume-ification and unification, with the latter being the contentious one.

        We have an interest in using subsume, can we split this JIRA and get subsume committed?
        Would be nice not to have to track our custom build for this.

        Scott Carey added a comment -

        Sorry for the (far too long) delay.

        In the case of unions, maximum fidelity is simple, but how would you propose "unifying" a type: "long" and type: "int" field that aren't in a union? Are you proposing that the composite type would be the union of "long" and "int"?

        Yes, I am suggesting that an operation that creates a union of two schemas would result in all possible branches existing and not just a compatible subset. A separate operation would take care of the latter, but I'm not sure what a good name for that would be.

        The use case for each of these differs. In one case the need is for an exact superset schema in order to read all variations with maximum information retention – for example this may be useful when reading over a large set of archived data with a long schema evolution history.

        In another situation, reading all of the data with possible loss of precision is ok, and instead the goal is to generate the simplest schema that can read all of the data within an accepted level of precision loss.

        My contention is not that we must create both of these in this ticket, but that if we only have one of them we leave space in the API conceptually for the other (in documentation and class/method names).

        These rules are explicitly about reading data. Perhaps we could name them 'resolves' and 'resolve'?

        Perhaps in another ticket we can talk about how to change the language in the spec to be about the operations we apply to the schema rather than what we are using those operations for. I think at this point that is relatively low priority until we start adding other features besides reading that rely on schema resolution, and I don't want to tie that discussion to this ticket.

        So do you think these methods don't belong on Schema, but rather on some utility class in the io package?

        It makes sense to keep pure schema operations in the Schema class or with other Schema utilities. If I were to start over from scratch I'd probably have an org.apache.avro.schema package that had no dependencies on other concepts in Avro such as serialization or object representations. But that is separate from this ticket. Let's just be careful with mixing schema manipulation/transformation concepts unnecessarily with object representation or serialization concepts.

        Doug Cutting added a comment -

        Where are we with this issue? Scott, what changes would you like to see in this patch?

        Doug Cutting added a comment -

        > Let's not introduce the notion of reading, writing, or serializing at all in Schema.java.

        So do you think these methods don't belong on Schema, but rather on some utility class in the io package?

        They're derived from the specification's schema resolution rules:

        http://avro.apache.org/docs/current/spec.html#Schema+Resolution

        These rules are explicitly about reading data. Perhaps we could name them 'resolves' and 'resolve'?

        Douglas Kaminsky added a comment -

        You're probably right – just I haven't come across the use case yet. How do you feel about creating a separate issue for adding the other operations? I think that some of the operations, in particular diff/intersection, will require some thinking and documentation (and would we want to prove that the operations are true set operations and thus commutative, associative, etc?)

        If we can agree to the above, then we only need to agree on naming of the two methods implemented here.

        I wouldn't be against a separate issue for this - I think if implemented as described they are naturally commutative and associative so if you need a proof it should follow easily.

        I would especially look forward to a way to programmatically "diff" schemas using these new operations and allow custom processing of the output (maybe some sort of visitor pattern or event dispatch). This is a real use case that we are currently doing very inefficiently using the ResolvingGrammarGenerator.

        Douglas Kaminsky added a comment -

        In the case of unions, maximum fidelity is simple, but how would you propose "unifying" a type: "long" and type: "int" field that aren't in a union? Are you proposing that the composite type would be the union of "long" and "int"?

        Scott Carey added a comment -

        I agree that since "long" and "int" unify to "long" that ["null", "long"] and ["null", "int"] should ideally unify to ["null", "long"].

        I think this is a different operation than 'unify'. It creates a schema that can resolve both source schemas, but information is lost. If we say that the result is ["null", "long"] we are in effect saying that the two schemas: ["null", "long"] and ["null", "long", "int"] are the same. We might as well not allow the schema ["long", "int", "null"] and force only one numeric type in a union.

        If there is use for a schema ["long", "int", "null"] (and I argue there is), then the union of ["null", "long"] and ["null", "int"] should be ["long", "int", "null"].

        The notion of making a union of two schemas and resolving them to a schema that can read both is distinct.

        Type promotion from int -> long does not lose data, but promotion from int -> float does. The result is an approximation with (much) fewer significant bits.

        For example

              System.out.println(String.format("%10d", Integer.MAX_VALUE));
              System.out.println(String.format("%10.1f", (float)Integer.MAX_VALUE));
              System.out.println(String.format("%10d", Integer.MAX_VALUE - 63));
              System.out.println(String.format("%10.1f", (float)(Integer.MAX_VALUE - 63)));
        

        prints:

        2147483647
        2147483648.0
        2147483584
        2147483648.0
        

        So on one hand we want to ask for a schema that "can read both of these", and on the other we have a schema that "can read both at maximum fidelity".

        Scott Carey added a comment -

        Maybe 'canRead/readsBoth'?

        Let's not introduce the notion of reading, writing, or serializing at all in Schema.java. 'subsumes' may not be the best term, but whatever term is used should not relate to things outside of the scope of unions.

        As part of my conceptual work on AVRO-859 I realized that schema resolution/promotion does not only apply to reads. A resolution applies from a schema 'source' to a schema 'target'. The source could be an object graph, with the target another object graph or the source could be bytes, and the target bytes. One could have an object graph representing one schema and project out only a subset schema on write.

        Joe Crobak added a comment -

        I think the set operations make sense conceptually for several use cases - custom diff reports on schemas, re-creating inheritance at runtime, deep copy between disparate types, etc.

        You're probably right – just I haven't come across the use case yet. How do you feel about creating a separate issue for adding the other operations? I think that some of the operations, in particular diff/intersection, will require some thinking and documentation (and would we want to prove that the operations are true set operations and thus commutative, associative, etc?)

        If we can agree to the above, then we only need to agree on naming of the two methods implemented here.

        Douglas Kaminsky added a comment -

        Sorry, meant 32 bits. Also realized I was arbitrarily using a 32-bit word size system in my example; I'm sure you get where I'm going with it.

        Douglas Kaminsky added a comment -

        I think the set operations make sense conceptually for several use cases - custom diff reports on schemas, re-creating inheritance at runtime, deep copy between disparate types, etc.

        To address the specific question, if the "unification" of a long and an int is a long, the intersection should logically be an int. It makes sense if you think of it in terms of bytes - the upper half of the long doesn't exist in the int, so the natural intersection is the lower 32 bytes shared by both types.
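The narrowing described here can be sketched as a toy function. Intersect and width are hypothetical names, only int and long are modeled, and this is not a proposed Avro API:

```java
// Toy illustration of the intersection idea: of two integral types, the
// narrower one holds exactly the values both can represent.
public class Intersect {
    private static int width(String type) {
        switch (type) {
            case "int":  return 32;
            case "long": return 64;
            default: throw new IllegalArgumentException("unsupported: " + type);
        }
    }

    public static String intersect(String a, String b) {
        return width(a) <= width(b) ? a : b;
    }
}
```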

        Doug Cutting added a comment -

        I like subsumes/unify, but am open to other names. As Joe noted, sets may not be a great metaphor, since 'union' would be overloaded and not all set operations make sense. Maybe 'canRead/readsBoth'? Then we might add 'boolean readsSame(Schema)' for AVRO-853?

        Joe Crobak added a comment -

        FWIW, I chose "unify" because this is solving a similar problem to the type inference problem, which is solved via unification... nothing to do with set operations.

        Joe Crobak added a comment -

        a) some thoughts on functionality - if you're going for set operations, you should just go for it and give the full set of operations:

        I think it's pretty non-intuitive (and potentially incorrect) to think about set operations for avro schemas. In addition to "union" meaning something different in Avro, it's unclear to me what a number of other operations would be... e.g. what is the intersection of int and long? int or {} (what does empty set even mean..)? There are plenty of other examples where it breaks down...

        With that said, I could see use for some of the methods you mention. I'd prefer to add those as part of a separate patch, though (this is already getting rather large). subsumes/unify grew out of the need to determine the correct "read" schema for reading multiple avro data files with different but compatible schemas... nothing to do with set operations.

        b) some thoughts on naming -

        I would favor "compose" "join" or "combine" over "unify"
        "subsumes" seems a bit... obtuse? I'd leave it as "isSupersetOf" or "contains"

        I'm terrible at naming... compose, join, and combine would all be fine with me. I'd like to avoid the use of set operations in names per above (which rules out isSupersetOf) and I think contains would be confusing (e.g. a record with an int field "contains" an int schema)... other suggestions?

        Douglas Kaminsky added a comment -

        a) some thoughts on functionality - if you're going for set operations, you should just go for it and give the full set of operations:

        • intersection of two schemas would be conceivably useful
        • in addition to isSuperset, isSubset should be available
        • isStrictSubset or isProperSubset (even if its entire body is "return !isSuperset(schema)")
        • isStrictSuperset or isProperSuperset (even if its entire body is "return !isSubset(schema)")

        b) some thoughts on naming -

        • I would favor "compose" "join" or "combine" over "unify"
        • "subsumes" seems a bit... obtuse? I'd leave it as "isSupersetOf" or "contains"

        c) +1 on the ability to combine two records whose schemas compose a more complex type

        Joe Crobak added a comment -

        I've updated the patch to fix the problem identified with the FixedSchema, and I've fixed subsumes and unify for UnionSchema in the case where one invokes subsume/unify with ["null", "int"] and ["null", "long"] (as well as more complex types).

        The implementation is quadratic (loops over both unions), and could possibly be improved if there are performance concerns.
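If the quadratic matching ever did become a bottleneck, one possible shape for a linear-time version is to index one union's branches by a key and walk the other union once. This sketch uses bare type-name strings as keys (named types would key on their full name) and is illustrative only, not the patch's code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of linear-time union branch matching via an index keyed by
// type name. Branches of the first union keep their order; unmatched
// branches of the second are appended.
public class UnionMatch {
    public static List<String> unifyBranches(List<String> a, List<String> b) {
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String branch : a) byKey.put(branch, branch);
        for (String branch : b) byKey.putIfAbsent(branch, branch); // unmatched branches appended
        return new ArrayList<>(byKey.values());
    }
}
```

With no promotion-aware pairing, ["null", "int"] and ["null", "long"] combine to ["null", "int", "long"] under this scheme.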

        Doug Cutting added a comment -

        I agree that since "long" and "int" unify to "long" that ["null", "long"] and ["null", "int"] should ideally unify to ["null", "long"].

        Joe Crobak added a comment -

        I've discovered one other case that might be considered a bug. Currently, ["null", "int"] and ["null", "long"] unify to ["null", "int", "long"]. I think it'd be much nicer if we followed the primitive promotion rules and unified to ["null", "long"], though. What do others think?
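The proposed behavior can be sketched as a toy: union branches are type-name strings, UnionPromote is a made-up name, and a numeric branch from one union is widened against a promotable branch of the other instead of both branches being kept.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of promotion-aware union unification: a numeric branch of
// one union is resolved against a promotable branch of the other, so
// ["null","int"] and ["null","long"] unify to ["null","long"].
public class UnionPromote {
    private static final List<String> NUMERIC = List.of("int", "long", "float", "double");

    public static List<String> unify(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a);
        for (String branch : b) {
            int i = promotableIndex(out, branch);
            if (i >= 0) {
                // keep the wider of the two promotable branches
                if (NUMERIC.indexOf(branch) > NUMERIC.indexOf(out.get(i))) out.set(i, branch);
            } else if (!out.contains(branch)) {
                out.add(branch);
            }
        }
        return out;
    }

    // index of the first numeric branch that could absorb a numeric type
    private static int promotableIndex(List<String> branches, String branch) {
        if (!NUMERIC.contains(branch)) return -1;
        for (int i = 0; i < branches.size(); i++)
            if (NUMERIC.contains(branches.get(i))) return i;
        return -1;
    }
}
```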

        Joe Crobak added a comment -
        return getFixedSize() >= other.getFixedSize();
        

        Great catch - thanks Scott! I'll fix this and also take another pass on the spec to make sure that the implementation is consistent with it.

        Scott Carey added a comment -
        return getFixedSize() >= other.getFixedSize();
        

        I believe the quoted line is incorrect for a fixed schema subsuming another. You cannot type-promote a fixed schema from one length to a longer one. The spec states that the sizes must match.
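A minimal sketch of the corrected predicate, with hypothetical names (the real check would also compare the fixed schemas' full names):

```java
// Per the spec, fixed schemas resolve only when their sizes match
// exactly, so >= is too permissive; there is no size promotion.
public class FixedCheck {
    public static boolean subsumes(int mySize, int otherSize) {
        return mySize == otherSize;
    }
}
```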

        Joe Crobak added a comment -

        Updated patch that incorporates Doug's feedback.

        Joe Crobak added a comment -

        Agree on all accounts – I'll update the patch with your feedback

        Doug Cutting added a comment -

        Patch looks great. A few things we might consider changing before committing:

        • The relationship between 'isSuperset' and 'unify' might be a bit clearer if different names were used. 'isSuperset/union' would be confusing since 'union' already has a meaning in Avro, so maybe 'subsumes/unify' would be slightly better? What do you think?
        • Unification currently ignores schema & record properties and field ordering. This is fine, since they're not needed for reading, but probably should be called out in the javadoc. Someone could add this as a feature later if they need it.
        • When default values differ maybe an exception should be thrown?
        • Should a call to unifyAliases() be added to RecordSchema?
        • I would prefer that the methods instead started like:
            if (type != RECORD)
              return super.unify(other);
            ... 

          This removes a level of indentation from the bulk of the method and makes it clear to the reader that no other cases are dealt with.

        Joe Crobak added a comment -

        Implemented Schema#isSuperset, Schema#unify, and a Tool for generating unified Schemas from avro data files.

        I have one open question about how unify should work in the case where two Records have the same name/type but different default values. I suspect there are other edge cases that I may have missed – maybe someone more familiar with the data model can suggest some additional checks.

        Doug Cutting added a comment -

        I don't mind having it on Schema directly, like applyAliases.

        Joe Crobak added a comment -

        Thanks for the feedback. I think moving the isSuperset method to Schema would be a lot nicer. What do you think about unify, would a method like Schema#unify(Schema... others) be ok or should it stay in a static util class?

        Will also remove the use of Arrays#asList/contains.

        Doug Cutting added a comment -

        I don't think this duplicates too much logic.

        Alternately, the isSuperset method could be Schema#isSuperset(Schema that). Then, instead of a big switch statement, you'd implement this method on Schema subclasses. The base class, Schema, would implement this as just this.getType()==that.getType(), handling int, bytes, boolean and string. NamedSchema would override this, checking names, etc.

        Also, I worry that 'Arrays.asList(Type.INT, Type.LONG).contains(t)' may be slow, if performance matters. 't==INT || t == LONG' would be a lot faster and still readable.
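The suggested idiom, sketched with a stand-in enum (not Avro's Schema.Type): a direct comparison rather than allocating a temporary list on every call.

```java
// Stand-in enum for illustration; the point is the constant-time,
// allocation-free check versus Arrays.asList(...).contains(t).
public class TypeCheck {
    public enum Type { INT, LONG, FLOAT, DOUBLE }

    public static boolean isIntOrLong(Type t) {
        return t == Type.INT || t == Type.LONG;
    }
}
```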

        Joe Crobak added a comment -

        I was looking for some feedback on the approach I took for isSuperset before working on unify. My implementation is duplicating a lot of the logic in ResolvingGrammarGenerator, but its generate method doesn't seem quite suitable. Please let me know if you have suggestions on how the logic could be consolidated.

        Joe Crobak added a comment -

        I think I should have some time to implement this in the coming days, so I'll try to put a patch together.


          People

          • Assignee:
            Joe Crobak
            Reporter:
            Joe Crobak
          • Votes:
            0
            Watchers:
            5
