An ordered set is easily compared to an array, so perhaps sets should be ordered.
If things are unordered, comparison gets more complicated. For example, the simple way to compare would be to build both sets up in memory – this would not work for large sets or lists.
I propose that a set is ordered by default. A client can do this by either sorting when writing, or with a data structure like a linked set. A client can choose to disregard order and accept that set equivalence is invalid if they wish.
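To make the two client-side options concrete, here is a rough Java sketch. The class and method names (OrderedSetWriter, sortedForWrite, insertionOrderedForWrite) are made up for illustration and are not part of any Avro API; the only real calls are the standard java.util collections. I am also assuming "linked set" means something like java.util.LinkedHashSet.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not an Avro class: prepares set elements for writing.
final class OrderedSetWriter {

    // Option 1: sort when writing. Two sets with the same elements always
    // produce the same sequence, so they can be compared like arrays.
    static <T extends Comparable<T>> List<T> sortedForWrite(Collection<T> elements) {
        return new ArrayList<>(new TreeSet<>(elements));
    }

    // Option 2: a linked set. Duplicates are dropped and insertion order is
    // kept, so equality then depends on the order elements were added.
    static <T> List<T> insertionOrderedForWrite(Collection<T> elements) {
        return new ArrayList<>(new LinkedHashSet<>(elements));
    }
}
```

With the first option, writing ("c", "a", "b") and ("b", "c", "a") both yield [a, b, c], so the two serializations compare equal. With the second, they would not, which is the trade-off a client accepts when it chooses insertion order.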
We could consider an "ordered": true property as well:
"unique": false, "ordered": true
would then be the implicit default. An array or set that is not ordered always compares as unequal to any other array or set. A reader with an 'ordered' schema reading an 'unordered' serialization would be in a tough spot. That might not be a supported promotion.
That leads us to the next question: What are the resolution rules between arrays and sets? My answer to this is: sets written can always be read as arrays. Arrays written can be read as sets as long as the uniqueness constraint is not violated.
One can always read an array as a set, even with duplicates. The duplicates get eliminated in the process of creating the set. Interestingly, one can go either direction, but not back and forth.
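The two positions on reading a written array as a set can be put side by side in a small Java sketch. This is only an illustration of the rules being discussed, not how Avro resolves schemas; ArrayToSetResolution, readStrict, and readLenient are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not an Avro class: two candidate rules for
// reading data that was written as an array into a set.
final class ArrayToSetResolution {

    // Strict rule: the read fails if the uniqueness constraint is violated.
    static <T> Set<T> readStrict(List<T> written) {
        Set<T> result = new LinkedHashSet<>();
        for (T item : written) {
            if (!result.add(item)) {
                throw new IllegalArgumentException("duplicate element: " + item);
            }
        }
        return result;
    }

    // Lenient rule: duplicates are eliminated while building the set;
    // the first occurrence of each element keeps its position.
    static <T> Set<T> readLenient(List<T> written) {
        return new LinkedHashSet<>(written);
    }
}
```

Under the lenient rule, an array [1, 2, 2, 3] reads as the set {1, 2, 3}, and writing that set back out gives [1, 2, 3], which is not the original array. That is the "either direction, but not back and forth" property.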
I think in the short run, Doug's version is appropriate. The above would be valuable, but it would take a while to sort out the details of what works best across languages, and it is a spec change rather than a Java API extension. Besides, it should be possible to specify what object to use as an array container regardless.
Taken in combination with AVRO-436, ordered maps, these two things are potentially significant changes for something that clients can emulate on their own. Ordered maps can be emulated by an array of
The simplest option other than Doug's Java Reflect API version is to require that Avro sets are ordered for the purposes of equivalence and comparison. If a client wants to compare two objects for equality or sort order, it must guarantee order on writing (this restriction already applies when serializing sets as arrays).
"unique": (true|false) is then just a reserved keyword hint for languages to construct Set-like APIs for data access.
We can consider adding the more difficult support for unordered sets/lists incrementally.
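For the ordered case, comparison never needs both collections in memory. A minimal Java sketch, assuming elements arrive in written order and that comparison is element by element with the shorter sequence sorting first on a tie (my assumption for illustration); OrderedCompare is a hypothetical name, not an Avro class.

```java
import java.util.Iterator;

// Hypothetical helper, not an Avro class: compares two ordered
// sequences one element at a time, holding nothing but the cursors.
final class OrderedCompare {

    // Returns a negative, zero, or positive value, like Comparable.compareTo.
    static <T extends Comparable<T>> int compare(Iterator<T> left, Iterator<T> right) {
        while (left.hasNext() && right.hasNext()) {
            int c = left.next().compareTo(right.next());
            if (c != 0) {
                return c; // first differing element decides
            }
        }
        if (left.hasNext()) {
            return 1;  // left is longer
        }
        if (right.hasNext()) {
            return -1; // right is longer
        }
        return 0;      // same elements in the same order
    }
}
```

This only gives a meaningful answer if both writers guaranteed the same ordering; for unordered data the result reflects write order rather than set membership.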