Details
Type: Improvement
Status: Resolved
Priority: P2
Resolution: Fixed
Description
Overarching issue: We need to move the Google Docs, markdown files, and email threads that sketch the Beam model as it develops into one centralized place with clear information architecture and navigation, and draw the line that "if it isn't reachable from here in an obvious way, it isn't the spec". [1]
Specific issue: Which coders must every runner and SDK understand? Which other coders are considered standardized? What is the abstract specification of their wire formats?
Today we have https://github.com/apache/beam/blob/master/model/fn-execution/src/test/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml which is the beginning of a compliance test suite for standardized coders.
This would really benefit from:
- narrative descriptions of the formats, including abstract specification (not examples) and perhaps motivation
- specification of which are required and which are merely "well known"
- a tie-in to BEAM-3203 regarding which coders are required to decode to a compatible value in every SDK
- once we have an abstract spec and some examples, and one language has robust coders that pass the examples, we could turn it around and treat that implementation as a reference impl for fuzz testing
Any sort of fancy hacking that blends the tests with the narrative is fine, though mostly I think they'll end up covering disjoint topics.
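To make the "examples plus reference impl for fuzz testing" idea concrete, here is a minimal Python sketch. It assumes the varint coder uses a base-128, least-significant-group-first encoding with negative values treated as 64-bit two's complement. That assumption should be checked against the eventual abstract spec. The names `encode_varint`, `decode_varint`, `EXAMPLES`, and `fuzz_against_reference` are hypothetical, and the example table is hand-computed for illustration rather than copied from standard_coders.yaml.

```python
import random

MASK64 = (1 << 64) - 1


def encode_varint(value: int) -> bytes:
    """Encode a signed 64-bit int as a base-128 varint (assumed format)."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value does not fit in a signed 64-bit integer")
    bits = value & MASK64  # reinterpret negatives as unsigned two's complement
    out = bytearray()
    while True:
        low7 = bits & 0x7F
        bits >>= 7
        if bits:
            out.append(low7 | 0x80)  # continuation bit: more groups follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode exactly one varint; reject truncated input and trailing bytes."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            if index != len(data) - 1:
                raise ValueError("trailing bytes after varint")
            break
    else:
        raise ValueError("truncated varint")
    result &= MASK64
    return result - (1 << 64) if result >= (1 << 63) else result


# Hand-computed (encoded bytes -> value) pairs, in the spirit of a YAML example table.
EXAMPLES = {
    b"\x00": 0,
    b"\x01": 1,
    b"\x0a": 10,
    b"\xc8\x01": 200,
    b"\xe8\x07": 1000,
    b"\xff" * 9 + b"\x01": -1,
}


def check_examples(encode, decode) -> None:
    """Example-based compliance check for a candidate coder."""
    for encoded, value in EXAMPLES.items():
        assert encode(value) == encoded, (value, encode(value))
        assert decode(encoded) == value, (encoded, decode(encoded))


def fuzz_against_reference(encode, decode, trials: int = 1000, seed: int = 0) -> None:
    """Compare a candidate coder with the reference functions on random values."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = rng.randint(-(1 << 63), (1 << 63) - 1)
        encoded = encode(value)
        assert encoded == encode_varint(value), value
        assert decode(encoded) == value, value


if __name__ == "__main__":
    check_examples(encode_varint, decode_varint)
    fuzz_against_reference(encode_varint, decode_varint)
```

Here the reference functions are checked against themselves only to show the shape of the harness. In practice another SDK's coder would be passed in as `encode` and `decode`, with the examples covering known edge cases and the fuzz loop covering the rest of the value space.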
[1] I filed BEAM-2567 and BEAM-2568 and ported https://beam.apache.org/contribute/runner-guide/, and herohde put together https://beam.apache.org/contribute/portability/ and https://github.com/apache/beam/blob/master/sdks/CONTAINERS.md