[BEAM-10277] beam:coder:row:v1 implementations should respect encoding_position - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: P3
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: sdk-java-core, sdk-py-core
Labels:
- Clarified

Description

Problem/Status

The schema proto has an encoding_position field whose purpose is to indicate an alternative order in which beam:coder:row:v1 implementations should encode the fields. Currently every row coder implementation ignores this field and always encodes the fields in the order in which they appear in the schema.
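As a rough illustration of the intended behavior, here is a minimal Python sketch. It does not use the real Beam schema proto or the actual beam:coder:row:v1 wire format: the Field class and the encode_row and decode_row helpers are hypothetical stand-ins, and the "encoding" is just a list of values. It only shows values being emitted in encoding_position order and mapped back to schema order on decode.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; not Beam's Field proto."""
    name: str
    encoding_position: int


def encoding_order(fields: List[Field]) -> List[Field]:
    """Return the fields sorted by encoding_position instead of schema order."""
    return sorted(fields, key=lambda f: f.encoding_position)


def encode_row(fields: List[Field], row: Dict[str, Any]) -> List[Any]:
    """Toy 'encoding': emit the values in encoding_position order."""
    return [row[f.name] for f in encoding_order(fields)]


def decode_row(fields: List[Field], encoded: List[Any]) -> Dict[str, Any]:
    """Invert encode_row and return the values keyed in schema order."""
    by_name = {f.name: v for f, v in zip(encoding_order(fields), encoded)}
    return {f.name: by_name[f.name] for f in fields}


# Schema order is id, name, score; the encoding order is name, id, score.
schema = [Field("id", 1), Field("name", 0), Field("score", 2)]
row = {"id": 7, "name": "a", "score": 0.5}
encoded = encode_row(schema, row)
decoded = decode_row(schema, encoded)
```

In this sketch the schema order seen by the user is unchanged; only the order of values in the encoded form follows encoding_position.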

Motivation

The idea behind encoding_position is to give runners a way to enforce schema compatibility (BEAM-9502) by reordering how fields are encoded when the schema changes between two job submissions. Such changes could be field reorderings, additions, or deletions.
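One way this could play out is sketched below in Python. The issue does not specify how a runner would assign positions, so the assign_encoding_positions helper and its policy (fields that already existed keep their previous position, new fields are appended after them) are assumptions made for illustration only. Deletions are not handled in this sketch.

```python
from typing import Dict, List


def assign_encoding_positions(
    previous: Dict[str, int], new_field_names: List[str]
) -> Dict[str, int]:
    """Hypothetical policy: keep old positions, append positions for new fields."""
    positions = {}
    next_position = max(previous.values(), default=-1) + 1
    for name in new_field_names:
        if name in previous:
            # The field existed in the previous submission: pin its position.
            positions[name] = previous[name]
        else:
            # The field is new: encode it after all previously known fields.
            positions[name] = next_position
            next_position += 1
    return positions


def names_in_encoding_order(positions: Dict[str, int]) -> List[str]:
    """List field names sorted by their assigned encoding position."""
    return sorted(positions, key=positions.get)


# First submission: schema order and encoding order agree.
v1 = {"id": 0, "name": 1}

# Second submission: fields were reordered and "email" was added.
v2_fields = ["email", "name", "id"]
v2 = assign_encoding_positions(v1, v2_fields)
v2_order = names_in_encoding_order(v2)
```

Under this assumed policy the fields shared by both submissions are encoded in the same relative order, even though the second schema lists them differently.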

Code pointers

The Python beam:coder:row:v1 implementation lives in row_coder.py.
The Java implementation is a little more complicated: it is distributed between SchemaCoder, RowCoder, and RowCoderGenerator. RowCoderGenerator contains the code relevant to this issue. It uses bytebuddy to generate the coder's bytecode at runtime, and we need the generated code to encode fields in the order specified by encoding_position.

Testing

The Python and Java implementations should each be covered by unit tests (row_coder_test and RowCoderTest, respectively). We should also test them for compatibility by adding test cases that exercise encoding_position to standard_coders.yaml. These tests will be executed by CommonCoderTest and standard_coders_test. There's some example code for generating a new test case here.

Issue Links

is related to
- BEAM-12198 Support Dataflow update when schemas are used (Triage Needed)
- BEAM-9502 SchemaCoder is not update compatible (Resolved)

relates to
- BEAM-13043 Implementation encoding position sdk-go (Open)

links to
- GitHub Pull Request #14591
- GitHub Pull Request #14639
- GitHub Pull Request #15410
- GitHub Pull Request #16267

People

Assignee: Unassigned
Reporter: Brian Hulette
Votes: 0
Watchers: 3

Dates

Created: 18/Jun/20 18:15
Updated: 04/Jun/22 18:02

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 31h 50m