Description
rdblue reports that he sometimes sees corrupt data on S3. Given the MD5 checks performed on upload to S3, the corruption is likelier to have happened in VM RAM, on the HDD, or nearby.
If the MD5 checksum for each block were built up as data was written to it, and then checked against the ETag returned by S3, the RAM/HDD storage of the saved blocks could be ruled out as a source of corruption.
The obvious place to do this would be org.apache.hadoop.fs.s3a.S3ADataBlocks.DataBlock.
Issue Links
- is depended upon by: HADOOP-19080 S3A to support writing to object lock buckets (Open)
- relates to: BEAM-5196 Add MD5 consistency check on S3 uploads (writes) (Resolved)