Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
0.15.1, 0.16.0
Docker
Description
Skimming through Validate() code in both 0.15 and master, I noticed an oversight in BinaryArray validation in C++ (and Python).
ValidateOffsets() checks that the first offset is 0, but it doesn't check that all the offsets point within the data buffer. A malicious Arrow file could declare offsets=[0,999999] and data=[]. If a caller reads the first value in that array, the read runs past the end of the data buffer (a buffer over-read).
The extra check is cheap: since Arrow already validates that offsets are monotonically non-decreasing, the last offset is the largest one. One need only test that the last offset is less than or equal to the size of the data buffer.
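To make the proposed check concrete, here is a minimal standalone sketch. It is not Arrow's implementation and does not use Arrow's API: `OffsetsWithinData` is a hypothetical helper, and it assumes 32-bit offsets held in a `std::vector` plus a data buffer size in bytes. It repeats the two checks described above (first offset is 0, offsets never decrease) and then adds the missing one.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow): returns true when every offset of
// a binary/string array points inside a data buffer of `data_size` bytes.
bool OffsetsWithinData(const std::vector<int32_t>& offsets, int64_t data_size) {
  // An array of N values carries N + 1 offsets, so at least one is required.
  if (offsets.empty()) return false;

  // Check described in the report: the first offset must be 0.
  if (offsets.front() != 0) return false;

  // Check described in the report: offsets must never decrease.
  for (std::size_t i = 1; i < offsets.size(); ++i) {
    if (offsets[i] < offsets[i - 1]) return false;
  }

  // The missing check: because offsets never decrease, the last one is the
  // largest, so comparing it against the buffer size bounds all of them.
  return static_cast<int64_t>(offsets.back()) <= data_size;
}
```

With the example above, `OffsetsWithinData({0, 999999}, 0)` returns false, so the malformed array would be rejected before any value is read.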
We at Workbench let untrusted programs write Arrow files that we then validate and read. We're keen to ensure that untrusted programs can't use Arrow files to plant data that leads to arbitrary code execution or arbitrary reads. We wrote a validation tool that checks for the buffer over-read described in this issue: https://github.com/CJWorkbench/arrow-tools/blob/005fe582b428c1ab6a9ed5f6dc968387d77e9a80/src/arrow-validate.cc#L27. But it feels to me like Arrow's Validate() should be checking this.