Several compute kernels create the resulting ArrayData with the same offset as one of the operands. Instead, this offset should be 0, since the result buffer is freshly constructed with the correct len and its values start at the beginning of the buffer.
Example of one failing test:
Additionally, the boolean kernels seem to require that both operands have the same offset. This shouldn't be needed in general. It seems that only the SIMD implementation has a constraint here: it requires the offset to be a multiple of 8 (bits) so that the operation works correctly on whole bytes. The scalar implementation should be fine with any offset.
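To illustrate the expected behavior, here is a minimal sketch of a scalar AND over bit-packed inputs with different offsets. It does not use Arrow's types: `BitArray`, `from_bools`, `get` and `and_scalar` are hypothetical names made up for this example, and the LSB-first bit layout is an assumption chosen to mirror Arrow's bitmaps. The point is only that the inputs are read through their own offsets while the freshly allocated result is reported with offset 0.

```rust
/// Hypothetical stand-in for a bit-packed boolean array (not Arrow's ArrayData).
#[derive(Debug)]
struct BitArray {
    bytes: Vec<u8>, // LSB-first bit-packed values
    offset: usize,  // index of the first logical bit within `bytes`
    len: usize,     // number of logical bits
}

impl BitArray {
    /// Packs `values` into a new buffer, preceded by `offset` unused padding bits.
    fn from_bools(values: &[bool], offset: usize) -> Self {
        let mut bytes = vec![0u8; (offset + values.len() + 7) / 8];
        for (i, &v) in values.iter().enumerate() {
            if v {
                let bit = offset + i;
                bytes[bit / 8] |= 1 << (bit % 8);
            }
        }
        BitArray { bytes, offset, len: values.len() }
    }

    /// Reads logical bit `i`, taking this array's offset into account.
    fn get(&self, i: usize) -> bool {
        let bit = self.offset + i;
        (self.bytes[bit / 8] >> (bit % 8)) & 1 == 1
    }
}

/// Scalar AND that accepts any pair of operand offsets.
/// The output buffer is newly allocated, so its offset is 0,
/// regardless of the offsets of `left` and `right`.
fn and_scalar(left: &BitArray, right: &BitArray) -> BitArray {
    assert_eq!(left.len, right.len, "operands must have equal length");
    let mut bytes = vec![0u8; (left.len + 7) / 8];
    for i in 0..left.len {
        if left.get(i) && right.get(i) {
            bytes[i / 8] |= 1 << (i % 8);
        }
    }
    BitArray { bytes, offset: 0, len: left.len }
}
```

For example, calling `and_scalar` on `from_bools(&[true, true, false, true, false], 3)` and `from_bools(&[true, false, false, true, true], 5)` gives a result with `offset == 0`, `len == 5` and `bytes == [0b0000_1001]`. Copying an operand's offset (3 or 5) into the result would make readers skip valid bits at the start of the new buffer.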