Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The current Tensor class stores data in a contiguous chunk of memory. This implementation is very simple, but it makes some operations difficult to support efficiently. For example,
    Tensor a = b.transpose();
    Tensor d = a + c;
As in other tensor implementations (e.g. numpy), Tensor a and b share memory. Without stride, the addition operation has to perform a real transpose (physically rearranging the data), which incurs some overhead. With stride, we can avoid the transpose operation. Instead, we enumerate each element of a and c using the index, shape and stride information. See https://stackoverflow.com/questions/32034237/how-does-numpys-transpose-method-permute-the-axes-of-an-array
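
To make the idea concrete, here is a minimal sketch of a stride-based transpose followed by an addition, restricted to 2-D float data. StridedView, ContiguousStride, Transpose2D and Add2D are hypothetical names made up for this illustration; they are not the existing Tensor API, and strides are assumed to be counted in elements rather than bytes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal view type for illustration; NOT the actual Tensor class.
struct StridedView {
  const float* data;                // shared buffer, not owned by the view
  std::vector<std::size_t> shape;   // extent of each dimension
  std::vector<std::size_t> stride;  // step (in elements) per dimension
};

// Row-major strides for a contiguous tensor of the given shape.
std::vector<std::size_t> ContiguousStride(const std::vector<std::size_t>& shape) {
  std::vector<std::size_t> stride(shape.size(), 1);
  for (std::size_t i = shape.size(); i-- > 1;) {
    stride[i - 1] = stride[i] * shape[i];
  }
  return stride;
}

// Transpose without copying: swap the shape and stride entries only.
StridedView Transpose2D(const StridedView& v) {
  return {v.data, {v.shape[1], v.shape[0]}, {v.stride[1], v.stride[0]}};
}

// Element-wise addition of two 2-D views with the same shape.
// Each element is located through index * stride, so a transposed
// operand is read in place instead of being materialized first.
std::vector<float> Add2D(const StridedView& a, const StridedView& c) {
  std::vector<float> out;
  out.reserve(a.shape[0] * a.shape[1]);
  for (std::size_t i = 0; i < a.shape[0]; ++i) {
    for (std::size_t j = 0; j < a.shape[1]; ++j) {
      const float x = a.data[i * a.stride[0] + j * a.stride[1]];
      const float y = c.data[i * c.stride[0] + j * c.stride[1]];
      out.push_back(x + y);
    }
  }
  return out;
}
```

In this sketch, a 2x3 tensor b with stride {3, 1} becomes a 3x2 view a with stride {1, 3}; the buffer of b is never touched, which is the saving described above.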
Moreover, stride is necessary for broadcasting operations. See https://stackoverflow.com/questions/39626233/how-did-numpy-implement-multi-dimensional-broadcasting
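
A common way to express broadcasting with strides (the approach numpy takes) is to give a broadcast dimension a stride of 0, so that every index along it maps to the same element. The following is a rough sketch under that assumption; BroadcastStride and AddBroadcast2D are hypothetical helpers invented for this example, not part of the existing code, and strides are again counted in elements.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: strides for viewing a tensor with the given `shape`
// and `stride` as if it had the larger `target` shape. Shapes are aligned
// on their trailing dimensions; missing or size-1 dimensions get stride 0.
std::vector<std::size_t> BroadcastStride(const std::vector<std::size_t>& shape,
                                         const std::vector<std::size_t>& stride,
                                         const std::vector<std::size_t>& target) {
  if (shape.size() > target.size()) {
    throw std::invalid_argument("source has more dimensions than target");
  }
  std::vector<std::size_t> out(target.size(), 0);  // leading dims: stride 0
  const std::size_t offset = target.size() - shape.size();
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (shape[i] == target[offset + i]) {
      out[offset + i] = stride[i];  // dimension matches: keep its stride
    } else if (shape[i] == 1) {
      out[offset + i] = 0;          // size-1 dimension: repeat the element
    } else {
      throw std::invalid_argument("shapes are not broadcastable");
    }
  }
  return out;
}

// Hypothetical 2-D addition where both operands are addressed by stride,
// so a broadcast operand is never expanded in memory.
std::vector<float> AddBroadcast2D(const float* x,
                                  const std::vector<std::size_t>& xs,
                                  const float* y,
                                  const std::vector<std::size_t>& ys,
                                  const std::vector<std::size_t>& shape) {
  std::vector<float> out;
  out.reserve(shape[0] * shape[1]);
  for (std::size_t i = 0; i < shape[0]; ++i) {
    for (std::size_t j = 0; j < shape[1]; ++j) {
      out.push_back(x[i * xs[0] + j * xs[1]] + y[i * ys[0] + j * ys[1]]);
    }
  }
  return out;
}
```

For example, adding a length-3 vector to a 2x3 matrix gives the vector the stride {0, 1}, so both rows of the matrix read the same three values.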
The code in src/core/tensor.cc, tensor_math_cpp.h and tensor_math_cuda.h needs to be modified when stride is added as a field/member of Tensor.