Details
Type: New Feature
Status: To Do
Priority: Minor
Resolution: Unresolved
None
Description
IndexArray is an operator that returns an array holding the index (coordinates) of every element of its input array.
For an input array with shape (d_1, d_2, ..., d_n), index_array returns a (d_1, d_2, ..., d_n, n) array idx, where idx[i_1, i_2, ..., i_n, :] = [i_1, i_2, ..., i_n].
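Additionally, when the parameter axes is specified, idx is a (d_1, d_2, ..., d_n, m) array, where m is the length of axes, and the following equality holds: idx[i_1, i_2, ..., i_n, j] = i_{axes[j]}. Entries of axes may repeat and may be negative, in which case they count from the last dimension as in Python indexing (both are used in the examples below).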
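To check these semantics without MXNet, here is a minimal NumPy sketch of the behaviour described above. `index_array_ref` is a hypothetical helper name, not part of MXNet; it builds the grid with `np.indices` and assumes negative entries in `axes` count from the last dimension.

```python
import numpy as np

def index_array_ref(x, axes=None):
    """Reference sketch of index_array semantics (hypothetical helper, NumPy only)."""
    x = np.asarray(x)
    n = x.ndim
    # np.indices returns shape (n, d_1, ..., d_n); move the coordinate axis to the end
    idx = np.moveaxis(np.indices(x.shape), 0, -1)  # (d_1, ..., d_n, n)
    if axes is None:
        return idx
    # Assumption: negative axes count from the end; repeats are allowed
    axes = [a % n for a in axes]
    return idx[..., axes]  # (d_1, ..., d_n, m) with m = len(axes)

x = np.zeros((2, 3))
full = index_array_ref(x)
print(full.shape)   # (2, 3, 2)
print(full[1, 2])   # [1 2]
sub = index_array_ref(x, axes=(-1, 0, -1))
print(sub.shape)    # (2, 3, 3)
print(sub[1, 2])    # [2 1 2]
```

With `axes=(-1, 0, -1)` on a 2-D input, each position [i_1, i_2] maps to [i_2, i_1, i_2], matching idx[..., j] = i_{axes[j]}.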
Motivation
This operator can be used to generate meshgrids for tensors whose exact shapes are not known when the graph is constructed. For instance, it can serve as a makeshift prior box generator for anchor-based computer vision models:
import mxnet as mx
F = mx.nd # NDArray API so the snippet runs eagerly; the motivating case is the Symbol API, where shapes are unknown.
feature_map = F.ones((8, 128, 128, 256)) # N x H x W x C, no shape information when using the Symbol API.
prior_box_stride = 16
box_size = [8, 8]
template = F.squeeze(F.slice_axis(feature_map, begin=0, end=1, axis=-1), axis=-1) # N x H x W
box_centres = F.contrib.index_array(template, axes=(-2, -1, -2, -1)).astype("float32") # N x H x W x 4, each entry [h, w, h, w]
box_centres = box_centres * prior_box_stride # N x H x W x 4, scaled from cell indices to pixels
corner_offsets = F.array(box_size).reshape((1, 1, 1, 2))
corner_offsets = F.concat(-corner_offsets / 2, corner_offsets / 2, dim=-1) # 1 x 1 x 1 x 4
box_corners = F.broadcast_plus(box_centres, corner_offsets) # N x H x W x 4
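To make the layout of `box_corners` concrete, the sketch below replays the same steps in NumPy on a tiny 1 x 2 x 3 feature map. It is an illustration only: `prior_boxes` is a hypothetical helper, `np.indices` stands in for `index_array`, and it assumes `box_size` is ordered (height, width), so each box comes out as [y_min, x_min, y_max, x_max]. As in the snippet above, the "centres" are `index * stride`, i.e. the top-left corner of each stride cell.

```python
import numpy as np

def prior_boxes(n, h, w, stride=16, box_size=(8, 8)):
    """NumPy sketch mirroring the MXNet prior-box steps (hypothetical helper)."""
    # Equivalent of index_array(template, axes=(-2, -1, -2, -1)) on an N x H x W template:
    # the full grid has last axis [n, row, col]; pick [row, col, row, col]
    grid = np.moveaxis(np.indices((n, h, w)), 0, -1)          # N x H x W x 3
    centres = grid[..., [1, 2, 1, 2]].astype(np.float32)      # N x H x W x 4
    centres *= stride                                         # cell index -> pixel coordinate
    half = np.asarray(box_size, dtype=np.float32).reshape(1, 1, 1, 2) / 2
    corner_offsets = np.concatenate([-half, half], axis=-1)   # 1 x 1 x 1 x 4
    return centres + corner_offsets                           # broadcasts to N x H x W x 4

boxes = prior_boxes(1, 2, 3)
print(boxes.shape)     # (1, 2, 3, 4)
print(boxes[0, 1, 2])  # [12. 28. 20. 36.]
```

The cell at row 1, column 2 maps to [16, 32, 16, 32] before the offsets [-4, -4, 4, 4] are added.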
The operator can also be used to implement sinusoidal positional encodings for sequence models, e.g.:
sequence_embeddings = F.ones((65, 8, 256)) # T x N x C, no shape information when using the Symbol API.
template = sequence_embeddings.reshape((0, 0, -1, 2)) # T x N x C -> T x N x (C/2) x 2
pos, i = F.split(
    F.contrib.index_array(template, axes=(0, 2)).astype("float32"), # T x N x (C/2) x 2 x 2
    axis=-1, num_outputs=2, squeeze_axis=True
) # T x N x (C/2) x 2 and T x N x (C/2) x 2
base = F.ones((1, 1, 1, 1)) * 10000
dmodel = F.slice_axis(F.shape_array(sequence_embeddings), begin=-1, end=None, axis=0)
dmodel = dmodel.reshape((1, 1, 1, 1)).astype("float32")
tmp = F.broadcast_div(pos, F.broadcast_power(base, F.broadcast_div(2 * i, dmodel))) # T x N x (C/2) x 2
sin_input, cos_input = F.split(tmp, axis=-1, num_outputs=2, squeeze_axis=True) # T x N x (C/2) and T x N x (C/2)
positional_encoding = F.stack(F.sin(sin_input), F.cos(cos_input), axis=-1).reshape((0, 0, -3)) # T x N x C
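The snippet is meant to produce the sinusoidal encoding PE[t, 2k] = sin(t / 10000^(2k/C)) and PE[t, 2k+1] = cos(t / 10000^(2k/C)), with sine and cosine interleaved along the channel axis. The NumPy sketch below mirrors the same reshape/index/split steps and checks them against that closed form. It is illustrative only: both helpers are hypothetical, `np.indices` stands in for `index_array`, and plain slicing stands in for `F.split`.

```python
import numpy as np

def positional_encoding_via_indices(t_len, batch, channels):
    """NumPy mirror of the index_array-based positional encoding (hypothetical helper)."""
    assert channels % 2 == 0
    shape = (t_len, batch, channels // 2, 2)                 # like reshape((0, 0, -1, 2))
    idx = np.moveaxis(np.indices(shape), 0, -1)              # T x N x (C/2) x 2 x 4
    idx = idx[..., [0, 2]].astype(np.float64)                # axes=(0, 2): [t, k]
    pos, i = idx[..., 0], idx[..., 1]                        # each T x N x (C/2) x 2
    tmp = pos / np.power(10000.0, 2 * i / channels)
    sin_input, cos_input = tmp[..., 0], tmp[..., 1]          # each T x N x (C/2)
    pe = np.stack([np.sin(sin_input), np.cos(cos_input)], axis=-1)  # T x N x (C/2) x 2
    return pe.reshape(t_len, batch, channels)                # like reshape((0, 0, -3))

def positional_encoding_closed_form(t_len, channels):
    """Direct formula: sin on even channels, cos on odd channels (hypothetical helper)."""
    t = np.arange(t_len)[:, None]
    k = np.arange(channels // 2)[None, :]
    angle = t / np.power(10000.0, 2 * k / channels)
    pe = np.empty((t_len, channels))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = positional_encoding_via_indices(65, 8, 256)
print(pe.shape)  # (65, 8, 256)
print(np.allclose(pe[:, 3], positional_encoding_closed_form(65, 256)))  # True
```

Because the final reshape merges the (C/2) and 2 axes in row-major order, channel 2k holds the sine term and channel 2k+1 the cosine term for the same frequency.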