Details

Type: New Feature
Status: To Do
Priority: Minor
Resolution: Unresolved
Component/s: Apache MXNet Backend
Labels: None

Description

IndexArray (exposed as contrib.index_array in the examples below) is an operator that returns an array containing the index of every element of the input array.

For an input array with shape (d_1, d_2, ..., d_n), index_array returns a (d_1, d_2, ..., d_n, n) array idx, where idx[i_1, i_2, ..., i_n, :] = [i_1, i_2, ..., i_n].

Additionally, when the optional parameter axes is specified, idx is a (d_1, d_2, ..., d_n, m) array, where m is the length of axes, and the following equality holds: idx[i_1, i_2, ..., i_n, j] = i_{axes[j]}.
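The semantics described above can be sketched in plain NumPy. This is only a reference model for checking expectations, not the MXNet implementation; the function name index_array_reference is hypothetical, and negative axes are assumed to count from the last dimension, as in the usage examples in this issue.

```python
import numpy as np

def index_array_reference(data, axes=None):
    """NumPy model of the proposed operator (hypothetical helper, not MXNet code)."""
    n = data.ndim
    # np.indices yields shape (n, d_1, ..., d_n); move the index axis to the end
    # so that idx[i_1, ..., i_n, :] == [i_1, ..., i_n].
    idx = np.moveaxis(np.indices(data.shape, dtype=np.int64), 0, -1)
    if axes is not None:
        # Assumption: negative axes wrap around, e.g. -1 is the last dimension.
        axes = [a % n for a in axes]
        # Select (and possibly repeat) index components: idx[..., j] == i_{axes[j]}.
        idx = idx[..., axes]
    return idx

x = np.ones((2, 3))
print(index_array_reference(x).shape)                 # (2, 3, 2)
print(index_array_reference(x)[1, 2])                 # [1 2]
print(index_array_reference(x, axes=(-1, 0))[1, 2])   # [2 1]
```

Only the shape of the input matters here; its values are never read, which is why the examples in this issue can pass any tensor of the right shape as a template.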

Motivation

This operator can be used to generate meshgrids for tensors whose exact shapes are not known when the graph is constructed. For instance, it can be used to build a simple prior box generator for anchor-based computer vision models:

# F is presumably the mx.nd / mx.sym namespace passed to HybridBlock.hybrid_forward.
feature_map = F.ones((8, 128, 128, 256)) # N x H x W x C, no shape information when using the Symbol API.
prior_box_stride = 16
box_size = [8, 8]

template = F.squeeze(F.slice_axis(feature_map, begin=0, end=1, axis=-1), axis=-1) # N x H x W
box_centres = F.contrib.index_array(template, axes=(-2, -1, -2, -1)).astype("float32") # N x H x W x 4
box_centres = F.broadcast_mul(box_centres, F.array([prior_box_stride]).reshape((1, 1, 1, 1))) # N x H x W x 4
corner_offsets = F.array(box_size).reshape((1, 1, 1, 2))
corner_offsets = F.concat(-corner_offsets/2, corner_offsets/2, dim=-1)
box_corners = F.broadcast_plus(box_centres, corner_offsets)

It can also be used to implement sinusoidal positional encodings for sequence processing, e.g.:

sequence_embeddings = F.ones((65, 8, 256)) # T x N x C, no shape information when using the Symbol API.
template = sequence_embeddings.reshape((0, 0, -1, 2)) # T x N x C -> T x N x (C/2) x 2
pos, i = F.split(
    F.contrib.index_array(template, axes=(0, 2)).astype("float32"), # T x N x (C/2) x 2 x 2
    axis=-1,
    num_outputs=2,
    squeeze_axis=True
) # T x N x (C/2) x 2 and T x N x (C/2) x 2
base = F.ones((1, 1, 1, 1)) * 10000
dmodel = F.slice_axis(F.shape_array(sequence_embeddings), begin=-1, end=None, axis=0)
dmodel = dmodel.reshape((1, 1, 1, 1)).astype("float32")
tmp = F.broadcast_div(pos, F.broadcast_power(base, F.broadcast_div(2 * i,  dmodel))) # T x N x (C/2) x 2
sin_input, cos_input = F.split(tmp, axis=-1, num_outputs=2, squeeze_axis=True) # T x N x (C/2) and T x N x (C/2)
positional_encoding = F.stack(F.sin(sin_input), F.cos(cos_input), axis=-1).reshape((0, 0, -3)) # T x N x C
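To check the arithmetic of the example above without MXNet, the same encoding can be written directly in NumPy when the shape is known. This is a sketch under stated assumptions: the helper name sinusoidal_encoding_reference is hypothetical, C is assumed to be even, and the result is meant to mirror the interleaved sin/cos layout produced by the stack and reshape above.

```python
import numpy as np

def sinusoidal_encoding_reference(T, N, C, base=10000.0):
    """NumPy model of the positional encoding example (hypothetical helper)."""
    pos = np.arange(T, dtype=np.float64).reshape(T, 1, 1)          # time step t
    k = np.arange(C // 2, dtype=np.float64).reshape(1, 1, C // 2)  # channel pair index
    angle = pos / np.power(base, 2.0 * k / C)                      # T x 1 x (C/2)
    angle = np.broadcast_to(angle, (T, N, C // 2))                 # T x N x (C/2)
    # Interleave sin and cos along the channel axis: [sin_0, cos_0, sin_1, cos_1, ...]
    pe = np.stack([np.sin(angle), np.cos(angle)], axis=-1)         # T x N x (C/2) x 2
    return pe.reshape(T, N, C)                                     # T x N x C

pe = sinusoidal_encoding_reference(65, 8, 256)
print(pe.shape)  # (65, 8, 256)
```

The index_array version in the issue computes the same pos and k grids from the tensor itself, which is what makes it usable when T and C are not known in advance.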

Issue Links

links to GitHub Pull Request #14638

People

Assignee: Unassigned

Reporter: Nick Guletskii

Votes: 0

Watchers: 1

Dates

Created: 07/Apr/19 15:51

Updated: 25/May/19 20:54

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 5.5h