Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Description
The countBy function does not return correct histograms: it appears to choose an array type for the counts that is too narrow, so the counts overflow.
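To make the failure concrete, here is a sketch of what countBy computes over a dictionary-encoded column. It uses plain arrays rather than Arrow vectors, and histogram and its parameters are hypothetical names, not the Arrow API. The counts array has one slot per dictionary entry, but each slot can grow as large as the number of rows.

```typescript
// Hypothetical model of countBy on a dictionary-encoded column:
// indices[i] is the position of row i's value in `dictionary`.
function histogram(dictionary: string[], indices: ArrayLike<number>): Map<string, number> {
  // One counter per dictionary entry. Uint32Array holds counts up to
  // 2^32 - 1, so it does not wrap for any row count below that.
  const counts = new Uint32Array(dictionary.length);
  for (let i = 0; i < indices.length; i++) {
    counts[indices[i]]++;
  }
  // Pair each dictionary value with its count.
  const result = new Map<string, number>();
  dictionary.forEach((value, key) => result.set(value, counts[key]));
  return result;
}

// Three dictionary entries, five rows.
const h = histogram(["a", "b", "c"], new Uint8Array([0, 2, 2, 1, 2]));
console.log(h.get("a"), h.get("b"), h.get("c")); // 1 1 3
```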
The problem appears to come from this line in countBy:
const countByteLength = Math.ceil(Math.log(vector.dictionary.length) / Math.log(256));
For example, if the dictionary has 3 entries but there are 1 million indices, the expression evaluates to 1, so a Uint8Array is used and the counts overflow.
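A minimal reproduction of that overflow, assuming only the expression quoted above; byteLengthFor is a hypothetical wrapper around it. Writes to a Uint8Array element wrap modulo 256, so a key that occurs 1,000,000 times is reported as 64.

```typescript
// The byte-length expression from countBy, evaluated for an arbitrary size n.
function byteLengthFor(n: number): number {
  return Math.ceil(Math.log(n) / Math.log(256));
}

// Sized from a 3-entry dictionary, the expression yields a 1-byte counter.
console.log(byteLengthFor(3)); // 1

// A 1-byte counter wraps modulo 256: one key seen 1,000,000 times
// is reported as 1,000,000 % 256 = 64.
const counts = new Uint8Array(3);
for (let i = 0; i < 1000000; i++) {
  counts[0]++;
}
console.log(counts[0]); // 64
```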
CodePen example:
https://codepen.io/Yngve92/pen/mYdWrr
If I change the expression to use the vector length instead of the dictionary length, it appears to work, but I am not sure this is correct:
const countByteLength = Math.ceil(Math.log(vector.length) / Math.log(256));
The expression appears on lines 63 and 189 of src/compute/dataframe.ts.
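A sketch of the direction proposed above: choose the counter type from the number of rows being counted, since no single count can exceed it. countsArrayTypeFor is a hypothetical helper, not code from the Arrow repository or the linked PR. It compares against each typed array's maximum value instead of using the logarithm, for two reasons. First, JavaScript has no 3-byte typed array, and a 1,000,000-row column gives a byte length of 3, which must round up to a Uint32Array. Second, the logarithm form is off by one at exact powers of 256: for 256 rows it gives exactly 1 byte, yet 256 does not fit in a Uint8Array.

```typescript
// A constructor producing an unsigned counts array of a given length.
type CountsArray = Uint8Array | Uint16Array | Uint32Array;
type CountsArrayCtor = new (length: number) => CountsArray;

// Hypothetical helper: choose the narrowest unsigned typed array whose
// maximum value can hold a count of rowCount without wrapping.
function countsArrayTypeFor(rowCount: number): CountsArrayCtor {
  if (rowCount <= 0xff) return Uint8Array;
  if (rowCount <= 0xffff) return Uint16Array;
  if (rowCount <= 0xffffffff) return Uint32Array;
  throw new RangeError(`row count too large for a Uint32Array counter: ${rowCount}`);
}

// One slot per dictionary key, but the slot width follows the row count.
const dictionaryLength = 3;
const rowCount = 1000000;
const Ctor = countsArrayTypeFor(rowCount);
const counts = new Ctor(dictionaryLength);
for (let i = 0; i < rowCount; i++) {
  counts[0]++;
}
console.log(counts instanceof Uint32Array, counts[0]); // true 1000000
```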
PR submitted: https://github.com/apache/arrow/pull/4265