Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
0.6.0
None
Description
Dictionaries seem to be able to hold only 4096 indices, meaning that only vectors with 4096 values or fewer can be dictionary-encoded. The attached image is a stack trace of what happens when I try to encode a vector containing 4097 strings against a dictionary containing two distinct values.
The error can be traced to line 95 of DictionaryEncoder.java (`setter.invoke(mutator, i, encoded);`). The indices vector, which holds the encoded values, is allocated on line 84 with `indices.allocateNew()`, and it seems that `allocateNew()` initially allocates room for only 4096 values. The code runs if there are 4096 rows of data or fewer. Any more, and the same error occurs.
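To make the suspected failure mode concrete, here is a minimal sketch in plain Java. It does not use Arrow's classes: `IndexVectorSketch`, its `INITIAL_CAPACITY` constant, and its `set`/`setSafe` methods are hypothetical stand-ins, assuming that the indices vector starts with room for 4096 values and that the write on line 95 does not grow the buffer. Under those assumptions, the 4097th write fails, while a write that grows the buffer first does not.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a fixed-width index vector; not Arrow's actual class. */
public class IndexVectorSketch {
    // Assumed initial capacity, chosen to match the limit observed in this report.
    public static final int INITIAL_CAPACITY = 4096;

    private int[] data = new int[0];

    /** Mirrors the idea of allocateNew(): reserve a fixed initial capacity. */
    public void allocateNew() {
        data = new int[INITIAL_CAPACITY];
    }

    /** Unchecked write: throws once the index reaches the allocated capacity. */
    public void set(int index, int value) {
        data[index] = value;
    }

    /** Growing write: doubles the capacity until the index fits, then writes. */
    public void setSafe(int index, int value) {
        while (index >= data.length) {
            data = Arrays.copyOf(data, Math.max(1, data.length * 2));
        }
        data[index] = value;
    }

    public int get(int index) {
        return data[index];
    }

    public int capacity() {
        return data.length;
    }

    public static void main(String[] args) {
        // 4097 rows encoded against a two-value dictionary, as in the report.
        IndexVectorSketch unchecked = new IndexVectorSketch();
        unchecked.allocateNew();
        try {
            for (int i = 0; i < 4097; i++) {
                unchecked.set(i, i % 2);
            }
            System.out.println("unchecked writes succeeded");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked write failed: " + e.getMessage());
        }

        // The same rows written through the growing setter.
        IndexVectorSketch growing = new IndexVectorSketch();
        growing.allocateNew();
        for (int i = 0; i < 4097; i++) {
            growing.setSafe(i, i % 2);
        }
        System.out.println("capacity after growth: " + growing.capacity());
    }
}
```

If this matches what the encoder does, two possible remedies would be to size the indices vector for the input's value count before the loop, or to write through a setter that reallocates on demand. Either is only a suggestion based on the behavior described above.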