Apache Arrow / ARROW-1407

Dictionaries can only hold a maximum of 4096 indices

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.0
    • Component/s: Java
    • Labels: None

    Description

      Dictionaries seem to be able to hold only 4096 indices, meaning that only vectors with 4096 values or fewer can be dictionary-encoded. The attached image is the stack trace produced when encoding a vector of 4097 strings against a dictionary containing two distinct values.

      The error can be traced to line 95 of DictionaryEncoder.java (`setter.invoke(mutator, i, encoded);`). The indices vector, which holds the encoded values, is allocated on line 84 with `indices.allocateNew()`, and it seems that `allocateNew()` only allocates 4096 bytes of data initially. The code runs with 4096 rows of data or fewer. With any more, the same error occurs.
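
      The failure mode described above can be modeled without Arrow at all. The following is only a minimal sketch, assuming the indices buffer starts with room for 4096 entries and that a plain write does not grow it. `DictionaryIndexSketch`, `set`, `setSafe`, `encode`, and `sampleValues` are hypothetical names for illustration, not the code in DictionaryEncoder.java, and the growing write is shown as one possible remedy, not necessarily the fix that was applied.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an index vector; this is NOT Arrow's actual class.
final class DictionaryIndexSketch {
    // Assumed initial capacity, taken from the 4096 figure in the report.
    static final int INITIAL_CAPACITY = 4096;

    private int[] buffer = new int[INITIAL_CAPACITY];

    int capacity() {
        return buffer.length;
    }

    int get(int index) {
        return buffer[index];
    }

    // Fixed-capacity write: fails once the index reaches the current capacity.
    void set(int index, int value) {
        if (index >= buffer.length) {
            throw new IndexOutOfBoundsException(
                "index " + index + " exceeds capacity " + buffer.length);
        }
        buffer[index] = value;
    }

    // Growing write: doubles the buffer until the index fits, then writes.
    void setSafe(int index, int value) {
        int newCapacity = buffer.length;
        while (index >= newCapacity) {
            newCapacity *= 2;
        }
        if (newCapacity != buffer.length) {
            buffer = Arrays.copyOf(buffer, newCapacity);
        }
        buffer[index] = value;
    }

    // Encodes each value as its dictionary index; `safe` selects the write path.
    static DictionaryIndexSketch encode(
            String[] values, Map<String, Integer> dictionary, boolean safe) {
        DictionaryIndexSketch indices = new DictionaryIndexSketch();
        for (int i = 0; i < values.length; i++) {
            int encoded = dictionary.get(values[i]);
            if (safe) {
                indices.setSafe(i, encoded);
            } else {
                indices.set(i, encoded);
            }
        }
        return indices;
    }

    // Builds `count` strings that alternate between two distinct values.
    static String[] sampleValues(int count) {
        String[] values = new String[count];
        for (int i = 0; i < count; i++) {
            values[i] = (i % 2 == 0) ? "foo" : "bar";
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("foo", 0);
        dictionary.put("bar", 1);

        String[] values = sampleValues(4097);
        try {
            encode(values, dictionary, false);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("fixed-capacity write failed: " + e.getMessage());
        }

        DictionaryIndexSketch grown = encode(values, dictionary, true);
        System.out.println("growing write succeeded, capacity " + grown.capacity());
    }
}
```

      In this model, 4096 rows fit exactly, the 4097th write through `set` throws, and the same input succeeds through `setSafe` because the buffer is reallocated before the write.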

      Attachments

        1. Screen Shot 2017-08-22 at 7.14.07 PM.png
          150 kB
          Shayan Monshizadeh

          People

            Assignee: Li Jin (icexelloss)
            Reporter: Shayan Monshizadeh (shayanm)